gab-ai-inc / gab-dissenter-extension

Dissenter.com Browser Extension source code
https://dissenter.com
Apache License 2.0
272 stars, 43 forks

A trailing forward slash alone will cause Dissenter to see a different page #33

Open Sharpiro opened 5 years ago

Sharpiro commented 5 years ago

Perhaps this has been mentioned, but I didn't see anything after glancing over the issues. I confirmed this is still occurring with the latest updates when commenting on fresh, uncommented sites.

Guess what site I'll be using as an example? lol

Example:
300 comments: https://www.rottentomatoes.com/m/captain_marvel/
3k+ comments: https://www.rottentomatoes.com/m/captain_marvel

Browser: Firefox
Extension Version: 0.1.1

markadrake commented 5 years ago

What's strange about this issue is if you visit the URL with the forward slash, you are redirected to the page without it.

First question we should try to answer is: how did anyone ever manage to comment on the URL ending with the slash?

Also, the page itself has a canonical meta tag, again referencing the page without the forward slash.

<link rel="canonical" href="https://www.rottentomatoes.com/m/captain_marvel">

I'd recommend that if Dissenter detects a canonical URL for the page being commented on, that it respect it and anywhere the URL is shown for the page reference the canonical URL over any URL the user may have actually been on while commenting.
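A minimal sketch of what detecting the canonical URL could look like, assuming the page HTML is available as a string. The names `CanonicalLinkParser`, `find_canonical_url`, and `comment_key` are hypothetical and are not taken from the Dissenter code base; the sketch is in Python for brevity, even though the extension itself runs in the browser.

```python
from html.parser import HTMLParser
from typing import Optional


class CanonicalLinkParser(HTMLParser):
    """Collects the href of the first <link rel="canonical"> in a document."""

    def __init__(self) -> None:
        super().__init__()
        self.canonical: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        # Only the first canonical link is kept; later ones are ignored.
        if tag != "link" or self.canonical is not None:
            return
        attr_map = {name: (value or "") for name, value in attrs}
        # rel is a space-separated token list, so split before comparing.
        rel_tokens = attr_map.get("rel", "").lower().split()
        if "canonical" in rel_tokens and attr_map.get("href"):
            self.canonical = attr_map["href"]


def find_canonical_url(html: str) -> Optional[str]:
    """Return the canonical href declared by the page, or None if absent."""
    parser = CanonicalLinkParser()
    parser.feed(html)
    return parser.canonical


def comment_key(visited_url: str, html: str) -> str:
    """Prefer the declared canonical URL; fall back to the visited URL."""
    return find_canonical_url(html) or visited_url
```

With the tag shown above, `comment_key` would return the URL without the trailing slash regardless of which variant the user visited.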

I don't pretend to know or have even explored what the data you store looks like, but it may be easier to update the schema so comments are tied to multiple URLs:

{
    guid: "00000000-0000-0000-0000-000000000000",
    urls: [
        "https://www.rottentomatoes.com/m/captain_marvel/",
        "https://www.rottentomatoes.com/m/captain_marvel"
    ],
    canonicalUrl: "https://www.rottentomatoes.com/m/captain_marvel",
    comments: [
        ...
    ]
}
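
To illustrate how such a schema could be queried, here is a small in-memory sketch in which every known URL variant points at the same thread record. `Thread` and `ThreadStore` are hypothetical names, and nothing here reflects how Dissenter actually stores its data; merging two threads that already exist separately is deliberately left out.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Thread:
    canonical_url: str
    guid: str = field(default_factory=lambda: str(uuid.uuid4()))
    urls: List[str] = field(default_factory=list)
    comments: List[str] = field(default_factory=list)


class ThreadStore:
    """Maps every known URL variant to a single shared Thread object."""

    def __init__(self) -> None:
        self._by_url: Dict[str, Thread] = {}

    def register(self, canonical_url: str, *aliases: str) -> Thread:
        # Reuse the thread if the canonical URL is already known.
        thread = self._by_url.get(canonical_url)
        if thread is None:
            thread = Thread(canonical_url=canonical_url)
        for url in (canonical_url, *aliases):
            if url not in thread.urls:
                thread.urls.append(url)
            self._by_url[url] = thread
        return thread

    def lookup(self, url: str) -> Optional[Thread]:
        return self._by_url.get(url)
```

Looking up either variant of the example URL would then return the same thread, so both pages would show the same comment count.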

Sharpiro commented 5 years ago

> What's strange about this issue is if you visit the URL with the forward slash, you are redirected to the page without it.

I tested on Firefox and Chrome on Fedora, and I did not experience this redirect, so I was able to view both comment threads with no issue.

> I'd recommend that if Dissenter detects a canonical URL for the page being commented on, that it respect it and anywhere the URL is shown for the page reference the canonical URL over any URL the user may have actually been on while commenting.

Wouldn't this require trusting the website? It might be better to just trim any trailing slashes. I heard Dissenter had to make some fixes for websites that were creating some kind of GUID per page served to make it harder for tools like Dissenter to function. I'm not sure if this would add complexity to the issue.
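
A minimal sketch of the slash-trimming idea, using only Python's standard library. `strip_trailing_slash` is a hypothetical helper name, not something from the extension.

```python
from urllib.parse import urlsplit, urlunsplit


def strip_trailing_slash(url: str) -> str:
    """Remove trailing slashes from the path, leaving query and fragment alone."""
    parts = urlsplit(url)
    path = parts.path.rstrip("/")
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))
```

Both example URLs from this issue would map to the same string. One caveat: a server is free to serve different content at `/a` and `/a/`, so this trades a small risk of merging distinct pages for not having to trust the site's own metadata.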

markadrake commented 5 years ago

I was redirected on my work network but this evening at home I'm not. Not sure what might have been the cause for that.

The canonical URL is a pretty safe bet, as it's used to tell search engines which URL is the true one for the content so the site isn't penalized for duplicate content. Otherwise, you have these 2 URLs competing with each other and seen by Google (and other engines) as duplicate content.

That being said, I'd recommend recording multiple URLs for the very reason you stated. This way both of these pages would show 3,300 comments instead of 3,000 and 300 separately.

johnmarkkarr commented 5 years ago

They need to look for the canonical URL, normalize URLs, and strip off anchors, if they don't do that yet. They might even want to add custom logic for certain websites to filter query strings, such as Google search results, which append browser information.
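
The normalization steps listed above could be sketched roughly as follows. The `KEEP_PARAMS` table and the choice to keep only `q` for Google are illustrative assumptions, as is the `client` parameter in the usage note; none of this is taken from Dissenter.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical per-site rules: the only query parameters kept for a given host.
KEEP_PARAMS = {"www.google.com": {"q"}}


def normalize_url(url: str) -> str:
    """Lowercase scheme and host, drop the anchor, trim trailing slashes,
    and filter query parameters for hosts that have a rule."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    # netloc can also carry a port or user info; lowercasing suits typical page URLs.
    host = parts.netloc.lower()
    path = parts.path.rstrip("/")
    allowed = KEEP_PARAMS.get(host)
    if allowed is None:
        # No rule for this host: leave the query string untouched.
        query = parts.query
    else:
        pairs = parse_qsl(parts.query, keep_blank_values=True)
        query = urlencode([(k, v) for k, v in pairs if k in allowed])
    # Passing an empty fragment strips the anchor.
    return urlunsplit((scheme, host, path, query, ""))
```

For example, `normalize_url("https://www.google.com/search?q=captain+marvel&client=firefox-b-d#top")` would keep only the `q` parameter and drop the anchor.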

mgabdev commented 5 years ago

Hi all, and thanks @Sharpiro for opening this issue. This is something we're aware of, and we are circulating ideas internally on the best way to squash this along with other URL-parsing issues.

As far as canonical URLs go, they are more sturdy, but developers are often wrong and their canonical fields are often corrupt. This is why we really try to just use the given URL.
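
One way to reconcile the two positions would be to accept a canonical URL only when it passes some basic sanity checks and to fall back to the given URL otherwise. The sketch below is an assumption about what such checks might be (non-empty, http/https, same host as the visited page); `choose_page_url` is a hypothetical name and this is not Dissenter's actual behavior.

```python
from typing import Optional
from urllib.parse import urljoin, urlsplit


def choose_page_url(visited_url: str, canonical_href: Optional[str]) -> str:
    """Use the canonical href only if it looks sane; otherwise keep the visited URL."""
    if not canonical_href or not canonical_href.strip():
        return visited_url
    # Resolve relative hrefs such as "/m/captain_marvel" against the visited URL.
    candidate = urljoin(visited_url, canonical_href.strip())
    visited = urlsplit(visited_url)
    resolved = urlsplit(candidate)
    if resolved.scheme not in ("http", "https"):
        return visited_url
    # A canonical URL pointing at another host is treated as untrustworthy here.
    if resolved.hostname != visited.hostname:
        return visited_url
    return candidate
```

The same-host rule is intentionally conservative: it would reject legitimate cross-domain canonicals, but it also limits how far a misconfigured or hostile page can redirect a comment thread.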

Though, we do understand that this is in fact an issue that keeps cropping up, and we will be working to resolve it!