dessant / web-archives

Browser extension for viewing archived and cached versions of web pages, available for Chrome, Edge and Safari
https://apps.apple.com/us/app/web-archives-for-safari/id1603181853
GNU General Public License v3.0
1.18k stars, 89 forks

Optional (But Default On) Tracker URL Removal So That Searches Don't as Easily Fail #76

Open mollyrealized opened 1 year ago

mollyrealized commented 1 year ago

Is your feature request related to a problem? Please describe.

It was a judgment call whether to file this as a bug or a feature request, but I think it's more the latter. When a URL carries Google Analytics tracking parameters (the ubiquitous "utm_source", "utm_medium", and "utm_campaign"), the lookup will usually fail when it is passed to archive.today (archive.is, archive.ph, etc.). I suspect the same happens with some other trackers, but I don't know for sure.

Describe the solution you'd like

It would be useful if by default Web Archives stripped trackers out of the URL being looked up on archives, with perhaps a setting to disable that entirely, or disable it per lookup.

Describe alternatives you've considered

I presently hand-remove the trackers and rerun the request. It works; it's just an annoyance. :) I've also brought it up with archive.today, but it seems to have fallen into their bit bucket of "maybe someday".

dessant commented 7 months ago

We could use an existing filter list, but I'd like to avoid writing our own filter list parser, and from a quick search I couldn't find a compact js package for parsing static filter lists that has a permissive license.

https://github.com/DandelionSprout/adfilt/blob/master/ClearURLs%20for%20uBo/clear_urls_uboified.txt

mollyrealized commented 7 months ago

I definitely agree that an existing filter list would be useful, but even as a first step, removing Google's utm_ tracking parameters would go a long way. Perhaps something like

[?&]utm_[^&]+
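
One caveat with deleting matches of that pattern directly: if a utm_ parameter comes first and others follow (for example `?utm_source=x&id=1`), the leading `?` is removed along with it, and `[^&]+` would also swallow a trailing `#fragment`. A minimal sketch of the same idea using the standard `URL` and `URLSearchParams` APIs avoids both cases. The function name `stripUtmParams` is hypothetical and not part of Web Archives, and re-serializing the query may normalize the encoding of the remaining parameters.

```javascript
// Hypothetical helper (not from the Web Archives codebase):
// removes every query parameter whose name starts with "utm_".
function stripUtmParams(rawUrl) {
  const url = new URL(rawUrl);

  // Copy the keys into an array first so that deleting entries
  // does not interfere with the iteration.
  for (const key of [...url.searchParams.keys()]) {
    if (key.toLowerCase().startsWith('utm_')) {
      url.searchParams.delete(key);
    }
  }

  // Other parameters and the fragment are kept as they were.
  return url.toString();
}

console.log(
  stripUtmParams('https://example.com/a?utm_source=x&id=1&utm_medium=y#top')
);
// prints "https://example.com/a?id=1#top"
```

An option to disable the stripping, as suggested in the original request, could simply skip this call before the URL is handed to the archive lookup.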