Can redirecting/cleaning ad-blocked requests be avoided?

Rtizer-9 commented 4 years ago

Hey @Cimbali I was thinking about it for a long time; is it possible to make cleanlinks avoid wasting time in cleaning ads and tracking domains which are already blocked and better dealt with by the already installed dedicated adblockers?

You can consider these as my doubts instead of feature requests if they doesn't apply (maybe I'm misunderstanding the working of cleanlinks)

How does cleanlinks actually deal with requests which are also being blocked by ublock origin, ghostery etc.?
If the blocked requests which we see in log are indicator of them being already blocked then wouldn't it be better to just not include them in the log and avoid wasting time in intercepting and redirecting. Log always contain blocked ads requests which ALSO has a redirected results shown, so what's actually happening here?
Do you think introducing a mechanism in cleanlinks would be good through which it can avoid intercepting such completely unnecessary ads/tracking/cookie... requests? Maybe it can be done by first checking the requests against a blacklist which only has hosts and NOT cosmetic filters. This way we can:

keep cleanlinks and adblockers (request blockers) working completely separate.
keep the cleanlinks log more clean coz now its not intercepting various requests and most probably not even wasting time in redirecting those already OR to-be-blocked requests. On github itself, cleanlinks always has numerous collector.githubapp entries even though they are always blocked by ublock and need not be redirected. Since, cleanlinks intercepts and redirects every request and not just cleans parameters, it also is prone to more entries in log. (I saw that parent-child url issue and think it can also benefit a lot if the log is devoid of as much entries as possible.)
adblockers also tend to use several scriptlets to work around anti-adblocking. I've not found a single case or I may have solved it another way without noticing, where cleanlink intercepts such requests and interferes with the adblockers working so I'm not saying that it's actually happening but don't you think it would be just better not to intercept those requests? Earlier, I had also asked the developer of clearurls to not block ad domains and then he introduced the option to avoid blocking doubleclick domain.
I went to several pages for eg: https://www.theguardian.com/world/live/2020/apr/01/coronavirus-live-news-us-deaths-could-reach-240000-un-secretary-general-crisis-worst-since-second-world-war-us-uk-europe-latest-updates and tested cleanlinks functionality both with and without adblockers and it indeed was intercepting as well as blocking various ads requests so what's actually going on here? How does cleanlinks "know" its and ad request and needs to be blocked?

Cimbali commented 4 years ago

The only requests CleanLinks blocks full fill 2 conditions:

the are to domains other than your current page, and
they contain your current URL

CleanLinks is definitely not meant to be an ad-blocker, but it basically detects “for free” which requests leak your current address. Since it doesn’t make sense to re-load your current page as a script or an iframe inside the same page, it just cancels those requests.

I think the main issue here is the way in which Firefox applies the add-ons that either cancel or redirect each request. From the onBeforeRequest documentation:

When multiple blocking handlers modify a request, only one set of modifications take effect. Redirects and cancellations have the same precedence. So if you canceled a request, you might see another request with the same requestId again if another blocking handler redirected the request.

What I understand from that is that all requests are passed to all the add-ons. If there was a way to put CleanLinks only after an ad-blocker to reduce the number of requests it gets, I would be more than happy to do that.

There is also potential for clashes as you fear. If we redirect a request while an adblocker blocks it, it might get redirected instead. Not sure how much damage that can actually do, to be honest, probably not a lot?

The only thing I can think of off the top of my head − but I have no idea if it’s feasible − would to to query adblockers when we clean, and if they block that request, then do nothing on the CleanLinks side.

I must say that with uMatrix, a lot of the scripts that would go on to create a lot of noise for CleanLinks are blocked right away, so that’s a strict but rather workable setup.

Cimbali commented 4 years ago

Also @Rtizer-9 I’d be glad to have your input on the mock-ups in #104 for (an attempt at) a more readable interface.

Rtizer-9 commented 4 years ago

query adblockers when we clean

I was exactly thinking about this but was not sure if it's even possible or not.

use uMatrix

Even though it's one of the best thing that has happened in the privacy community, you know that it's not something that everybody uses and given the huge amount of websites I visit, its really hard to configure uMatrix for each and every case. Many websites are badly developed and you need to turn various extensions off for them to work and obviously every cleanlinks user won't be using it.

Adblocker querying

I think that in this approach there would still be various cases where we will get various domains not intercepted by adblockers but still completely unnecessary to redirect (think about ad domains in filterlist which are whitelisted for specific websites coz blocking them breaks the website). Those unnecessary domains will still be intercepted and will clutter the cleanlinks log.

A great example of what I've in mind is already implemented in https://addons.mozilla.org/en-US/firefox/addon/skip-redirect/ and https://addons.mozilla.org/en-US/firefox/addon/remove-redirect

Skip redirect is one of the most prominent redirector and as you can see they have a separate blacklist and whitelist which consist of regex and can also whitelist on url basis (I think I read about this issue as well here).

If you look at their blacklist which it avoids intercepting /abp /account /adfs /auth /cookie /download /login /logoff /logon /logout /oauth /preferences /profile /register /saml /signin /signoff /signon /signout /signup /sso /subscribe /verification

you can clearly see if you implement something along this line in the cleanlinks as well then the number of issues created here will come down to 60 or 70% maybe. Majority of the users whining about turning cleanlinks off again and again wouldn't have complained about it in the first place if such an implementation was there already which will avoid breaking many websites. If I'm right it can also lessen down the number of rules needed to be written for specific domains for logins/signup etc to work.

Well with that being said, I would also like to inform you that as I can recall from my experience of using that extension, I've also needed to turn it off for few cases but it can be solved easily by whitelisting or turning it off (off option is the topmost one).

Now just to make myself more clear, what I'm expecting from cleanlinks is:

just simply stop intercepting ANY kind of unwanted request which need not be intercepted whether ad or not. Its just not a good thing for a redirector to intercept such unwanted links coz it leads to various breakage as it can be seen from the number of issues created here.
as a result of above, it can also help in keeping log of cleanlinks more clean and tidy and also stop wasting resources in redirecting them.

Cimbali commented 4 years ago

In theory it’s possible to query adblockers. In practice I didn’t see any onMessageExternal or onConnectExternal calls in uBlock and uMatrix, so I don’t think they provide any API to answer requests right now.

I think we already look at oauth, saml and sso URLs, but we can expand that list. Most of the others you propose seem to make sense to me, though I’m not sure what /abp and /adfs are used for.

I think the CleanLinks way would be to redirect anything that’s not whitelisted though. Temporary breakage is not that bad when you can edit your own whitelist.

So to sum it up (correct me if I’m wrong):

[x] short term, improve general whitelist with any-domain patterns such as /signin etc.
[ ] long term, ask ad-blockers whether we need to bother with some requests or whether they’ll be blocked anyway.

Rtizer-9 commented 4 years ago

I'm more in favor of improving the general whitelist with more and more patterns which will benefit everyone in long turn by decreasing breakages.

Querying ad-blockers, if can be done easily and won't lead to problems in the long turn, then yeah why not?

I think improving whitelist, to the point requests just don't need to be queried with ad-blockers etc is better than going down the road of introducing more and more complex functionality into the extension and making it a complete Christmas tree. Other similar extensions don't deal with requests in such amount and that's one of the reason for their lesser breakage. Those whitelists in skip redirect has proven to work so that's a great approach really. I mean stopping login, signup requests redirection is the most logical thing to do.

Cleanlinks is on the way of becoming the most advanced redirector and one of the must have extension and therefore I think its more necessary that we focus on the features which benefit most people (less breakage) rather than listening to idiots who think the developer "owe" it to them (that "users just shouldn't need to disable add-on...its your duty...stop developing this add-on" thread was one of the most stupid thing I've ever read on GitHub). Adding or removing new functionality on a per issue basis is not a feasible thing to do as it hinders the "good" development and kind of only caters to useless requests which are only specific to a single user and can easily be taken care of by them for their "special" use cases.

I don't know if querying ad-blockers is really the way to go coz what if user have another similar extensions installed which "intercepts" adblockers and cleanlinks' communication. Ad-blockers blocks requests in very tricky ways and thus it's simply better not to touch those requests. On the other hand if there's way cleanlinks can coordinate with these kind of extensions which results in lesser breakage, less interference between addons, less resource consumption then yeah it will be great.

Just do what you think is better for the extension in the long run :)

PS: my google-fu told me that /abp and /adfs are some kind of authentication methods. Just search adfs authentication.

Rtizer-9 commented 4 years ago

I'm unable to come up with any constructive feedback for #104 at the moment, at least right now and I think those new hierarchical implementation you suggested looks great. Uh, maybe there can be very small icon (just like those cross symbol) in side which allows copying or clicking of the requests...idk.

Cimbali commented 4 years ago

Let’s leave this open, those are good suggestions. I’ll fix the short-term one ASAP and leave the other one as more of a wishlist thing.

On communicating with other add-ons:

either we use very simple message, e.g. send a messag that just contains the requestId and get back a boolean (whether the ad-blocker recommends blocking or not),
or we make it a communication channel (onConnectExternal) In both cases we need to specify the id of the add-on we’re talking to, so I suppose messages can’t be intercepted to fool CleanLinks into not cleaning some links. That’s worth checking.

Cimbali commented 4 years ago

On the short-term whitelist additions, I think /authorize could be added as well. I think all parameters should be whitelisted wholesale.
On the “other requests” I didn’t mention that obviously users dan disable “Clean all outgoing HTTP Requests, instead of only top frame” to only have clicked links and redirects be cleaned.

Rtizer-9 commented 4 years ago

I think instead of giving users just two options of either being able to select all requests or just the top frame, the option can be more configurable like for eg that of ClearUrls.

The benefit here is user will be able to use request handling as per his use case and also a drawback with that top frame is that there are various redirections which happens with other requests like fonts etc which really needs a redirector extension. That was the main reason defaults are now changed from only top frame to others as well in ClearUrls.

Although for the time being, I accept , that option is really handy.

Let's see what happens after implementing the whitelist additions. I think after that we'll be able to look at things from a more clear perspective coz the breakage is going to be very less most probably and various other things will need a change (IMO less efforts coz we won't be needing them in the first place because of less errors).

Rtizer-9 commented 4 years ago

You might also wanna add "domain" parameter in coordination with /auth such that pages like

https://fpoi.gitlab.io/fpoi/media/files/How%20to%20maintain%20freedom%20and%20privacy%20using%20technology%20and%20Internet.html#toc12

don't get the auth redirected.

BTW why do I have so many "modified" rules in the preferences which are not added by me. I can also see that regex of login rules centralization - /(abp|accounts?|adfs|auth(enticat(e|ion)........ from that commit.

I installed that temporary add-on from #104 , is it because of that? (I had restarted the browser and those rules are still there)

Cimbali commented 4 years ago

Ah that might be a side-effect from the temporary add-on instead. A number of things might have happened, I would need to know a little more.

For example, are the newly “modified” rules all related to login? Does the centralised login rule appear as default or modified?

Rtizer-9 commented 4 years ago

https://ibb.co/W3zRfcR

Only saucenow, twitter and nitter have been added by me.

Cimbali commented 4 years ago

Yes, these are mostly rules from the new version. They appear as “modified” because they do not appear in the default rules of your current version.

The google/youtube history/ads pages are the opt-out pages, google docs is a login that doesn’t fit the global pattern, theguardian is a tracking parameter being removed. The wikia one probably should not be there, that’s one of mine that probably shouldn’t make it into the defaults.

Rtizer-9 commented 4 years ago

Parameters which are being redirected here might of interest to you:

https://gbhackers.com/facebook-tried-to-buy-nso-spyware/

One thing I want to say @Cimbali is that you should take my suggestions only after giving it a more thorough inspection. The parameters which I report with some doubt like that "domain" is maybe used in some legitimate cases so I'm not completely sure if it should completely be ignored that's why I specified "in coordination with /auth"(I also found "d" & "D" used in place of "domain" in some cases). The parameters which can easily be seen breaking the stuff without a doubt like those of instagram, soundcloud, twitter etc can obviously be made a easy decision upon. So please keep checking them first on your machine as well.
I also wanna ask, would you like these parameter issue's in a single megathread issue or is it ok with you, me reporting these here and there? coz I'm a heavy internet user and keep getting different parameters now and then.

Rtizer-9 commented 4 years ago

https://Invidio.us embed is broken on every page which makes use of youtube to invidio.us embed video redirection using extensions (or anything for that purpose).

The rule needed is completely similar to youtube embed.

One doubt here:

On selecting the url and clicking "whitelist embedded url" option the parameter is added to whitelisted query parameters whereas in cases like above you need to actually blacklist that parameter("origin"), exactly like youtube embed. You might wanna look into it perhaps.

Cimbali commented 4 years ago

I think ideally we could centralise #83 for these reports. Let me hide those other comments as off-topic.

Cimbali / CleanLinks

Can redirecting/cleaning ad-blocked requests be avoided? #106

Adblocker querying