ClearURLs / Addon

ClearURLs is an add-on based on the new WebExtensions technology and will automatically remove tracking elements from URLs to help protect your privacy.
http://docs.clearurls.xyz
GNU Lesser General Public License v3.0

Upload not working because ETag is blocked by this extension #177

Open liangxiwei opened 2 years ago

liangxiwei commented 2 years ago

The upload uses ETag to support multipartUpload, but this extension blocks ETag!

KevinRoebert commented 2 years ago

Which side is broken? Do you have an example URL?

liangxiwei commented 2 years ago

Our product is a Notion-like app, but only for Chinese users. We use the ali-oss SDK to upload files, and the SDK uses ETag for multipartUpload. Here is the SDK source link:

https://github.com/ali-sdk/ali-oss/blob/HEAD/lib/common/multipart.js#L250

liangxiwei commented 2 years ago

Maybe I need to check whether an error occurs and, if so, use a single-part upload instead of multipartUpload.
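A minimal sketch of that fallback, assuming hypothetical `uploadPart` and `putWhole` callbacks that wrap whatever SDK calls the app actually uses (they are not ali-oss APIs): try multipart first, and if any part comes back without a visible ETag, stop and send the whole file in one PUT.

```typescript
// Result of uploading one part: etag is null when the header was stripped
// (e.g. by an extension) or not exposed to the page via CORS.
interface PartResult {
  partNumber: number;
  etag: string | null;
}

// Hypothetical callbacks supplied by the caller; not part of any real SDK.
type UploadPart = (partNumber: number, chunk: Uint8Array) => Promise<PartResult>;
type PutWhole = (data: Uint8Array) => Promise<void>;

// Split data into fixed-size chunks (the last one may be shorter).
function chunk(data: Uint8Array, size: number): Uint8Array[] {
  const parts: Uint8Array[] = [];
  for (let off = 0; off < data.length; off += size) {
    parts.push(data.subarray(off, off + size));
  }
  return parts;
}

// Try a multipart upload; fall back to a single-part PUT if any ETag is missing.
// Returns which strategy was used.
async function uploadWithFallback(
  data: Uint8Array,
  partSize: number,
  uploadPart: UploadPart,
  putWhole: PutWhole,
): Promise<"multipart" | "single"> {
  const results: PartResult[] = [];
  const chunks = chunk(data, partSize);
  for (let i = 0; i < chunks.length; i++) {
    const result = await uploadPart(i + 1, chunks[i]);
    if (result.etag === null) {
      // Without ETags the multipart upload cannot be completed.
      await putWhole(data);
      return "single";
    }
    results.push(result);
  }
  // A real client would now send `results` in the complete-upload request.
  return "multipart";
}
```

A production client would also abort the abandoned multipart upload so orphaned parts are not left in storage, and single-part uploads usually have a lower maximum object size than multipart ones.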

MythicManiac commented 2 years ago

Chiming in here: S3-based multipart file upload APIs are not usable if ETag headers are stripped from responses. Additionally, using ETag for cache validation is a pretty common and reasonable use case, which also wouldn't work.
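To see why stripping the header is fatal, consider the last step of an S3 multipart upload: the `CompleteMultipartUpload` request body must list every part number together with the ETag returned when that part was uploaded, in ascending part order. The `buildCompleteBody` helper below is illustrative and not taken from any SDK.

```typescript
interface UploadedPart {
  partNumber: number;
  etag: string | null; // null if the ETag response header was stripped
}

// Build the XML body for S3's CompleteMultipartUpload request (illustrative helper).
// Every part must be listed with its ETag; parts go in ascending partNumber order.
function buildCompleteBody(parts: UploadedPart[]): string {
  const sorted = [...parts].sort((a, b) => a.partNumber - b.partNumber);
  const entries = sorted.map((p) => {
    if (p.etag === null) {
      throw new Error(`part ${p.partNumber} has no ETag; cannot complete upload`);
    }
    return `<Part><PartNumber>${p.partNumber}</PartNumber><ETag>${p.etag}</ETag></Part>`;
  });
  return `<CompleteMultipartUpload>${entries.join("")}</CompleteMultipartUpload>`;
}
```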

We also use S3-compatible object storage, meaning uploads from our users with this extension don't work. Most of the users I've talked to hadn't realized this extension strips ETag headers at all, since neither the name nor the headline really implies it is a tracker-removal tool; they describe a URL-cleaning tool.

I'd suggest making it clearer that some valid use cases (such as S3-protocol-based file uploads) don't work with the ETag stripping option enabled. Several object storage providers use the S3 protocol, so I'd imagine broken direct uploads to them are a fairly common issue.

brianhelba commented 2 years ago

@liangxiwei @MythicManiac I have the same problem as you: my JavaScript application couldn't read the ETag header in a PUT response from S3, despite having the correct Access-Control-Expose-Headers CORS header set.
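For context, a cross-origin script can only read response headers outside the CORS-safelisted set (`ETag` is not in it) when the server lists them in `Access-Control-Expose-Headers`. The hypothetical helper below models that rule from the Fetch standard. It only covers the CORS side: if an extension removes `ETag` before the page sees the response, the header is unreadable no matter what this check says.

```typescript
// CORS-safelisted response headers, readable without being exposed.
const SAFELISTED = new Set([
  "cache-control", "content-language", "content-length",
  "content-type", "expires", "last-modified", "pragma",
]);

// Hypothetical helper: may a cross-origin script read header `name`?
// `exposeHeaders` is the raw Access-Control-Expose-Headers value, or null if absent.
function isReadableCrossOrigin(
  name: string,
  exposeHeaders: string | null,
  withCredentials: boolean,
): boolean {
  const target = name.toLowerCase();
  if (SAFELISTED.has(target)) return true;
  if (exposeHeaders === null) return false;
  const listed = exposeHeaders.split(",").map((h) => h.trim().toLowerCase());
  // "*" acts as a wildcard only for requests sent without credentials.
  if (!withCredentials && listed.includes("*")) return true;
  return listed.includes(target);
}
```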

I hope that #214 will be accepted as a way to allow our use case, while still preventing ETag-based tracking.

MythicManiac commented 2 years ago

@KevinRoebert was this closed on purpose, or was the issue perhaps closed due to an automatic trigger from https://github.com/ClearURLs/Addon/commit/783f1fc99ad2e32d692be0a5626f1184e84fdc20?

It might be worth keeping the issue open as the problem exists, even if the proposed solution isn't viable.

KevinRoebert commented 2 years ago

It was the automatic trigger. Reopened.

t3dotgg commented 10 months ago

FWIW, this is absurd behavior that sent us down a fun debugging spiral on UploadThing. It was absurd enough that I didn't believe it at first. It's super unintuitive that a "clear URL" extension entirely breaks ETag, and with it almost every implementation of multipart upload.

vasilvestre commented 10 months ago

Also, ETag is part of the web's caching strategy; it's not intrusive at all.
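As an illustration of that caching role, here is a simplified sketch of how a server answers a conditional GET: the browser echoes its cached ETag in `If-None-Match`, and the server replies `304 Not Modified` (no body) while the tag still matches, using the weak comparison HTTP specifies for this header. The `handleConditionalGet` name is illustrative.

```typescript
interface CacheResponse {
  status: 200 | 304;
  headers: Record<string, string>;
  body: string;
}

// Extract entity-tags (strong or W/-prefixed weak) from an If-None-Match value.
function parseEntityTags(value: string): string[] {
  return value.match(/(?:W\/)?"[^"]*"/g) ?? [];
}

// Weak comparison: ignore any W/ prefix and compare the quoted opaque tags.
function weakEquals(a: string, b: string): boolean {
  return a.replace(/^W\//, "") === b.replace(/^W\//, "");
}

// Illustrative handler: 304 if the client's cached copy is current, else 200.
function handleConditionalGet(
  ifNoneMatch: string | undefined,
  currentEtag: string,
  body: string,
): CacheResponse {
  if (ifNoneMatch !== undefined) {
    const matches =
      ifNoneMatch.trim() === "*" ||
      parseEntityTags(ifNoneMatch).some((tag) => weakEquals(tag, currentEtag));
    if (matches) {
      // Cached copy is still valid: send headers only, no body.
      return { status: 304, headers: { ETag: currentEtag }, body: "" };
    }
  }
  return { status: 200, headers: { ETag: currentEtag }, body };
}
```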

proevilz commented 10 months ago

The problem is that it can be used to track users, and it's probably more common than we realise: https://levelup.gitconnected.com/no-cookies-no-problem-using-etags-for-user-tracking-3e745544176b https://www.secjuice.com/etag-entity-tag-tracking/
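The tracking trick described in those articles works roughly like this: instead of a content-derived tag, the server hands each new visitor a unique ETag, and the browser returns it in `If-None-Match` on the next visit, so the tag behaves like a cookie. A deliberately simplified sketch of the server side, with a hypothetical `EtagTracker` class and a counter standing in for random IDs:

```typescript
// Hypothetical tracking server state: which ETag was minted for which visitor.
class EtagTracker {
  private nextId = 0;
  private visits = new Map<string, number>();

  // Handle a request for a tracking resource (e.g. a tiny image).
  handle(ifNoneMatch?: string): { etag: string; visits: number } {
    const known = ifNoneMatch !== undefined ? this.visits.get(ifNoneMatch) : undefined;
    if (ifNoneMatch !== undefined && known !== undefined) {
      // The browser echoed a tag we minted: a returning visitor (server replies 304).
      this.visits.set(ifNoneMatch, known + 1);
      return { etag: ifNoneMatch, visits: known + 1 };
    }
    // New visitor: mint a unique tag unrelated to the resource's content.
    const etag = `"visitor-${++this.nextId}"`;
    this.visits.set(etag, 1);
    return { etag, visits: 1 };
  }
}
```

Removing either the response `ETag` or the request `If-None-Match` breaks this loop, which is why filtering it works as a countermeasure.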

vasilvestre commented 9 months ago

I'm not sure how the extension works, but opt-in would be the best option IMHO. If people want privacy, they can opt in, and you can warn users that it may break some websites. WDYT?

MythicManiac commented 9 months ago

The privacy view is valid, but clearly something is wrong if this continues to be a common issue for valid uses of ETag headers. In our case, we ended up adding special-case error handling for missing ETag headers that instructs our users to disable the ClearURLs addon, but it's absurd that we had to go to those lengths.
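A sketch of that kind of special-case check, assuming only a minimal response shape (`ok`, `status`, `headers.get`, which a `fetch` `Response` satisfies). The `requireEtag` function, the `EtagMissingError` class, and the message wording are hypothetical.

```typescript
// Minimal shape needed from a fetch()-style Response.
interface ResponseLike {
  ok: boolean;
  status: number;
  headers: { get(name: string): string | null };
}

// Hypothetical error type for a successful upload whose ETag is not visible.
class EtagMissingError extends Error {
  constructor(partNumber: number) {
    super(
      `Upload of part ${partNumber} succeeded but no ETag header was visible. ` +
        "A privacy extension (e.g. ClearURLs with ETag filtering enabled) may be " +
        "removing it; please disable it for this site and retry.",
    );
    this.name = "EtagMissingError";
  }
}

// Return the part's ETag, or fail with an actionable message.
function requireEtag(res: ResponseLike, partNumber: number): string {
  if (!res.ok) {
    throw new Error(`Upload of part ${partNumber} failed with HTTP ${res.status}`);
  }
  const etag = res.headers.get("ETag");
  if (etag === null || etag === "") {
    throw new EtagMissingError(partNumber);
  }
  return etag;
}
```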

To my understanding, the privacy problem with ETag tracking is that the browser automatically sends ETag values back (in If-None-Match headers) on outgoing requests, not that it receives them in inbound responses. Would it be possible to modify the feature to focus on stripping the outgoing request headers rather than the incoming ones? It would still break caching, but at least it shouldn't break S3-compatible uploads.
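One way to read that proposal, as a sketch: filter the request side instead of the response side, dropping `If-None-Match` (where the browser echoes a cached ETag) while leaving response `ETag` headers intact for scripts. The `{ name, value }` objects mirror the header entries of the WebExtensions `webRequest` API; `stripOutgoingEtag` itself is hypothetical and is not how ClearURLs currently works.

```typescript
// Same shape as webRequest HttpHeaders entries in the WebExtensions API.
interface HttpHeader {
  name: string;
  value?: string;
}

// Hypothetical filter: drop the validator the browser sends back to the server.
function stripOutgoingEtag(requestHeaders: HttpHeader[]): HttpHeader[] {
  return requestHeaders.filter((h) => h.name.toLowerCase() !== "if-none-match");
}

// In an extension this could be registered roughly as (not executed here):
//   browser.webRequest.onBeforeSendHeaders.addListener(
//     (details) => ({ requestHeaders: stripOutgoingEtag(details.requestHeaders ?? []) }),
//     { urls: ["<all_urls>"] },
//     ["blocking", "requestHeaders"],
//   );
```

As the comment notes, revalidation would still be lost: the server never sees the validator, so it answers with full 200 responses instead of 304s.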

KevinRoebert commented 9 months ago

Hello,

To be honest, I'm not quite sure how to better implement ETag Filtering.

For instance, Privacy Possum checks whether an ETag changes upon reloading a resource and then blocks it. However, this method requires storing every visited URL, including its ETag, in a cache (LRU), leading to significantly higher RAM usage by the addon. Moreover, the entire idea relies on the assumption that the (tracking) ETag will change after a subsequent request.
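A rough sketch of that idea, not Privacy Possum's actual code: a bounded LRU map from URL to the last-seen ETag, flagging a URL when a later response for it carries a different ETag. The `EtagChangeDetector` name and the `capacity` parameter are assumptions; the map is where the extra memory goes.

```typescript
// Hypothetical detector: remembers the last ETag per URL, evicting the least recently used.
class EtagChangeDetector {
  private seen = new Map<string, string>(); // Map iterates in insertion order
  private capacity: number;

  constructor(capacity: number) {
    this.capacity = capacity;
  }

  // Record an observed ETag; returns true if it differs from the one seen before.
  observe(url: string, etag: string): boolean {
    const previous = this.seen.get(url);
    // Re-insert so this URL becomes the most recently used entry.
    this.seen.delete(url);
    this.seen.set(url, etag);
    if (this.seen.size > this.capacity) {
      // The first key in iteration order is the least recently used.
      const oldest = this.seen.keys().next().value;
      if (oldest !== undefined) this.seen.delete(oldest);
    }
    return previous !== undefined && previous !== etag;
  }
}
```

Once a URL has been evicted, a changed ETag for it goes unnoticed, so detection quality is traded directly against memory.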

Another implementation was proposed for Privacy Badger, which verifies whether the ETag was correctly generated according to the algorithm of nginx or Apache. If not, it would be blocked. However, this method would only work for Apache and nginx servers. Additionally, it might end up blocking various elements if the server sets the ETag in a different manner.
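As an example of such a check: to our understanding, nginx's default ETag for static files is the `Last-Modified` time in seconds and the file length, both in lowercase hex, joined by a hyphen (e.g. `"54a48e00-264"`), and nginx may mark it weak (`W/`) when it compresses the response. A sketch under that assumption, with a hypothetical `looksLikeNginxEtag` name; Apache's format depends on its `FileETag` configuration and is not covered.

```typescript
// Hypothetical check: does `etag` match nginx's assumed default static-file format?
// Assumed format: "<mtime in seconds, hex>-<length in bytes, hex>", optionally W/-prefixed.
function looksLikeNginxEtag(
  etag: string,
  lastModifiedMs: number, // Last-Modified parsed to epoch milliseconds, e.g. via Date.parse
  contentLength: number,
): boolean {
  const seconds = Math.floor(lastModifiedMs / 1000);
  const expected = `"${seconds.toString(16)}-${contentLength.toString(16)}"`;
  return etag === expected || etag === `W/${expected}`;
}
```

This illustrates the false-positive risk raised above: for compressed responses the tag reflects the uncompressed size, which no longer matches `Content-Length`, and any server using another scheme fails the check.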

Another option could be to completely remove ETag Filtering from ClearURLs and rely on the "Network Isolation Key" for cache strategies in Chrome 96 and Firefox 85. This approach aims to prevent tracking across multiple sites. However, it won't prevent recognition after a session on the same site.
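Roughly, cache partitioning means the HTTP cache key includes the top-level site (and, in Chrome, also the frame's site) alongside the URL, so an ETag cached while visiting one site is never revalidated from another. A simplified model with a made-up key format, not either browser's real one:

```typescript
// Simplified model of a partitioned HTTP cache key (made-up format).
function cacheKey(topLevelSite: string, frameSite: string, url: string): string {
  return `${topLevelSite} ${frameSite} ${url}`;
}

const cache = new Map<string, string>(); // cache key -> stored ETag

// The ETag the browser would send in If-None-Match for this context, if any.
function validatorFor(topLevelSite: string, frameSite: string, url: string): string | undefined {
  return cache.get(cacheKey(topLevelSite, frameSite, url));
}
```

The same tracker URL gets a separate entry under each site, which blocks cross-site linking, while repeat visits to the same site still hit the same entry, matching the limitation described above.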

I'm open to suggestions from the community.

By the way, the ETag Filtering has been disabled by default since version 1.25.0 (2022-07-27).

MythicManiac commented 9 months ago

This does seem like a rather complicated issue to solve, impossible even without accepting some tradeoffs. That being the case, would it make more sense to approach this by adding exceptions to the filtering for known valid use cases (such as S3 multipart uploads)?
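As one possible exception rule: S3's UploadPart requests are `PUT`s whose query string carries `partNumber` and `uploadId`, which makes them easy to recognize, including S3-compatible services that mirror this API. A hypothetical sketch of such a check; the function name and the idea of wiring it into ClearURLs are assumptions.

```typescript
// Hypothetical heuristic: is this an S3-style UploadPart request whose ETag the client needs?
function isS3UploadPart(method: string, url: string): boolean {
  if (method.toUpperCase() !== "PUT") return false;
  let params: URLSearchParams;
  try {
    params = new URL(url).searchParams;
  } catch {
    return false; // not a parseable absolute URL
  }
  return params.has("partNumber") && params.has("uploadId");
}
```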

I'd also be interested in the philosophy behind this filtering for requests sent from JavaScript: does a scenario exist where the JavaScript that sends the request and has access to the response could not simply use some other means of tracking, such as encoding the tracker directly in the response payload rather than in headers?

Or, in simpler terms: does a scenario exist where filtering out ETag headers for requests created by JavaScript that's already executing in the browser realistically makes a difference?
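If the answer is no, the rule could be as simple as skipping ETag filtering for script-initiated requests. A sketch, assuming the `type` field that WebExtensions `webRequest` events report, where MDN documents `fetch()` and `XMLHttpRequest` requests as `"xmlhttprequest"`; the trimmed `RequestDetails` interface and `shouldFilterEtag` are hypothetical.

```typescript
// Hypothetical trimmed-down subset of the details passed to webRequest listeners.
interface RequestDetails {
  url: string;
  type: string; // webRequest ResourceType, e.g. "image", "script", "xmlhttprequest"
}

// Filter ETags only for resources the browser fetches on the page's behalf
// (images, scripts, ...), not for fetch()/XHR calls made by page scripts.
function shouldFilterEtag(details: RequestDetails): boolean {
  return details.type !== "xmlhttprequest";
}
```

One trade-off to weigh: `fetch()` responses also pass through the HTTP cache, so a tracker could move its pixel into a `fetch()` call and regain automatic `If-None-Match` echoing.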