
Tainted domains are a potential privacy issue #86

Open · Gitoffthelawn opened this issue 8 years ago

Gitoffthelawn commented 8 years ago

Decentraleyes stores data in store.json, in a subfolder of the user's Firefox profile. This data contains a list of some of the domains the user has visited (so-called tainted domains).

This data file is not cleared when the user clears their browsing history. This represents a privacy issue and possibly a security issue.
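For illustration, the stored entries might look something like this (a purely hypothetical sketch; the actual schema of store.json may differ):

```javascript
// Hypothetical shape of the data kept in store.json: domain names are
// stored in plain text, so anyone with access to the profile folder
// can read which sites were visited.
const store = {
    taintedDomains: [
        'example.com',
        'forum.example.net'
    ]
};
```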

Synzvato commented 8 years ago

@Gitoffthelawn Thanks a lot for reporting this! I can definitely see how this can affect overall end-user privacy and think we should implement a solid fix as soon as possible.

The thing is, though, that even if we were to hash the domain names in question, it would still be possible for bad actors to create a list of popular (or incriminating) domain hashes to test against. What would you personally suggest? We could implement a button in preferences that allows users to manually clear the list of tainted domains, or one that allows users to turn off the detection mechanism entirely.
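To illustrate why hashing alone would not help much, here is a minimal Node.js sketch of such a dictionary attack (the stored hashes and candidate domains are made up for the example):

```javascript
// Dictionary attack against a hashed domain list: the attacker hashes
// well-known domains and checks the results against the stored hashes.
const crypto = require('crypto');

const sha256 = (value) =>
    crypto.createHash('sha256').update(value).digest('hex');

// Hypothetical hashed contents of store.json.
const storedHashes = new Set([sha256('example.com')]);

// Popular (or incriminating) domains the attacker wants to test for.
const candidates = ['example.com', 'github.com', 'mozilla.org'];

for (const domain of candidates) {
    if (storedHashes.has(sha256(domain))) {
        console.log(`Profile owner has visited: ${domain}`);
    }
}
```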

It's assumed that all of these issues will disappear once the add-on is ported to WebExtensions, since planned changes on Mozilla's end are expected to solve the underlying page breakage problem.

Gitoffthelawn commented 8 years ago

You're welcome!

I need to think a little more about possible solutions. I agree that hashing is probably futile. Also, in some ways, it's a step in the wrong direction, because it would have made this issue more difficult to catch. :-) For this issue, hashing would sort of be akin to security by obscurity.

I'm still trying to wrap my head around the whole tainted domains thing. Based on your excellent description in https://github.com/Synzvato/decentraleyes/commit/07cfc9c2515defcfa7582c45ee5c531cfc1d9d04, I understand how you are using that term.

But why is Decentraleyes storing all the tainted domains a user visits?

I'm guessing it's to speed up something the next time the same domain is fetched. Is my guess correct? (Sorry, not enough time to walk through all the source code right now!)

If this guess is correct, how much time is actually being saved? If this guess is incorrect, what's the purpose of storing them?

Ideally, until WebExtensions is implemented, you could hook the browser's history clearer so you can also clear the tainted domain list at the same time. I think that will cover all usage cases (clearing during a browser session and automatically clearing when the browser closes).

If that's too complicated, just clearing them when the browser session ends at least handles the most important usage case.
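A rough sketch of what such a hook could look like in a pre-WebExtensions add-on, assuming the legacy Places observer API (nsINavHistoryObserver); clearTaintedDomains() is a hypothetical helper, and XPCOM registration details are omitted:

```javascript
const { Cc, Ci } = require('chrome');

const historyService = Cc['@mozilla.org/browser/nav-history-service;1']
    .getService(Ci.nsINavHistoryService);

const historyObserver = {
    // Fired when the user clears their browsing history.
    onClearHistory: function () {
        clearTaintedDomains(); // hypothetical: empty the list in store.json
    },

    // Remaining nsINavHistoryObserver callbacks, left as no-ops.
    onBeginUpdateBatch: function () {},
    onEndUpdateBatch: function () {},
    onVisit: function () {},
    onTitleChanged: function () {},
    onDeleteURI: function () {},
    onDeleteVisits: function () {},
    onPageChanged: function () {}
};

historyService.addObserver(historyObserver, false);
```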

I believe there was some talk before about whether Decentraleyes can open up fingerprinting possibilities. I can imagine some similar issues with the tainted domains list... but if WebExtensions allows you to get rid of that list, then there are probably higher priorities for you. :-)

Synzvato commented 8 years ago

> For this issue, hashing would sort of be akin to security by obscurity.

I do tend to agree with you there, although it could offer some useful protection against untargeted attacks, such as system-wide searches for specific domain names. I think this could be beneficial, especially now that this is a known issue, but perhaps in combination with other improvements?

> But why is Decentraleyes storing all the tainted domains a user visits?

This needs to happen because, fairly regularly, the detection mechanism identifies the taint after the resource has been loaded. This means that the website in question will only break on the very first page view, as long as we remember that the domain in question is tainted.

Clearing the list of tainted domains when the browsing session ends would cause the same pages to break over and over again. All of this added complexity is leaving its mark, so I really do hope that the underlying situation will be addressed soon. I'm keeping my fingers crossed there.
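In simplified form, the bookkeeping described above amounts to something like this (a sketch with hypothetical helper names, not the actual Decentraleyes source):

```javascript
function handleResourceRequest(domain, resource) {
    if (isTainted(domain)) {
        // Known tainted domain: leave the CDN request alone, because an
        // injected local copy would be blocked by the page's CSP.
        return loadFromCDN(resource);
    }

    // Not (yet) known to be tainted: serve the bundled local copy.
    return injectLocalResource(resource);
}

function onTaintDetected(domain) {
    // Detection can fire after injection has already happened, so the
    // very first page view may break. Persisting the entry ensures that
    // later visits to the same domain load correctly.
    rememberTaintedDomain(domain); // written to store.json
}
```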

I was quite unpleasantly surprised when I found out that, currently, add-ons are not allowed to override page-specific Content Security Policies, which goes against the W3C's HTML Design Principles [1].

The referenced Bugzilla bug [1] describes the underlying issue. I have weighed in on the discussion, and hope to be able to provide some requested Firefox test cases soon. I will first need to read up a bit on writing Firefox tests, and on how to contribute them to the Firefox project.

> Ideally [...] you could hook the browser's history clearer [...].

That's a great suggestion. It would be only fair to remove this data when a user attempts to clear traces of visited websites. We should also make sure not to store any taint details when the user is in private browsing mode. As you can see though, this will force even more complexity into the initial bugfix.
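The private browsing part could look roughly like this, assuming the legacy PrivateBrowsingUtils module (rememberTaintedDomain() is again a hypothetical helper):

```javascript
const { Cu } = require('chrome');

const { PrivateBrowsingUtils } = Cu.import(
    'resource://gre/modules/PrivateBrowsingUtils.jsm', {});

function recordTaint(domain, contentWindow) {
    // Never persist taint entries for private browsing windows.
    if (PrivateBrowsingUtils.isWindowPrivate(contentWindow)) {
        return;
    }

    rememberTaintedDomain(domain); // written to store.json
}
```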


[1] Bug 1256122 - webRequest.onHeadersReceived no redirect to extension page

Gitoffthelawn commented 8 years ago

Great info you provided. Thank you!

Since the taint is sometimes discovered after the first time the page is loaded, does that mean the page may fail on the first load?

Synzvato commented 8 years ago

> Great info you provided. Thank you!

No worries, thanks for your helpful contributions!

> Since the taint is sometimes discovered after the first time the page is loaded, does that mean the page may fail on the first load?

Yes. If the resource is injected before the taint detector writes the entry to the list, and the requested resource is available locally, Decentraleyes will attempt to inject the file; the page's Content Security Policy then blocks it, resulting in the absence of said library. Any page functionality that depends on the resource will break.

Luckily, since these security policies are not too widespread, this does not affect users that much (yet). When the tainter is too late, the resource cannot be successfully injected, essential functionality breaks, and the user notices something is up; the first thing that will (instinctively) happen is a page refresh, which then works because the domain has meanwhile been recorded as tainted.

So that's why this approach was taken to work around the existing issues. Still, I think it's very important to fix this before it starts degrading the user experience, and I very much appreciate your involvement.

Gitoffthelawn commented 8 years ago

You're welcome. Your comments are interesting.

When, due to this CSP side effect, a resource that is supposed to be loaded is not, is it possible for Decentraleyes to perform a behind-the-scenes refresh?

In other words, preferably before the browser has fully rendered the page, perform an automatic refresh so the page is loaded in accordance with the user's Decentraleyes preferences? Actually, this would only need to be done when the user has specified that they want unavailable resources to still load from the CDN.

Synzvato commented 8 years ago

> When, due to this CSP side effect, a resource that is supposed to be loaded is not, is it possible for Decentraleyes to perform a behind-the-scenes refresh?

I think that's pretty much impossible because, as far as I know, CSP errors cannot be detected (not even by add-ons). Checking whether the required library has loaded is not an option either, because the missing resource in question could of course be blocked for all sorts of other reasons.

Gitoffthelawn commented 8 years ago

Okay, I'm confused! Since the CSP error cannot be detected, how does Decentraleyes know the error occurred?

Synzvato commented 8 years ago

> Since the CSP error cannot be detected, how does Decentraleyes know the error occurred?

Hehe, this thread is making me relive all of the painful episodes I went through when analyzing the possibilities. A very good question, though! Since CSP errors are undetectable, I decided to create a new module called Load Watcher that detects conditions which can potentially trigger CSP errors.
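Conceptually, such a watcher could look like the sketch below: since the CSP violation itself cannot be observed, it inspects HTTP responses for a Content-Security-Policy header and marks the serving domain as tainted. This is a hedged illustration with hypothetical helper names, assuming the standard http-on-examine-response observer notification:

```javascript
const { Cc, Ci } = require('chrome');

const observerService = Cc['@mozilla.org/observer-service;1']
    .getService(Ci.nsIObserverService);

const loadWatcher = {
    observe: function (subject, topic) {
        if (topic !== 'http-on-examine-response') {
            return;
        }

        const channel = subject.QueryInterface(Ci.nsIHttpChannel);

        try {
            // Throws if the header is absent.
            channel.getResponseHeader('Content-Security-Policy');
            // A CSP is present, so injected resources may be blocked:
            // record the serving domain as tainted.
            rememberTaintedDomain(channel.URI.host);
        } catch (exception) {
            // No CSP header; nothing to do.
        }
    }
};

observerService.addObserver(loadWatcher, 'http-on-examine-response', false);
```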

Gitoffthelawn commented 8 years ago

LOL. Sorry for dragging you through the mud again!

Is Load Watcher called before or after the page is finished loading?

Synzvato commented 8 years ago

> LOL. Sorry for dragging you through the mud again!

Haha, do what must be done.

> Is Load Watcher called before or after the page is finished loading?

The watcher kicks in on shouldLoad(), before the browser starts loading the requested resource.
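For context, shouldLoad() belongs to the XPCOM nsIContentPolicy interface, which Firefox consults before starting each resource load. A simplified sketch of such a hook (hypothetical helpers, registration omitted; not the actual Decentraleyes source):

```javascript
const { Ci } = require('chrome');

const contentPolicy = {
    // Called by Firefox before each resource load is started.
    shouldLoad: function (contentType, contentLocation, requestOrigin) {
        if (requestOrigin !== null && isTainted(requestOrigin.host)) {
            // Tainted page: do not interfere; let the CDN request pass.
            return Ci.nsIContentPolicy.ACCEPT;
        }

        // Otherwise Decentraleyes can decide here whether to serve a
        // local copy of the requested library instead (omitted).
        return Ci.nsIContentPolicy.ACCEPT;
    },

    shouldProcess: function () {
        return Ci.nsIContentPolicy.ACCEPT;
    }
};
```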