EFForg / privacybadger

Privacy Badger is a browser extension that automatically learns to block invisible trackers.
https://privacybadger.org

Detect and block tracking pixels #794

Open pritha90 opened 8 years ago

pritha90 commented 8 years ago

Identify and block the presence of tracking pixels in emails opened using the browser.

n-sayenko commented 7 years ago

Can PB just blacklist known email tracker domains? Or should this be a toggle feature with a button and some indicator that it is working in emails? I found a list of trackers, and I am thinking you could stop a 1x1 image from downloading if it comes from one of those domains.

Anyway, I want to work on this, but don't know which way PB wants to go re: blacklist vs visible toggle button

ghostwords commented 7 years ago

As a matter of principle, Privacy Badger must not maintain blacklists. This makes our job harder.

This feature would probably be a new kind of tracking heuristic where Privacy Badger would decide whether certain image resources were tracking pixels (tiny image dimensions with a non-trivial query string?) and then learn to block the domain (probably) the pixels are served from. This might be the same task as https://github.com/EFForg/privacybadger/issues/367#issuecomment-295314135.
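A minimal sketch of how such a heuristic could look, written in Python for brevity rather than in the extension's own code. It assumes the image dimensions are available to the caller. The names `looks_like_tracking_pixel` and `PixelLearner`, and all three thresholds, are hypothetical and only illustrate the idea of "tiny image plus non-trivial query string, then learn the domain".

```python
from collections import Counter
from urllib.parse import urlsplit, parse_qsl

# All three values are assumptions chosen for illustration only.
MAX_PIXEL_SIDE = 1         # "tiny" is taken to mean at most 1x1
MIN_VALUE_LENGTH = 16      # long parameter values count as "non-trivial"
BLOCK_AFTER_SIGHTINGS = 3  # sightings needed before a host is blocked


def has_nontrivial_query(url):
    """True if any query parameter value is long enough to hold an identifier."""
    pairs = parse_qsl(urlsplit(url).query, keep_blank_values=True)
    return any(len(value) >= MIN_VALUE_LENGTH for _, value in pairs)


def looks_like_tracking_pixel(url, width, height):
    """Combine the two signals: tiny dimensions and a non-trivial query string."""
    tiny = width <= MAX_PIXEL_SIDE and height <= MAX_PIXEL_SIDE
    return tiny and has_nontrivial_query(url)


class PixelLearner:
    """Counts suspicious pixel loads per host (hypothetical learning step)."""

    def __init__(self):
        self.sightings = Counter()

    def observe(self, url, width, height):
        """Record one image load; return True once its host should be blocked."""
        host = urlsplit(url).hostname
        if host and looks_like_tracking_pixel(url, width, height):
            self.sightings[host] += 1
        return self.sightings[host] >= BLOCK_AFTER_SIGHTINGS
```

Here a length check stands in for "non-trivial"; an entropy estimate could replace it without changing the rest of the structure.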

ghostwords commented 7 years ago

Related: #1635.

troy-lamerton commented 6 years ago

I'm going to have a go at implementing this.

@ghostwords

> non-trivial query string

What decides if a query string is non-trivial?

Possible conditions:

ghostwords commented 6 years ago

Since there is so much overlap between #2088 and this, should we put a hold on this issue for now? @bcyphers already opened a PR.

> What decides if a query string is non-trivial?

I was thinking of reusing either our HTML5 local storage entropy estimation function or the one for cookies.
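To make "entropy estimation" concrete, here is a rough stand-in in Python. It is not Privacy Badger's existing function; it is a simple Shannon estimate over the characters of a single value, and the name `estimate_entropy_bits` is made up for this sketch.

```python
import math
from collections import Counter


def estimate_entropy_bits(value):
    """Shannon entropy of the character distribution, scaled by string length.

    Repetitive values score near zero; long values with many distinct
    characters (typical of identifiers) score high.
    """
    if not value:
        return 0.0
    counts = Counter(value)
    length = len(value)
    per_char = -sum((c / length) * math.log2(c / length) for c in counts.values())
    return per_char * length
```

For example, `estimate_entropy_bits("aaaaaaaa")` is zero, while a value made of four distinct characters such as `"abcd"` comes out at 8 bits. A query value would count as "non-trivial" when this estimate exceeds some chosen threshold.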

The main technical obstacle with general-purpose pixel detection is that there doesn't seem to be a way to get image dimensions from inside chrome.webRequest listeners. That might be fine; we could just go by querystring contents. I think #2088 is about taking querystring values and comparing them to known tracking values (retrieved from cookies).
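As a sketch of the comparison idea only (this is not the code from #2088), the check could look roughly like the following in Python. It uses nothing but the request URL, so it does not depend on image dimensions. The function name and the `MIN_MATCH_LENGTH` cutoff are assumptions.

```python
from urllib.parse import urlsplit, parse_qsl

# Assumed cutoff: ignore short values such as "1" or "en" that match by chance.
MIN_MATCH_LENGTH = 8


def query_params_matching_cookies(url, cookie_values):
    """Return names of query parameters whose value equals a known cookie value."""
    known = {value for value in cookie_values if len(value) >= MIN_MATCH_LENGTH}
    pairs = parse_qsl(urlsplit(url).query, keep_blank_values=True)
    return [name for name, value in pairs if value in known]
```

With the URL `https://t.example.com/p.gif?id=abc123def456&v=1` and the cookie values `abc123def456` and `1`, only `id` is reported, because the short value is filtered out.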

bcyphers commented 6 years ago

Yeah, this is tough. I think we need to update the entropy function (per #2088), and after that it might be worth applying the same kind of heuristic to email pixels.

@ghostwords I think querystring contents alone are fine; a tracker is a tracker whether it's attached to an invisible pixel or a banner graphic. To be less invasive, we could just try stripping off high-entropy query params from image requests in emails.
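A rough Python sketch of the stripping idea, to show what "less invasive" could mean in practice: the image request still goes out, but without parameters that look like identifiers. The 32-bit threshold and both function names are assumptions, and the entropy estimate is a simple Shannon calculation, not Privacy Badger's own function.

```python
import math
from collections import Counter
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Assumed cutoff; a real value would need tuning against actual traffic.
ENTROPY_THRESHOLD_BITS = 32.0


def entropy_bits(value):
    """Shannon entropy of the character distribution, scaled by string length."""
    if not value:
        return 0.0
    length = len(value)
    counts = Counter(value)
    return -sum(c * math.log2(c / length) for c in counts.values())


def strip_high_entropy_params(url):
    """Drop query parameters whose value looks like an identifier."""
    parts = urlsplit(url)
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    kept = [(k, v) for k, v in pairs if entropy_bits(v) < ENTROPY_THRESHOLD_BITS]
    # urlencode re-encodes the surviving values, so the exact escaping may
    # differ from the original URL.
    return urlunsplit(parts._replace(query=urlencode(kept)))
```

For instance, `https://img.example.com/banner.png?w=600&uid=3f9a7c21d4e8b6a0` would be rewritten to `https://img.example.com/banner.png?w=600`, keeping the layout parameter and dropping the identifier.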

ghostwords commented 5 years ago

Relevant study: "I never signed up for this! Privacy implications of email tracking" (from Proceedings on Privacy Enhancing Technologies 2018)