EFForg / badger-sett

Automated training for Privacy Badger. Badger Sett automates browsers to visit websites to produce fresh Privacy Badger tracker data.
https://www.eff.org/badger-pretraining
MIT License

Refactor crawler and add "survey" mode #24

Closed. bcyphers closed this 6 years ago.

bcyphers commented 6 years ago

Refactor crawler.py so that the crawler is a stateful object with methods, rather than a set of functions that pass the driver around between them.
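As a rough illustration of that refactor (not the actual crawler.py code), the sketch below shows a crawler that owns its driver as instance state. `FakeDriver`, `start_driver`, `visit` and `crawl` are hypothetical names chosen for this example; the real crawler drives a browser, which is replaced here by a stand-in so the snippet runs on its own.

```python
class FakeDriver:
    """Hypothetical stand-in for a browser driver; it only records calls."""

    def __init__(self):
        self.closed = False
        self.loaded = []

    def get(self, url):
        self.loaded.append(url)

    def quit(self):
        self.closed = True


class Crawler:
    """Keeps the driver on the instance instead of passing it between functions."""

    def __init__(self, driver_factory, sites):
        self.driver_factory = driver_factory  # callable that returns a driver
        self.sites = list(sites)
        self.driver = None
        self.visited = []

    def start_driver(self):
        self.driver = self.driver_factory()

    def visit(self, url):
        # Methods reach the driver through self, so no driver argument is needed.
        self.driver.get(url)
        self.visited.append(url)

    def crawl(self):
        self.start_driver()
        try:
            for url in self.sites:
                self.visit(url)
        finally:
            # Always shut the driver down, even if a visit raises.
            self.driver.quit()
        return self.visited
```

With this shape, a caller would construct one object and call `Crawler(FakeDriver, sites).crawl()`, and a variant crawler can override a single method rather than reimplementing the whole flow.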

In addition, this PR integrates the changes from the https://github.com/EFForg/badger-sett/tree/rich-pb-snitch-map branch as a subclass of the Crawler class. That branch lets badger-sett work with the rich reporting branch of Privacy Badger (https://github.com/EFForg/privacybadger/tree/store-tracking-type-in-snitch-map), which records information about every tracker Privacy Badger encounters and does not block anything. This is useful for testing new heuristics and for measuring how prevalent particular kinds of trackers are across the web.
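The subclass relationship could look roughly like the following. This is a minimal sketch under stated assumptions, not the code in this PR: `SurveyCrawler`, `record`, `results`, and the tracking-type labels `"cookie"` and `"canvas"` are all hypothetical, and the data layout is only a guess at what a "rich" snitch map might hold.

```python
from collections import defaultdict


class Crawler:
    """Hypothetical base crawler: records only which sites each tracker was seen on."""

    def __init__(self):
        self.snitch_map = defaultdict(set)

    def record(self, site, tracker, tracking_type):
        # The base class ignores the tracking type.
        self.snitch_map[tracker].add(site)

    def results(self):
        return {tracker: sorted(sites) for tracker, sites in self.snitch_map.items()}


class SurveyCrawler(Crawler):
    """Hypothetical survey variant: also keeps the tracking type seen on each site."""

    def __init__(self):
        super().__init__()
        self.tracking_types = defaultdict(dict)

    def record(self, site, tracker, tracking_type):
        super().record(site, tracker, tracking_type)
        self.tracking_types[tracker][site] = tracking_type

    def results(self):
        # Report tracker -> {site: tracking type} instead of tracker -> [sites].
        return {tracker: dict(sites) for tracker, sites in self.tracking_types.items()}


survey = SurveyCrawler()
survey.record("news.example", "tracker.example", "cookie")  # label is an assumption
survey.record("shop.example", "tracker.example", "canvas")  # label is an assumption
```

Because only `record` and `results` are overridden, the survey variant would reuse whatever site-visiting logic the base class provides.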

To use the survey feature, run:

$ PB_BRANCH=store-tracking-type-in-snitch-map ./runscan.sh --survey