EFForg / badger-sett

Automated training for Privacy Badger. Badger Sett automates browsers to visit websites to produce fresh Privacy Badger tracker data.
https://www.eff.org/badger-pretraining
MIT License

Replace Majestic Million with Tranco #49

Closed ablanathtanalba closed 4 years ago

ablanathtanalba commented 4 years ago

Fixes #45. This replaces the default domain list with the Tranco list, which seems to be a more trustworthy ranking of the most popular domains than Majestic Million, which ranks sites by the number of referring subnets.

ablanathtanalba commented 4 years ago

As of now, this will enumerate over the entire list, rather than just the top 2000 domains like the README suggests. It would be easy to trim the list to 2000 by default, but I wanted to check here first to see what others think. Do we want to leave it open-ended?
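For reference, a minimal sketch of what trimming by default could look like, assuming the list is in Tranco's `rank,domain` CSV format. The function name `top_domains` and its `limit` parameter are hypothetical and not taken from crawler.py:

```python
import csv
from itertools import islice


def top_domains(lines, limit=2000):
    """Return up to `limit` domains from rank,domain CSV rows.

    Passing limit=None keeps the open-ended behavior and returns every domain.
    """
    # Skip blank or malformed rows so they do not count toward the limit.
    domains = (row[1] for row in csv.reader(lines) if len(row) >= 2)
    # islice stops consuming the input once `limit` domains have been read.
    return list(islice(domains, limit))
```

Calling `top_domains(f)` on an open list file would then stop after 2000 rows, while `top_domains(f, limit=None)` would walk the whole list.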

ghostwords commented 4 years ago

> As of now, this will enumerate over the entire list, rather than just the top 2000 domains like the README suggests.

We should preserve whatever the existing behavior is. Does running crawler.py on master without specifying the number of domains crawl 2,000 domains? If so, that should still be the case in this PR.
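One way to pin the default while still allowing longer crawls is an explicit option with a default value. This is only a sketch: the flag name `--num-domains` and the constant `DEFAULT_NUM_DOMAINS` are hypothetical, and crawler.py's real option may be spelled differently. The value 2000 is taken from the README's description mentioned above.

```python
import argparse

# Assumed default, based on the README's "top 2000 domains" description.
DEFAULT_NUM_DOMAINS = 2000


def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="Hypothetical crawler options")
    # Hypothetical flag name; not necessarily what crawler.py actually uses.
    parser.add_argument(
        "--num-domains",
        type=int,
        default=DEFAULT_NUM_DOMAINS,
        help="how many domains from the top of the list to visit",
    )
    return parser.parse_args(argv)
```

With this shape, running without the flag yields 2000, and a different list source does not change that default.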