commonsearch / cosr-back

Backend of Common Search. Analyses webpages and sends them to the index.
https://about.commonsearch.org
Apache License 2.0

Advertising Lists #59

Closed (indolering closed this issue 7 years ago)

indolering commented 7 years ago

FilterLists is a meta-list of filter lists covering advertising, spam, astroturfing, fraud, and piracy. It would be relatively easy to pull in many of these feeds manually, but some require parsing, because advertising filters often pair regular (non-advertising) sites with CSS selectors. I've emailed the admin to ask whether they could produce a JSON or TXT master list that we can parse.
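To illustrate the parsing problem described above, here is a minimal sketch that pulls only outright-blocked domains from a filter list and skips the rules that attach CSS selectors to regular sites. It assumes the list uses Adblock Plus-style syntax (`||domain^` for domain blocks, `##` for element hiding, `@@` for exceptions, `!` for comments). The function name `extract_blocked_domains` and the sample lines are hypothetical, not part of cosr-back or FilterLists, and other list formats would need their own handling.

```python
import re

# Assumption: Adblock Plus-style syntax. A plain domain block looks like
# "||ads.example.com^". Rules carrying "$options" are deliberately not
# matched, as a conservative choice for this sketch.
_DOMAIN_RULE = re.compile(r"^\|\|([a-z0-9.-]+)\^$", re.IGNORECASE)

# Markers of element-hiding rules, e.g. "example.org##.ad-banner".
_COSMETIC_MARKERS = ("##", "#@#", "#?#")


def extract_blocked_domains(lines):
    """Return the set of domains blocked outright by the given filter lines."""
    domains = set()
    for raw in lines:
        line = raw.strip()
        # Skip blanks, comments ("!"), and the "[Adblock Plus x.y]" header.
        if not line or line.startswith("!") or line.startswith("["):
            continue
        # Skip exception rules, which allow rather than block.
        if line.startswith("@@"):
            continue
        # Skip element-hiding rules: they name regular sites plus a CSS
        # selector, so the site itself is not an advertising domain.
        if any(marker in line for marker in _COSMETIC_MARKERS):
            continue
        match = _DOMAIN_RULE.match(line)
        if match:
            domains.add(match.group(1).lower())
    return domains


if __name__ == "__main__":
    sample = [
        "[Adblock Plus 2.0]",
        "! Title: Example list",
        "||ads.example.com^",
        "example.org##.ad-banner",
        "@@||good.example.com^",
    ]
    print(sorted(extract_blocked_domains(sample)))  # ['ads.example.com']
```

Skipping element-hiding rules matters here because a line such as `example.org##.ad-banner` names a legitimate site whose ad container is hidden, so treating `example.org` as an advertising domain would be a false positive.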

I've also submitted PrivacyBadger's yellowlist for inclusion. If it isn't added for some reason, we should add it ourselves.

indolering commented 7 years ago

They are apparently working on version 2.0 of the site and are considering adding a machine-readable data file. It might take some time, however.

sylvinus commented 7 years ago

FilterLists definitely looks great, and I'm glad to know they're working on machine-readable data files. Let's continue tracking this at #34.