commonsearch / cosr-back

Backend of Common Search. Analyses webpages and sends them to the index.
https://about.commonsearch.org
Apache License 2.0

Add new Malware/Phishing Blacklists #58

Open indolering opened 7 years ago

indolering commented 7 years ago

I've got some additional lists not covered by UT1, with notes on their availability. Given that the information stored with the crawl will be dated, I doubt anyone would mind us publishing it.

Safe Browsing

Google's Safe Browsing is the big one. It aggregates anti-phishing feeds (probably PhishTank), malware data from stopbadware.org, and Google's own list of unwanted software.

They specifically state that "All use of Safe Browsing APIs is free of charge." and their usage restrictions are concerned only with how the information is displayed to users.

They offer dumps for use in local databases and provide the following contact information for large scale users: antiphish-malware-cap-req@google.com.

StopBadWare.org

Collection of domains pushing malware. They offer data for research purposes, and they may be fine with us making it publicly available (especially if we introduce a time lag).

PhishTank.com

Collaborative phishing list. It is not CC licensed, but I seriously doubt they would mind if we used their information. Ping the mailing list for more info.
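
For any of these lists, the check on the crawl side could look roughly like the sketch below. It assumes a list has already been flattened into plain domains, one per line, which is not necessarily the native format of any feed named above. The names `load_blacklist` and `is_blacklisted` are hypothetical and not part of cosr-back.

```python
from urllib.parse import urlsplit


def load_blacklist(lines):
    """Build a set of lowercased domains from an iterable of text lines.

    Assumed format: one domain per line, blank lines and '#' comments ignored.
    """
    domains = set()
    for line in lines:
        entry = line.strip().lower()
        if entry and not entry.startswith("#"):
            domains.add(entry)
    return domains


def is_blacklisted(url, blacklist):
    """Return True if the URL's host, or any parent domain of it, is listed."""
    # hostname is None for URLs without a scheme; treat those as not listed.
    host = (urlsplit(url).hostname or "").rstrip(".")
    labels = host.split(".")
    # Try the full host first, then each parent domain, stopping before the
    # bare top-level label (a.b.example.com, b.example.com, example.com).
    for i in range(len(labels) - 1):
        if ".".join(labels[i:]) in blacklist:
            return True
    return False
```

Matching on parent domains means a listed domain also flags its subdomains, which seems like the safer default for malware and phishing hosts. Single-label hosts and URLs without a scheme are never matched in this sketch.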

sylvinus commented 7 years ago

Thanks for the pointers! They all look interesting to integrate.