cliqz-oss / privacy-bot

Privacy Bot gathers, persists and analyzes privacy policies. #Mozilla Global Sprint Project
https://cliqz-oss.github.io/privacy-bot/
GNU Affero General Public License v3.0

Improve Dockerfile + fetching. #39

remusao closed 7 years ago

remusao commented 7 years ago

• Fix the Dockerfile and add PhantomJS
• Optimize find_policies and fetch_policies
• Re-introduce the headless browser, using a pool of workers to avoid blocking
• Add a few heuristics
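
The pool-of-workers idea mentioned above can be sketched roughly as follows. This is a minimal illustration, not the code from this pull request: `render_page` and `fetch_all` are hypothetical names, and `render_page` is a stub standing in for a real headless-browser call (for example PhantomJS) so that the example runs without a browser installed. It assumes a thread pool is an acceptable way to keep slow page renders from blocking each other.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def render_page(url):
    """Hypothetical stand-in for a headless-browser fetch.

    A real implementation would drive the browser and return the rendered
    HTML; this stub returns a placeholder so the sketch is runnable.
    """
    return "<html><!-- rendered: {} --></html>".format(url)


def fetch_all(urls, render=render_page, max_workers=4):
    """Render every URL on a bounded pool of worker threads.

    Returns a dict mapping each URL to its rendered content, or to None
    when rendering that URL raised an exception.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Submit all renders up front; at most max_workers run at once.
        futures = {pool.submit(render, url): url for url in urls}
        for future in as_completed(futures):
            url = futures[future]
            try:
                results[url] = future.result()
            except Exception:
                # One failing page should not stop the remaining fetches.
                results[url] = None
    return results


if __name__ == "__main__":
    pages = fetch_all(["https://example.com/privacy", "https://example.com/terms"])
    for url, html in sorted(pages.items()):
        print(url, len(html) if html is not None else "failed")
```

Bounding the pool with `max_workers` limits how many browser instances would run at once, which matters because each headless browser process is comparatively heavy.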