cliqz-oss / privacy-bot

Privacy Bot gathers, persists and analyzes privacy policies. #Mozilla Global Sprint Project
https://cliqz-oss.github.io/privacy-bot/
GNU Affero General Public License v3.0
39 stars, 16 forks

Optimize url extraction in find_policies #41

remusao closed this 7 years ago

remusao commented 7 years ago

This makes URL extraction ~40x faster than the BeautifulSoup method. It should allow us to scale policy finding to more domains.
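The thread does not say which technique replaced BeautifulSoup, so the following is only a minimal sketch of one common way to pull links out of a page without building a full parse tree. It uses the standard library's event-based `html.parser`. The names `LinkExtractor` and `extract_urls` are hypothetical and are not taken from `find_policies`, and no speedup is claimed for this version.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collect href values from <a> tags as the parser streams through the HTML.

    Hypothetical helper for illustration; not the code from this PR.
    """

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.urls = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling us.
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page URL.
                self.urls.append(urljoin(self.base_url, value))


def extract_urls(html, base_url):
    """Return absolute URLs for every <a href=...> found in the HTML string."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    parser.close()
    return parser.urls
```

A caller would pass the fetched page body and its URL, for example `extract_urls(page_html, "https://example.com")`, and then filter the result for policy-like links. Keeping the extraction in a small standalone function like this also makes it easy to move into a separate module later.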

ecnmst commented 7 years ago

@remusao Very neat. For readability, we may want to move the parsing code into a separate module (or utils).