dreisman / WebCensusNotebook


Need versioning of public suffix list. #17

Open dreisman opened 7 years ago

dreisman commented 7 years ago

Complex public suffixes, such as the one for the URL http://nssdata.s3-website-eu-west-1.amazonaws.com/images/galleries/10598/, can change depending on which version of the public suffix (TLD) list you use. We may need to keep a cached, consistent version of the list.
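To make the concern concrete, here is a minimal sketch of how the same URL can resolve to two different registered domains. The two rule sets `OLD_RULES` and `NEW_RULES` are hypothetical stand-ins for two snapshots of the list, not real historical versions, and the matcher only handles exact rules (no wildcard or exception rules), so treat it as an illustration rather than a full implementation of the matching algorithm.

```python
from urllib.parse import urlsplit

def public_suffix(host, rules):
    """Return the longest suffix of `host` found in `rules`.

    Simplified: exact rules only, no wildcard (*.) or exception (!) rules.
    Falls back to the last label when nothing matches.
    """
    labels = host.lower().split(".")
    for i in range(len(labels)):
        candidate = ".".join(labels[i:])
        if candidate in rules:
            return candidate
    return labels[-1]

def registered_domain(url, rules):
    """Return the public suffix plus one extra label, or None if the host is itself a suffix."""
    host = urlsplit(url).hostname
    suffix = public_suffix(host, rules)
    if host == suffix:
        return None
    prefix = host[: -(len(suffix) + 1)]
    return prefix.split(".")[-1] + "." + suffix

# Hypothetical snapshots: the newer one adds a suffix for the S3 website endpoint.
OLD_RULES = {"com"}
NEW_RULES = {"com", "s3-website-eu-west-1.amazonaws.com"}

URL = "http://nssdata.s3-website-eu-west-1.amazonaws.com/images/galleries/10598/"

old_domain = registered_domain(URL, OLD_RULES)
new_domain = registered_domain(URL, NEW_RULES)
print(old_domain)
print(new_domain)
```

Under the first rule set the site is attributed to `amazonaws.com`; under the second it is attributed to the individual bucket host. Any analysis that groups by registered domain would therefore give different results depending on which snapshot was loaded.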

dreisman commented 7 years ago

The notebook now uses a cached copy of the public suffix list, the same version that was in use when I created the database. In the future, we'll need a more principled approach to versioning the list, based on when public suffix information was written to the crawl database. I'll leave this issue open for now.
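One possible direction for the versioning, sketched under assumptions: store each snapshot of the list keyed by a content hash, and have crawl records reference that hash. The table name `psl_snapshots`, its columns, and the helper functions below are all hypothetical and are not part of the notebook or the crawl database schema.

```python
import hashlib
import sqlite3

def store_snapshot(conn, psl_text, fetched_at):
    """Save a public suffix list snapshot and return its SHA-256 digest.

    The digest acts as the version identifier, so storing the same
    contents twice does not create a second row.
    """
    digest = hashlib.sha256(psl_text.encode("utf-8")).hexdigest()
    conn.execute(
        "CREATE TABLE IF NOT EXISTS psl_snapshots ("
        "sha256 TEXT PRIMARY KEY, fetched_at TEXT, content TEXT)"
    )
    conn.execute(
        "INSERT OR IGNORE INTO psl_snapshots VALUES (?, ?, ?)",
        (digest, fetched_at, psl_text),
    )
    conn.commit()
    return digest

def load_snapshot(conn, digest):
    """Return the stored list text for `digest`, or None if it is unknown."""
    row = conn.execute(
        "SELECT content FROM psl_snapshots WHERE sha256 = ?", (digest,)
    ).fetchone()
    return None if row is None else row[0]

# Usage with an in-memory database and a tiny placeholder list.
conn = sqlite3.connect(":memory:")
version = store_snapshot(conn, "// placeholder list\ncom\n", "2017-01-01")
restored = load_snapshot(conn, version)
```

If each crawl row recorded the digest that was current when its public suffix fields were written, the notebook could reload exactly that snapshot instead of relying on a single cached file.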