To add more sites to our data collection, the scraper should be redesigned to handle generic web pages with minimal processing. Any processing of a page should happen post hoc in the anidata/ht-etl repo. We need data from other sites because BackPage's adult services section is now down, and BackPage does not maintain a long history anyway.
Tasks:
[ ] Design new methodology/architecture for generic scraping
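As a starting point for discussion, here is a rough sketch of what "minimal processing" could look like: the scraper wraps the untouched page body in a small record and leaves all parsing to anidata/ht-etl. The function names (`make_raw_record`, `to_json_line`) and the record fields are hypothetical, and the choice of Python and JSON lines is an assumption, not a decided design.

```python
import hashlib
import json
from datetime import datetime, timezone


def make_raw_record(url, body, fetched_at=None):
    """Wrap an unparsed page body with just enough metadata for later ETL.

    Hypothetical record layout: nothing is extracted from the page here.
    """
    if fetched_at is None:
        fetched_at = datetime.now(timezone.utc)
    return {
        "url": url,
        "fetched_at": fetched_at.isoformat(),
        # A content hash lets downstream ETL detect duplicate pages.
        "sha256": hashlib.sha256(body.encode("utf-8")).hexdigest(),
        # The page body is stored exactly as received.
        "body": body,
    }


def to_json_line(record):
    """Serialize one record as a single JSON line for append-only storage."""
    return json.dumps(record, sort_keys=True)
```

Because the record is site-agnostic, adding a new site would only mean adding URLs to crawl; any site-specific parsing stays in the ETL step.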