MrDiggles2 / cru-scrape

Scraper of CRU sites

Update spider to skip URLs if there's already content stored in DB #23

Closed MrDiggles2 closed 1 month ago

MrDiggles2 commented 1 month ago

When an in-progress job is rerun, the spider "forgets" which sites it has already visited, so it crawls everything again and overwrites existing rows. We should update the spider to check the database before visiting a URL and skip it if content for that URL is already stored.
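A minimal sketch of the proposed check follows. It assumes a Python crawler backed by SQLite with a hypothetical `pages(url, content)` table. The issue does not name the actual stack or schema, so `init_db`, `stored_content`, `crawl`, and the `fetch`/`extract_links` callables are all illustrative. Before fetching a URL, the crawler looks it up in the database. If content is already stored, it reuses that content instead of fetching again and never overwrites the row.

```python
import sqlite3
from collections import deque
from typing import Callable, Iterable, List, Optional


def init_db(conn: sqlite3.Connection) -> None:
    # Hypothetical schema (assumption): one row per visited URL and its content.
    conn.execute("CREATE TABLE IF NOT EXISTS pages (url TEXT PRIMARY KEY, content TEXT)")
    conn.commit()


def stored_content(conn: sqlite3.Connection, url: str) -> Optional[str]:
    # Return the stored content for this URL, or None if nothing is stored yet.
    row = conn.execute("SELECT content FROM pages WHERE url = ?", (url,)).fetchone()
    return row[0] if row is not None else None


def crawl(
    conn: sqlite3.Connection,
    start_urls: Iterable[str],
    fetch: Callable[[str], str],
    extract_links: Callable[[str], List[str]],
) -> List[str]:
    """Breadth-first crawl that skips fetching URLs already stored in the DB.

    Returns the URLs that were actually fetched during this run.
    """
    queue = deque(start_urls)
    seen = set()
    fetched = []
    while queue:
        url = queue.popleft()
        if url in seen:
            continue
        seen.add(url)
        content = stored_content(conn, url)
        if content is None:
            # Not visited on a previous run: fetch it and store the result.
            content = fetch(url)
            conn.execute(
                "INSERT OR REPLACE INTO pages (url, content) VALUES (?, ?)",
                (url, content),
            )
            conn.commit()
            fetched.append(url)
        # Follow links from fresh or stored content so a resumed job can still
        # reach pages that the interrupted run never got to.
        queue.extend(extract_links(content))
    return fetched
```

One design choice here is that a skipped page's links are still followed, using the stored content. Without that, a rerun would stop at the first already-visited page and never reach the pages the interrupted job had not yet crawled. A second run over a fully stored site fetches nothing.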