Deleetdk / OKCubot

A Scrapy scraper to scrape OKCupid.

Check for duplicates while scraping #4

Open onbjerg opened 8 years ago

Deleetdk commented 8 years ago

Not re-scraping users has a cost: one cannot examine how a profile changes over time. That could be interesting to study, but it would complicate the data gathering, so it is probably best to postpone it.

onbjerg commented 8 years ago

Using maxogden/dat would also save the historical data. It does not add duplicates; instead it updates the existing row with the new information while retaining the old information in the log (much like git). This would also fix #3.
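
To make the idea concrete, here is a minimal sketch in plain Python of "update the row, keep the old version in a log". It does not use dat or its API; `VersionedStore`, `upsert`, and `latest` are hypothetical names made up for illustration, and the profile key and fields are assumptions.

```python
import time


class VersionedStore:
    """Illustrative only: one history list per key, newest version last."""

    def __init__(self):
        # key -> list of (timestamp, row) tuples, oldest first
        self.history = {}

    def upsert(self, key, row, scraped_at=None):
        """Record row under key. Returns True if a new version was stored."""
        versions = self.history.setdefault(key, [])
        if versions and versions[-1][1] == row:
            # Same content as the latest version: a duplicate, so skip it.
            return False
        stamp = time.time() if scraped_at is None else scraped_at
        # Copy the row so later mutation by the caller cannot alter history.
        versions.append((stamp, dict(row)))
        return True

    def latest(self, key):
        """Return the most recent row stored for key."""
        return self.history[key][-1][1]
```

Scraping the same unchanged profile twice stores one version; scraping it again after it changed appends a second version, so the earlier state stays available for comparison.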

Deleetdk commented 8 years ago

Sounds good. :)