SocialHarvest / harvester

The Social Harvest server that exposes an API and harvests data from the web to be analyzed.

Resume harvest on failure #41

Open tmaiaroto opened 10 years ago

tmaiaroto commented 10 years ago

The harvester will need to be able to resume on failure. If a server restarts for whatever reason and the harvest didn't finish, it should restart the harvest the next time the harvester starts.

Granted, the harvester should come back online within a few minutes, and not all data can ever be captured anyway. It's still something we need: if the harvest is scheduled to run every hour, a lot could be missed.

The tricky part: reading the last harvest time from the database is fine, but it gets complicated if there are multiple harvesters. That problem needs to be resolved anyway. Multiple harvesters might only be good for harvesting more territories, not so much the same territory, mainly because of rate limiting, though staggering them might help.

Either way, the harvester should recover and pick things back up after a failure.

tmaiaroto commented 9 years ago

Design the microservice architecture with this in mind. It can be more fault tolerant and limit failures.