Parallel crawl of projects

Letractively / harvestman-crawler

Automatically exported from code.google.com/p/harvestman-crawler


Current behaviour: Projects are crawled sequentially: each project starts only after the previous one has finished.

Desired behaviour: All projects should start in parallel without waiting for each other, possibly saving the logs separately for each of them.

This has to wait for a later release, since the focus now is on getting the other tasks done. Parallel crawl of projects will need a carefully isolated thread design that does not mix up the child URLs of one project with those of another. It is possible in the current design by modifying the way URLs are pushed to the queue etc., but that is not the focus right now.

Original comment by abpil...@gmail.com on 6 Oct 2008 at 11:31

Changed state: Later
Added labels: Priority-Low
Removed labels: Priority-Medium
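
For illustration only, here is a minimal sketch of the kind of isolation the comment above asks for, assuming one thread per project with its own private URL queue and its own named logger. None of this is HarvestMan code: `crawl_project`, `crawl_all`, the `fetch_links` callback, the `max_urls` cap and the `crawl.<name>` logger names are all hypothetical.

```python
import logging
import queue
import threading


def crawl_project(name, seed_urls, fetch_links, visited, max_urls=100):
    """Crawl one project with a URL queue that no other project can see."""
    # Hypothetical naming scheme: one logger per project.
    log = logging.getLogger("crawl." + name)
    pending = queue.Queue()  # private to this project, never shared
    seen = set()
    for url in seed_urls:
        seen.add(url)
        pending.put(url)
    while not pending.empty():
        url = pending.get()
        log.info("fetching %s", url)
        # fetch_links is an assumed callback returning the child URLs of url.
        for child in fetch_links(url):
            if child not in seen and len(seen) < max_urls:
                seen.add(child)
                pending.put(child)
    # Each thread writes only its own key, so projects never overwrite each other.
    visited[name] = seen


def crawl_all(projects, fetch_links):
    """Start one thread per project and wait for all of them to finish."""
    visited = {}
    threads = [
        threading.Thread(target=crawl_project,
                         args=(name, seeds, fetch_links, visited))
        for name, seeds in projects.items()
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return visited
```

Because child URLs only ever enter the queue owned by their project's thread, they cannot leak into another project. Separate log files could be obtained by attaching a `logging.FileHandler` to each per-project logger; the file naming would again be an assumption.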


Parallel crawl of projects #14