Letractively / harvestman-crawler

Automatically exported from code.google.com/p/harvestman-crawler

Design the crawler to run non-stop #16

Closed by GoogleCodeExporter 8 years ago

GoogleCodeExporter commented 8 years ago
Current behaviour:

- Projects are executed sequentially.
- Once all projects have been executed, crawling stops.

Desired behaviour:

The crawler must be able to run non-stop:
1. Low fluctuation in memory consumption.
2. Run multiple projects in parallel.
3. Re-read the configuration file at regular intervals so settings can change dynamically: adding/removing projects, and other settings like bandwidth, depth, etc.
4. Trigger re-crawling of pages/projects based on RSS triggers.
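Points 2 and 3 together suggest a main loop that re-reads the configuration each cycle and fans each project out to a worker thread. A minimal sketch of that loop is below; note that `crawl_project`, `load_config`, and the one-project-URL-per-line config format are illustrative assumptions for this issue, not HarvestMan's actual API:

```python
import threading
import time


def load_config(path):
    # Hypothetical config reader: one project URL per non-empty line.
    with open(path) as f:
        return [line.strip() for line in f if line.strip()]


def crawl_project(url, results):
    # Placeholder for the real per-project crawl logic.
    results.append(url)


def run_forever(config_path, interval=60.0, max_cycles=None):
    """Re-read the config every `interval` seconds and run all
    configured projects in parallel worker threads.

    `max_cycles` bounds the loop for testing; pass None to run non-stop.
    """
    cycles = 0
    results = []
    while max_cycles is None or cycles < max_cycles:
        # Picking up config changes each cycle covers adding/removing
        # projects without restarting the crawler (point 3).
        projects = load_config(config_path)
        threads = [
            threading.Thread(target=crawl_project, args=(p, results))
            for p in projects
        ]
        for t in threads:   # run all projects in parallel (point 2)
            t.start()
        for t in threads:
            t.join()
        cycles += 1
        if max_cycles is None or cycles < max_cycles:
            time.sleep(interval)
    return results
```

A real implementation would replace the join-then-sleep cycle with long-lived project workers so a slow project does not delay the config reload, but the reload-and-dispatch shape is the core of the requested behaviour.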

Original issue reported on code.google.com by andrei.p...@gmail.com on 17 Jul 2008 at 7:54

GoogleCodeExporter commented 8 years ago
I think this is similar to #14 in terms of (2). Item 4 is already part of the RSS support bug, and memory consumption is addressed in another bug. So this is sort of a duplicate.

Andrei, thanks a lot for your help in reporting issues. But I am trying to make some sense of all the bugs reported here and reduce them to a workable set for 2.0. So I am marking this as a duplicate. Let me know if you differ with this opinion.

Thanks!

Original comment by abpil...@gmail.com on 6 Oct 2008 at 11:34