Letractively / harvestman-crawler

Automatically exported from code.google.com/p/harvestman-crawler

Design the crawler to run non-stop #16

Closed by GoogleCodeExporter 8 years ago

GoogleCodeExporter commented 8 years ago
Current behaviour:

- Projects are executed sequentially.
- Once all projects have been executed, crawling stops.

Desired behaviour:

The crawler must be able to run non-stop:
1. Low fluctuation in memory consumption.
2. Run multiple projects in parallel.
3. Re-read the configuration file at regular intervals so settings can change dynamically: adding/removing projects, and other settings like bandwidth, depth, etc.
4. Trigger re-crawling of pages/projects based on RSS triggers.
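Points 2 and 3 together suggest a main loop that re-reads the configuration each cycle and fans each project out to a worker thread. A minimal sketch of that loop is below; note that `crawl_project`, `load_config`, and the one-project-URL-per-line config format are illustrative assumptions for this issue, not HarvestMan's actual API:

```python
import threading
import time


def load_config(path):
    # Hypothetical config reader: one project URL per non-empty line.
    with open(path) as f:
        return [line.strip() for line in f if line.strip()]


def crawl_project(url, results):
    # Placeholder for the real per-project crawl logic.
    results.append(url)


def run_forever(config_path, interval=60.0, max_cycles=None):
    """Re-read the config every `interval` seconds and run all
    configured projects in parallel worker threads.

    `max_cycles` bounds the loop for testing; pass None to run non-stop.
    """
    cycles = 0
    results = []
    while max_cycles is None or cycles < max_cycles:
        # Picking up config changes each cycle covers adding/removing
        # projects without restarting the crawler (point 3).
        projects = load_config(config_path)
        threads = [
            threading.Thread(target=crawl_project, args=(p, results))
            for p in projects
        ]
        for t in threads:   # run all projects in parallel (point 2)
            t.start()
        for t in threads:
            t.join()
        cycles += 1
        if max_cycles is None or cycles < max_cycles:
            time.sleep(interval)
    return results
```

A real implementation would replace the join-then-sleep cycle with long-lived project workers so a slow project does not delay the config reload, but the reload-and-dispatch shape is the core of the requested behaviour.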

Original issue reported on code.google.com by andrei.p...@gmail.com on 17 Jul 2008 at 7:54

GoogleCodeExporter commented 8 years ago
I think this is similar to #14 in terms of (2). Item 4 is already part of the RSS support bug, and memory consumption is addressed in another bug. So this is sort of a duplicate.

Andrei, thanks a lot for your help in reporting issues. But I am trying to make some sense of all the bugs reported here and reduce them to a workable set for 2.0. So I am marking this as a duplicate. Let me know if you differ with this opinion.

Thanks!

Original comment by abpil...@gmail.com on 6 Oct 2008 at 11:34