Fixed it. Connector objects now flush data to temporary files by default instead of keeping it in memory.
Added a control variable for this in <system>, named "connections". The default is:
<connections type="flush" />
which is equivalent to:
<connections type="0" />
This keeps flushing data to temporary files, which improves the program's memory usage. To go back to keeping data in memory, use:
<connections type="mem" />
which is the same as:
<connections type="1" />
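For context, the comment above says the element lives inside <system>, so a config fragment might look like the sketch below. Only the nesting under <system> is taken from the description; any sibling elements are omitted.

```xml
<system>
  <!-- flush connector data to temporary files (the default) -->
  <connections type="flush" />
</system>
```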
Since there was already an element named "connections", I renamed that one to "maxconnections".
By default, temporary files are saved in a ".tmp" folder inside the project folder of the crawl project. For the time being I am not removing this folder at the end of the crawl (for debugging), but this will be done later.
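As an illustration of the two modes described above, here is a minimal Python sketch of a connector that either buffers data in memory or streams it to a temporary file under ".tmp". The class and method names are hypothetical, not HarvestMan's actual API, and it uses modern Python rather than the Python 2.5 the crawler ran on; like the comment above, it leaves the ".tmp" folder in place.

```python
import os
import tempfile

class Connector:
    """Illustrative sketch: buffer downloaded data in memory ("mem")
    or stream it to a temporary file ("flush"). Not HarvestMan's API."""

    def __init__(self, mode="flush", tmpdir=".tmp"):
        self.mode = mode      # "flush" (type="0") or "mem" (type="1")
        self.buffer = []      # used only in "mem" mode
        self.tmpfile = None   # used only in "flush" mode
        if mode == "flush":
            os.makedirs(tmpdir, exist_ok=True)
            fd, self.tmppath = tempfile.mkstemp(dir=tmpdir)
            self.tmpfile = os.fdopen(fd, "wb")

    def write(self, chunk):
        if self.mode == "flush":
            # stream each chunk straight to disk, keeping memory use flat
            self.tmpfile.write(chunk)
        else:
            # keep everything in memory (the old default behaviour)
            self.buffer.append(chunk)

    def get_data(self):
        if self.mode == "flush":
            self.tmpfile.close()
            with open(self.tmppath, "rb") as f:
                return f.read()
        return b"".join(self.buffer)
```

The design point is the same one the fix makes: in "flush" mode the process holds only the current chunk, so total memory no longer grows with the size of the downloaded data.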
Original comment by abpil...@gmail.com on 7 Jul 2008 at 8:05
Still the crawler hangs:
1. ps aux:
   8696 46.0 31.1 898852 631948 pts/1 Sl+ 21:46 13:51 python /usr/lib/python2.5/site-packages/harvestman/apps/harvestman.py -C config-sample.xml
   (at about 30% of 2 GB memory, after 30 minutes)
2. Version number: svn up reports "At revision 79."
3. XML file: the config file contained <connections type="flush" />
4. Number of tests: 2
5. Approximate time from start to hang: 30 minutes
Original comment by andrei.p...@gmail.com on 17 Jul 2008 at 7:19
Original issue reported on code.google.com by abpil...@gmail.com on 25 Jun 2008 at 12:14