Norconex Crawlers (or spiders) are flexible web and filesystem crawlers for collecting, parsing, and manipulating data from the web or filesystem to various data repositories such as search engines.
Related to issue #69, this is a small sample of the problem I am experiencing with <queueDir>. While a crawler is running, I sometimes see documents in <queueDir> that were registered some time ago but are still there and do not exist in Solr.
The sample contains 2 documents in <queueDir> that were registered 5 hours ago.
To keep this text short, I only copy the REF file for each document and the excerpt from the log file where the documents are committed and sent to Solr.
Log: