-
How do I open mapdb files?
-
While I was loading documents into Solr with HTTP Collector, the computer restarted due to an issue. Just to be sure, what is the official advice for continuing the process from where HTTP Collector was interrupted?
Aft…
-
After running a crawler with `3` and just one URL, I analysed the log and noticed that several URLs are processed several times via the events: `DOCUMENT_FETCHED, CREATED_ROBOTS_META, URLS_EXTRAC…
-
Make sure Solr Committer works with Solr 5 and update Solr dependencies accordingly.
As described in issue https://github.com/Norconex/collector-http/issues/57, it seems SolrJ falls back to sendin…
-
From @csaezl, originally posted on https://github.com/Norconex/collector-http/issues/74#issuecomment-90225426:
> Talking again about /update parameters, is there a way of passing update.chain=langid to So…
-
While running HTTP Collector on a non-Norconex site, after it had collected a few thousand documents and committed them to Solr, I had to interrupt it. I interrupted the run by closing the DOS box.
After that, every new …
-
Almost all documents crawled by HTTP Collector carry information about their language, but some PDF, DOC, and similar files may lack this metadata because their authors did not record it.
In this ca…
-
This issue is related to issue #69. I'm trying to get better crawling performance by decreasing the `delay` parameter and increasing the `threads` parameter. Combined with the need of resuming a crawling run…
-
Since it is not unusual for such types of files to lack a title, author, subject, etc., I'm wondering if there is a way of capturing (say) 100 characters or so from the beginning of the docume…