-
My Crawler Name: 2016-04-20 14:41:50 ERROR - My Crawler Name: Could not mark reference as processed: URL (can't serialize class com.norconex.commons.lang.file.ContentType)
java.lang.IllegalArgumentExc…
-
Dear Dev Team
I'm new to Norconex HTTP Collector, and will use it for my thesis. The scenario for my crawling is:
1. I provide the crawler with a list of URLs in a text file
2. the crawler will crawl t…
-
Some weird stuff happens when I crawl more than 1000 URLs.
Originally, I set it up with 440,000 URLs and a single crawler, then started it. But no INFO messages appear like "DOCUMENT_IMPORTED" or "REJECTED_FIL…
-
I'm using the Norconex HTTP Collector to crawl HTML files and send certain meta-fields and the content text to a Solr server.
What I now want to do is to only send text from the content to the Solr s…
-
This ticket originated from https://github.com/Norconex/committer-elasticsearch/issues/3#issuecomment-191226573.
Because of `AbstractBatchCommitter` calling commit() for every batch, this eliminates …
-
Most of the issues about messy code (garbled characters) are already resolved, but one still remains. Here is my configuration.
```
./www.hngzzx.com/progress
./www.hngzzx.com/logs
…
```
-
I believe the path "./" in the URL has to be treated as "/", i.e., the dot "." should be removed, because otherwise the crawler can go into an infinite loop under specific conditions, just like what happens when cra…
-
From @bruce-genhot at https://github.com/Norconex/collector-http/issues/190#issuecomment-161634436:
> The elasticsearch committer library only works with elasticsearch 1.5, the latest version is 2.1…
-
Hi,
Is there any way to collect WordPress pages? I used the minimal XML file without results.
THX
-
The crawler does not seem to turn URLs including `/../` segments (and probably also including `/./` segments) into absolute / normalized URLs. This leads to duplication in the queue and committer sink…
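For illustration only, here is a minimal sketch of what collapsing `/./` and `/../` segments looks like using the JDK's own `java.net.URI.normalize()`. This is not the collector's code or its normalization API; the class name `UrlDotSegments`, the helper `normalize`, and the `example.com` URLs are made up for this example.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class UrlDotSegments {

    /**
     * Hypothetical helper (not part of Norconex): collapses "." and ".."
     * path segments using the JDK's URI.normalize().
     */
    static String normalize(String url) throws URISyntaxException {
        return new URI(url).normalize().toString();
    }

    public static void main(String[] args) throws URISyntaxException {
        // Both spellings point at the same document, so after normalization
        // they should compare equal and only be queued once.
        String a = normalize("http://example.com/a/b/../c/./d.html");
        String b = normalize("http://example.com/a/c/d.html");
        System.out.println(a);           // http://example.com/a/c/d.html
        System.out.println(a.equals(b)); // true
    }
}
```

Comparing queued references on the normalized form, rather than the raw string, is one way the duplicates described above could be avoided. Note that `URI.normalize()` leaves a leading `..` that would climb above the root in place, so that case would need separate handling.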