-
I am trying to set up the Norconex collector to push to Solr. I have it connected to and populating Elasticsearch correctly, but I really need Solr working. The crawl will complete, but then error out o…
-
I am using the stripBetween transformer in preParseHandlers to delete headers and footers from documents.
For most documents this works, but on some specific pages the footer is not removed. On all pages markup…
-
Hi there Pascal,
I am having an issue with the imported content and field returned from external content: all the HTML returned is formatted with "\n" on every line, and my custom field is returning not a singl…
-
Hi,
I ran into an issue with what I believe are large files that cause listener timeout errors when committing to ES. I believe the ES Java client defaults to 30 seconds before giving up, which is …
-
Hello, when running the crawler with multiple threads, I get the following error:
```
INFO [AbstractCrawler] WM Search: 100% completed (23343 processed/23343 total)
INFO [AbstractCrawler] WM …
```
-
I am testing your products. So far the crawler works and was much easier to follow than Nutch or StormCrawler.
I am sending to Elasticsearch 5.6 and the data is there. However, I am a bit confused about the fields …
-
Simple question... Is there a way to prevent the startURL from being submitted to the index? Thanks in advance.
I thought maybe I could add it to the RegexReferenceFilter, but that rejects it earl…
-
I'm using the Norconex crawler on the Facebook Graph API /events/ endpoint, and it crawls the data, but when it commits it to Elasticsearch, Kibana sees the data as one block, so it cannot "index" it.
As I kn…
-
As suggested in the [DOMTagger description](https://www.norconex.com/collectors/importer/latest/apidocs/com/norconex/importer/handler/tagger/impl/DOMTagger.html), it is better (for performance purposes) to …
-
Hi, I have ~4.7M files already indexed and re-ran the crawler to see how long it would take to crawl on a second attempt. The first (initial) crawl took 1 day 10 hours. The second attempt I started la…