-
Hi Pascal,
Can you help me exclude a page's header and footer data from being crawled?
Please find below the
[htmlfile _l2tm.txt](https://github.com/Norconex/collector-http/files/1208703/htmlf…
-
Hi,
I ran into an issue where what I believe are large files cause listener timeout errors when committing to ES. I believe the ES Java client defaults to 30 seconds before giving up, which is …
-
I have a setup like this in my `robots.txt`:
```
User-agent: my_crawler
Crawl-delay: 1
User-agent: * # match all bots
Crawl-delay: 10 # per http://en.wikipedia.org/wiki/Robots.txt#Nonstandard…
-
The ES committer continues to run (it doesn't exit on error), but the logs show the failure. Here is a snippet:
```
WM Search: 2017-09-24 18:23:23 INFO - Sending 100 commit operations to Elasticsearch.
WM S…
-
Hi,
I would like to use HTTP Collector with the Elasticsearch Committer, but I do not know how to set up the Elasticsearch instance that will receive data from the committer.
Here is my config in HTTP Collecto…
-
I am trying to set up the Norconex collector to push to Solr. I have it connected to and populating Elasticsearch correctly, but I really need Solr working. The crawl will complete, but then error out o…
-
I'm attempting to resolve an error I see when doing an initial test crawl, and I'm seeing some strange behavior. First, here are the relevant parts of my config file:
https://www.myredact…
-
Hello, when running the crawler with multiple threads, I get the following error:
```
INFO [AbstractCrawler] WM Search: 100% completed (23343 processed/23343 total)
INFO [AbstractCrawler] WM …
-
Where do I find information on the fields that are available for the tagger? That is, the fields that would go here:
```
id,title,keywords,description,content,document.reference, document.conte…
-
Hello,
I would like to be able to create a nested field in the documents I ingest into Elasticsearch. ([Reference](https://github.com/Norconex/collector-filesystem/issues/15))
Rather than doing thi…