-
Hello, I am setting up Norconex with the Cloud Search plugin.
I have had some success with a small test and am now attempting to set it up for a full site.
Standard tagging is working and gets much of the …
-
@LeMoussel @essiembre Thanks, I would be interested to see that. I might have to write a committer myself, since I have to find a way to send crawled docs to temporary storage for further processing wh…
-
Hi Pascal,
I need to exclude all URLs that were indexed before a certain date, from the next crawl onward.
Now I have an additional situation as well. I added Curren…
-
Hello Pascal,
there are some errors from time to time while the http crawler is running:
```
Exception in thread "StreamConsumer-STDOUT" java.util.ConcurrentModificationException
at ja…
-
Hi Pascal,
First off, thank you for the excellent software.
I want to crawl a very large site (10M+ pages) and I want to avoid all the search query links (containing ?, multiple keywords) and all…
-
Running the latest 3.0.0 M1 with Elasticsearch 5.0.0 M1.
Per the doc, it seems like typename should be there: https://opensource.norconex.com/committers/elasticsearch/v4/configuration
But it ma…
-
Does the collector add metadata (title, keywords, etc.) that it finds in the sitemap.xml itself, or only metadata it finds inside the document?
-
Hi,
We have noticed that the Norconex committer sometimes fails to index a few documents for one reason or another. These failures cannot be communicated back to the crawl, which updates the checksum and will not be p…
-
Hi there,
I am using version 3 (for testing) with Google Chrome as the HTTP fetcher.
Everything works fine, but when documents exceed 2097152 bytes I get errors (see the full log below):
`io.netty.handler.codec.…
-
The Crawler doesn't seem to be pulling content for one of my sites. I can see every other field in the data but not the content. The only material difference between this config and my other working…