-
I'm using the Norconex HTTP collector (v2.8.0) and am having some issues with extracting contents from PDFs.
Here's a gist of the error: https://gist.github.com/mbockenstedt/4f521a44f21221671c64e62…
-
Hello,
I have downloaded the HTTP Collector and it works great with a core. I have a requirement to search more than one website depending on the user's selection. I understand I need to create m…
-
Hello,
Upon attempting to commit documents to a SolrCloud cluster running behind an NGINX reverse proxy set up to use HTTPS, I get the exceptions below. I have added the server's wildcard cert to the …
-
I am trying to crawl a sitemap XML file that includes bulk URLs and commit the documents to an Azure service. More than 400 documents end up stored in the committer-queue directory.
Norcon…
-
What is this error? I see it intermittently in my logs and can't really see any rhyme or reason to it.
```
intranet-sv: 2018-11-25 13:14:05 ERROR - intranet-sv: Could not process document: https:/…
-
[MyMetadataFetcher.zip](https://github.com/Norconex/collector-filesystem/files/1655329/MyMetadataFetcher.zip)
Hi,
Is there a way to fetch metadata from an external properties file? We have integrate…
-
Hi,
I keep getting the following error when committing documents to AWS CloudSearch:
```
CloudSearch: 2017-11-20 15:15:40 INFO - Sending 10 documents to AWS CloudSearch for addition/deletion.
…
-
I am getting the following error when attempting to commit the items to the Azure index. It looks like it may be complaining about what is being put in the id field, but I don't know where this is coming f…
-
Hi Pascal,
Can you help me exclude the header and footer data from a page being crawled?
Please find below the
[htmlfile _l2tm.txt](https://github.com/Norconex/collector-http/files/1208703/htmlf…
-
This is my config.xml crawler section:
```
http://www.testsite.com
http://elasticsearch:9200/
```
But when I run the crawler, it crashes at:
```java.lang.IllegalArgumentExcepti…
mfoti updated
6 years ago