-
Hello,
Recently, one of the internal sites I crawl and index into Solr changed its implementation, and I cannot seem to get the Norconex stack configuration working as desired. Specifically, the site's pages ar…
-
There is a strange behavior: the file at the following URL was not picked up by the crawler,
http://datasheets.avx.com/TCJ.pdf
even though it is referenced on this page:
http://www.avx.com/awards/finalist-for-ubm-techs-ee-times-an…
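One difference worth checking: the PDF is served from `datasheets.avx.com`, while the page that references it is on `www.avx.com`. If the crawl is restricted to the start URL's host (an assumption; the configuration is not shown here), the PDF could be filtered out as off-site even though the page links to it. The minimal sketch below uses only `java.net.URI` and a hypothetical `HostCheck` class (not part of Norconex) to show how an exact-host comparison and a parent-domain comparison disagree for these two URLs. Only the known prefix of the truncated page URL is used.

```java
import java.net.URI;

// Hypothetical helper (not Norconex code): compares the hosts of two URLs.
public class HostCheck {
    // Returns the lower-cased host of an absolute URL.
    static String host(String url) {
        return URI.create(url).getHost().toLowerCase();
    }

    // True only when both URLs use exactly the same host name.
    static boolean sameHost(String a, String b) {
        return host(a).equals(host(b));
    }

    // True when both hosts are the given domain or one of its subdomains.
    static boolean sameParentDomain(String a, String b, String domain) {
        String d = domain.toLowerCase();
        return isWithin(host(a), d) && isWithin(host(b), d);
    }

    private static boolean isWithin(String host, String domain) {
        return host.equals(domain) || host.endsWith("." + domain);
    }

    public static void main(String[] args) {
        String pdf = "http://datasheets.avx.com/TCJ.pdf";
        // Known prefix of the (truncated) page URL from the report.
        String page = "http://www.avx.com/awards/";
        System.out.println(host(pdf));                               // datasheets.avx.com
        System.out.println(host(page));                              // www.avx.com
        System.out.println(sameHost(pdf, page));                     // false
        System.out.println(sameParentDomain(pdf, page, "avx.com"));  // true
    }
}
```

If the exact-host check matches how the crawl is scoped, the PDF would be rejected even though both hosts belong to `avx.com`.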
-
I am currently having an issue with the SpoiledReferenceStrategizer behaviour. The default strategy for BAD_STATUS is GRACE_ONCE, and my config file does not set it explicitly. Woul…
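To make the GRACE_ONCE default concrete, here is a small model of the idea, assuming GRACE_ONCE means a spoiled reference is kept the first time and deleted if it is still spoiled on the next crawl. The class `GraceOnceModel`, its methods, and the fallback parameter are hypothetical and are not the Norconex implementation; only the BAD_STATUS to GRACE_ONCE default comes from the post above.

```java
import java.util.EnumMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical model of spoiled-reference handling. NOT Norconex code.
public class GraceOnceModel {
    enum State { BAD_STATUS, NOT_FOUND, ERROR }
    enum Strategy { DELETE, GRACE_ONCE, IGNORE }

    private final Map<State, Strategy> mappings = new EnumMap<>(State.class);
    private final Strategy fallback;
    // References that were spared on the previous crawl.
    private final Set<String> graced = new HashSet<>();

    GraceOnceModel(Strategy fallback) {
        this.fallback = fallback;
        // Default described in the post: BAD_STATUS -> GRACE_ONCE.
        mappings.put(State.BAD_STATUS, Strategy.GRACE_ONCE);
    }

    // Returns true if the spoiled reference should be deleted on this crawl.
    boolean shouldDelete(String reference, State state) {
        Strategy s = mappings.getOrDefault(state, fallback);
        switch (s) {
            case DELETE:
                return true;
            case IGNORE:
                return false;
            case GRACE_ONCE:
                if (graced.add(reference)) {
                    return false;          // first spoil: keep it, remember the grace
                }
                graced.remove(reference);  // spoiled again: delete and forget
                return true;
            default:
                throw new IllegalStateException("Unknown strategy: " + s);
        }
    }

    // A successful fetch resets the grace for that reference.
    void markHealthy(String reference) {
        graced.remove(reference);
    }

    public static void main(String[] args) {
        GraceOnceModel m = new GraceOnceModel(Strategy.DELETE);
        String ref = "http://example.com/page";
        System.out.println(m.shouldDelete(ref, State.BAD_STATUS)); // false
        System.out.println(m.shouldDelete(ref, State.BAD_STATUS)); // true
    }
}
```

Under this reading, a page that returns a bad status on one crawl stays in the index, and is only deleted if it is still bad on the following crawl.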
-
I am crawling a website over HTTPS, and it seems SSL is not supported...
I am using Java version "1.8.0_202" and Norconex HTTP Collector 2.9.0 snapshot.
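Since what the client offers during a TLS handshake depends on the JVM, a useful first check is to list what the running Java version enables by default. The sketch below is a hypothetical `TlsDiagnostics` class using only standard JSSE calls (`SSLContext.getDefault()`, `getDefaultSSLParameters()`, `getSupportedSSLParameters()`); it does not contact any server and does not use any Norconex API. The printed protocols and cipher suites can then be compared with what the target server accepts.

```java
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import javax.net.ssl.SSLContext;

// Hypothetical diagnostic: prints the TLS settings the running JVM offers by default.
public class TlsDiagnostics {
    private static SSLContext defaultContext() {
        try {
            return SSLContext.getDefault();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("No default SSLContext available", e);
        }
    }

    // Protocol versions (e.g. TLSv1.2) enabled by default.
    static String[] enabledProtocols() {
        return defaultContext().getDefaultSSLParameters().getProtocols();
    }

    // Cipher suites enabled by default.
    static String[] enabledCipherSuites() {
        return defaultContext().getDefaultSSLParameters().getCipherSuites();
    }

    // Cipher suites the JVM supports, including ones not enabled by default.
    static String[] supportedCipherSuites() {
        return defaultContext().getSupportedSSLParameters().getCipherSuites();
    }

    public static void main(String[] args) {
        System.out.println("java.version = " + System.getProperty("java.version"));
        System.out.println("Enabled protocols: " + Arrays.toString(enabledProtocols()));
        String[] ciphers = enabledCipherSuites();
        System.out.println(ciphers.length + " enabled cipher suites:");
        for (String c : ciphers) {
            System.out.println("  " + c);
        }
    }
}
```

To see the actual handshake against the failing site, the crawler's JVM can also be started with the standard JSSE debug property `-Djavax.net.debug=ssl:handshake`.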
Below is the config:
```
./outp…
```
-
I am running into an error committing to Elasticsearch. I assume it concerns the "_id" shown in Kibana, which appears to be the URL of the page and the same as "document.reference".
![screen shot 2017-11-17 at 15 04 33](https://us…
-
Hi,
I am using MongoDB as the datastore for the crawler, and I have two questions about it.
1. The crawler randomly reports a `duplicate key error`. I am wondering if there are lots of threads to …
-
Dear all,
why do I still get the "handshake_failure" alert with the following crawler configuration and Java 8u172?
````
https://sac.formalazio.it/login.php
Mozilla/5.0 (Windows NT 6.1; …
````
-
I'm trying to save the raw HTML source in my Committer class.
But only the extracted text is passed as the InputStream argument of the committer method (queueAddition).
How can I get the raw HTML source code in the committer?
Th…
-
Hello Pascal,
I found an issue with DomSplitter, e.g.
```
```
The very first crawl works fine and I'm getting the child documents into the index, but when I start the same crawl again (no changes…
-
When trying to run the crawler on an intranet, I am getting: `com.norconex.importer.parser.DocumentParserException: org.apache.tika.exception.TikaException: TIKA-198: Illegal IOException from org.apach…