-
Using the DOM splitter, I extract ```` tags as new documents
```xml
```
and then I capture the attributes
```xml
text/x-php
text/plain
…
-
I installed the HTTP Collector and the Neo4j Committer, then changed the committer in the minimum example test to the Neo4j Committer. It gives the following error:
INFO [AbstractFileQueueCommitter] Committing 1 files
Oct…
-
We found an issue related to the dependencies that the committer plugin installs into the lib folder. The issue causes meta tags not to be extracted properly.
The beha…
-
Hello Pascal,
I am seeing multiple entries for the exact same URLs, and each time I re-index the contents, the crawler adds the same entry one more time. Please see the examples below:
"_langu…
-
Hello,
Recently, one of the internal sites I crawl and index to Solr changed its implementation, and I cannot seem to get the Norconex stack configuration working as desired. Specifically, the site's pages ar…
-
Hi Pascal,
We have a cron job that starts the crawl on a schedule.
Sometimes we get the following error, and the crawler does not start the next scheduled crawl run.
I found th…
-
Hi, I'm trying to crawl a page and commit only URLs that have the word "colleague" in them. I tried adding the following to the minimum-config file:
http://.*/Colleagu…
-
I am crawling a website over HTTPS, and it seems its SSL is not supported.
I am using Java version "1.8.0_202" and a Norconex HTTP Collector 2.9.0 snapshot.
Below is the config.
```
./outp…
-
I am running into an error committing to Elasticsearch. I assume this is the "_id" from Kibana, which appears to be the URL of the page and the same as "document.reference".
![screen shot 2017-11-17 at 15 04 33](https://us…
-
There is a strange behavior:
the file at this URL was not fetched
http://datasheets.avx.com/TCJ.pdf
even though a link to it exists on the page at
http://www.avx.com/awards/finalist-for-ubm-techs-ee-times-an…