-
Hi, I'm trying to crawl with infinite depth on my site's domain, plus any direct out-of-domain links on the page, but no further.
I found this bit in a previous question where it includes the out o…
-
Hi there,
I'm trying to change the default dataStoreEngine to use MongoDB. With the same config on the default DataStore, my configuration file works just fine; however, when I change the Data Sto…
-
We have different crawlers running in a collector. When trying to finish a crawler run (just after printing the execution summary), sometimes the following situation occurs:
Crawler A (here Additiona…
-
Hello!
From this start URL:
https://eur-lex.europa.eu/search.html?textScope0=ti&lang=en&SUBDOM_INIT=ALL_ALL&DTS_DOM=ALL&type=advanced&DTS_SUBDOM=ALL_ALL&qid=1653030108454&andText0=plastic%3F&sort…
-
Hi!
When using WebDriverHttpFetcher, the importer crashes when parsing PDF files (it works fine with GenericHttpFetcher) due to a Tika exception:
```
Caused by: org.apache.tika.exception.TikaException: TIKA-1…
```
-
@essiembre
Hi Pascal
Norconex version: 2.9.0-SNAPSHOT
We have encountered an issue where new additions to the robots.txt file are not honored by the Norconex crawler. The new disallows are not being…
-
I am experiencing a strange problem with the HTTP Collector v3 RC1, which could be a bug.
This is an example config based on the minimal setup included in the examples folder of Norconex v3 RC1:
…
-
Hi,
I can't get the ReplaceTransformer to work in Norconex 3.0.0-SNAPSHOT (2021-12-20). Either I am missing something in the configuration or it just does not have any effect on the content fie…
-
Hello Pascal,
We encountered another issue with the metadata fetcher: some websites that do not support `HEAD` requests not only reply with `Bad Request`, but also add a `Location` HTTP response header to …
-
A [second milestone release](https://opensource.norconex.com/collectors/http/download) of the HTTP Collector was just made. One of the most significant features comes from the [Importer](https://opens…