-
Hi Pascal,
I am using the Google Cloud Search Committer and trying to index a few extra fields apart from the general ones (title, language, date, type).
Using a postParse handler, I am extracting those field…
-
Hello,
While trying to use the 'DeleteTagger' in preParseHandlers, I found that entering a regex in the XML to remove all fields with 'pdf_' does not work as expected. Going by the [documenta…
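One common cause of this symptom is whole-string matching. If the tagger tests its regex against the entire field name (as Java's `Matcher.matches()` does; this is an assumption about DeleteTagger, not something confirmed in this thread), a bare `pdf_` matches no field at all, while `pdf_.*` matches every field that starts with `pdf_`. A minimal Python sketch of the difference, using `re.fullmatch` to stand in for whole-string matching and hypothetical field names:

```python
import re

# Hypothetical field names, for illustration only.
fields = ["pdf_title", "pdf_author", "title", "x_pdf_flag"]

def matching_fields(pattern, names):
    """Return the names that the pattern matches in full (whole-string semantics)."""
    regex = re.compile(pattern)
    return [name for name in names if regex.fullmatch(name)]

print(matching_fields(r"pdf_", fields))    # []
print(matching_fields(r"pdf_.*", fields))  # ['pdf_title', 'pdf_author']
```

Under this assumption, `x_pdf_flag` is not matched by `pdf_.*` because the match must start at the first character; `.*pdf_.*` would be needed to catch it.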
-
Hi, I have run into this problem with my Collector HTTP 2.9.0 installation:
a) collector crawls with a 1 day delay
b) keepDownloads is false to save disk space
c) collector only crawls URLs listed in a text …
-
Hello,
I have added a few new meta tags to my page.
Ex.
``
I want to extract the value of the 'content-region' field and index it as 'region'. I am committing to Google Cloud Search.
If I use belo…
-
Hi, I'm trying to crawl with infinite depth within my site's domain, plus any out-of-domain links found directly on its pages, but no further.
I found this bit on a previous question where it includes the out o…
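The scope described above (unlimited depth inside the site's own domain, plus pages linked directly from it on other domains, but nothing beyond them) boils down to one rule: follow a page's links only if that page is in-domain. A minimal Python sketch of that rule as a breadth-first traversal over a hypothetical in-memory link graph; it does not use Norconex, and the domains are made up:

```python
from collections import deque
from urllib.parse import urlparse

def in_domain(url, domain):
    """True if the URL's host is exactly the site's domain."""
    return urlparse(url).hostname == domain

def crawl_scope(start, links, domain):
    """Any depth within `domain`, plus one hop to out-of-domain pages.

    `links` maps a URL to the URLs found on that page (hypothetical graph).
    """
    seen = {start}
    queue = deque([start])
    while queue:
        url = queue.popleft()
        # Out-of-domain pages are kept (they are in `seen`), but their own
        # links are not followed, so the crawl stops one hop outside.
        if not in_domain(url, domain):
            continue
        for target in links.get(url, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen

links = {
    "https://example.com/": ["https://example.com/a", "https://other.org/x"],
    "https://example.com/a": ["https://example.com/b"],
    "https://other.org/x": ["https://other.org/y"],
}
print(sorted(crawl_scope("https://example.com/", links, "example.com")))
```

Here `https://other.org/x` is included because an in-domain page links to it, while `https://other.org/y` is excluded because it is only reachable through an out-of-domain page.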
-
Hi there,
I'm trying to change the default dataStoreEngine to use MongoDB. With the same config on the default DataStore, my configuration file works just fine; however, when I change the Data Sto…
-
We have different crawlers running in a collector. When trying to finish a crawler run (just after printing the execution summary), sometimes the following situation occurs:
Crawler A (here Additiona…
-
Hello!
From this start URL :
https://eur-lex.europa.eu/search.html?textScope0=ti&lang=en&SUBDOM_INIT=ALL_ALL&DTS_DOM=ALL&type=advanced&DTS_SUBDOM=ALL_ALL&qid=1653030108454&andText0=plastic%3F&sort…
-
Hi,
We're looking to build an internal tool where we can crawl and index our own websites using the Elasticsearch committer.
As exploratory work, I've got the collector running in Docker as…
-
@essiembre
Hi Pascal
Norconex version: 2.9.0-SNAPSHOT
We have encountered an issue where new additions to the robots.txt file are not honored by the Norconex crawler. The new disallows are not being…
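To rule out a problem with the new rules themselves, the updated robots.txt can be checked outside the crawler. A minimal sketch using Python's standard `urllib.robotparser`; the rules, the user agent `MyCrawler`, and the URL are hypothetical. It also illustrates that a parser only knows the rules it was last given, so any component that reads robots.txt once and reuses it would not see later additions until it re-reads the file (whether the Norconex crawler does this is not established here).

```python
from urllib.robotparser import RobotFileParser

def allowed(robots_txt, agent, url):
    """Parse robots.txt content and report whether `agent` may fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)

old_rules = """User-agent: *
Disallow: /private/
"""
# Hypothetical new addition: a second Disallow line.
new_rules = old_rules + "Disallow: /drafts/\n"

url = "https://example.com/drafts/page.html"
print(allowed(old_rules, "MyCrawler", url))  # True
print(allowed(new_rules, "MyCrawler", url))  # False
```

If the new rules behave as intended here, the problem more likely lies in when the crawler re-reads robots.txt than in the rules themselves.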