-
Hi team,
In Norconex HTTP Collector, when I use the default MVStore data store, it does not crawl duplicate data. But when I use the same configuration with MongoDB, in this case…
-
Hi, Pascal
I am working on a small project (for study purposes only) to identify personal data in files on a LAN, using File System Crawler to fetch the contents of the files and Microsoft …
-
Hi Pascal
I would like to prevent some subdirectories from being crawled. Is there any way to do this? I tried the configuration below, but it scans the subdirectory and rejects the crawled files, because …
-
Is there any migration guide other than [Migrate to version 3](https://opensource.norconex.com/crawlers/web/v3/migration)? I couldn't figure out how to migrate this from version 2.9.1 to…
-
I would like to crawl an HTML page (the seed page), follow its links, and index the pages one hop from the seed page, but I don't want to index the seed page itself. How can I implement this? I tried to use the Ref…
-
Hi!
First, I really like this lib! It works really well!
But now, I have an issue using the WebDriverHttpFetcher. My app is running as a Spring Boot (2.7.0) application with Java 17 on a Window…
-
Hello, I'm trying to extend the Norconex Committer system for my application (in Clojure).
I have the following code
`(def my-committer
(proxy
[LogCommitter]
[]
(doUpsert [u…
-
I'm trying to use the Norconex web crawler in a Clojure project; I've imported all the dependencies using Leiningen.
When I try to create a new HttpCrawlerConfig object, I get the following error:
…
-
I am using version 3.0.1 of the Norconex collector.
When I try adding a WebDriverHttpFetcher to the crawler config, I get the following exception as soon as the crawl starts:
```
Exception i…
-
Hello,
I'm trying to use the 'DeleteTagger' in preParseHandlers, but entering an XML regex to remove all fields with 'pdf_' does not appear to work as expected. Going by the [documenta…