-
Simple question: is there a way to prevent the startURL from being submitted to the index? Thanks in advance.
I thought maybe I could add it to the RegexReferenceFilter, but that rejects it earl…
-
Hi, I have ~4.7M files already indexed and re-ran the crawler to see how long a second crawl would take. The first (initial) crawl took 1 day 10 hours. The second attempt I started la…
-
I'm using the Norconex crawler on the Facebook Graph API /events/ endpoint and it is crawling the data, but when it commits the data to Elasticsearch, Kibana sees it as one block, so it cannot "index" it.
As I kn…
-
I'm attempting to crawl a password-protected wiki that we use for internal documentation, and I'm struggling to get authentication to work. I've tried to use form authentication as well as basic…
-
Hi there,
I am trying to give a lower weight to a certain section of my webpage. To do that, I have created a Field in the Collection and want the Field to contain the text from that section in the webp…
-
Hi,
We have a sitemap to crawl, and some URLs in the sitemap return a 404 error, as shown in the screenshot.
![image](https://user-images.githubusercontent.com/29800957/33125772-d…
-
Hi there,
I am trying to use the ExternalTransformer on some documents. I created an executable SH file that receives a text file and transforms it into something else; on the command line it is workin…
-
I'd like to create what some might call content groups. Third-party search providers, like addsearch.com, provide a method for adding a group based on the URL. For example, you might want to…
-
Hello,
I would like to be able to create a nested field in the documents I ingest into Elasticsearch. ([Reference](https://github.com/Norconex/collector-filesystem/issues/15))
Rather than doing thi…
-
I have a database of URLs relevant to one or more health topics. I am indexing these existing health topics, for which I've written:
* A URL provider that returns them from a database
* A tagger …