-
I am getting the following error when the crawler starts:
> ERROR [XMLConfigurationUtil$LogErrorHandler] (XML Validation) SQLCommitter: cvc-complex-type.2.4.a: Invalid content was found starting wit…
-
Hi Pascal,
I am seeing a lot of duplicates being processed. The number after the colon shows how many times I see it committed. This is all happening in the same run. How can I prevent this from ha…
-
Hi,
I am seeing this error, but my context field is set to `text_general`.
1) Any suggestions on what I could do to fix the issue?
2) I understand the issue should be fixed, but is there any way to ignore suc…
-
I have Apache Solr 7.2 installed in SolrCloud mode. I have set up 4 collections to index HTML pages, images, and videos in separate collections. I was using Norconex to index the content from the web site…
-
Is there any sample Java code or project available that calls the crawler and the Solr Committer? We are developing a Java application and we would like to call the Norconex Collector and Committer from our j…
-
Hi Pascal,
It is so much fun to work with the Norconex Collector, thank you.
It is well documented and gives fast, easy results.
Is there a way to split a metadata attribute with multiple values into multip…
-
I have 2 jobs, A and B. I already know B will fail during indexing (to Solr). When B inevitably fails, A fails as well, provided A finishes _after_ B.
If I run A by itself, it completes successfu…
-
I have the following requirement:
1. Crawl all the pages under a given URL, not the entire domain
Example: http://www.paihotels.com/the-president-hotel-jayanagar-bangalore/
The crawler should …
-
I've run into a strange issue. I am trying to add crawled URLs to MongoDB and commit the data to Elasticsearch at the same time. When I run without the `crawlDataStoreFactory` configuration, it works fine.
…