Norconex Crawlers (or spiders) are flexible web and filesystem crawlers for collecting, parsing, and manipulating data from the web or filesystem to store it in various data repositories such as search engines.
I am trying to use a custom document filter in the Importer pipeline, at the pre-process document stage. My document filter implements only the IDocumentFilter interface.
When the pipeline is active and a document goes through the pre-process filters, an error is raised and processing of that page stops.
The error:
INFO - REJECTED_ERROR - REJECTED_ERROR: https://a.ro/somehtml
INFO - AbstractCrawler - homezz-crawler: Could not process document: https://a.ro/some.html (ro...filter.XDocumentFilter cannot be cast to com.norconex.importer.handler.filter.IOnMatchFilter)
The source code I identified as the origin of the problem: .m2/repository/com/norconex/collectors/norconex-importer/2.9.0/norconex-importer-2.9.0-sources.jar!/com/norconex/importer/Importer.java:354
boolean accepted = acceptDocument(doc, filter, parsed);
if (isMatchIncludeFilter((IOnMatchFilter) h)) {
includeResolver.hasIncludes = true;
if (accepted) {
The cause: when the cast is done, there is no check that h is an instance of IOnMatchFilter, so a filter that implements only IDocumentFilter triggers the cast error shown above.
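A minimal sketch of the kind of guard that would avoid the failed cast. The types below (OnMatch, IDocumentFilter, IOnMatchFilter, PlainFilter, IncludeFilter) are simplified stand-ins written for this example, not the actual Importer interfaces, and the method signatures are assumptions; only the instanceof check before the cast is the point.

```java
public class CastGuardSketch {

    // Hypothetical stand-ins for the Importer types named in the error.
    // The real ones live in com.norconex.importer.handler.filter and
    // have different (richer) signatures.
    enum OnMatch { INCLUDE, EXCLUDE }

    interface IDocumentFilter {
        boolean acceptDocument(String reference);
    }

    interface IOnMatchFilter {
        OnMatch getOnMatch();
    }

    // A filter like the one in this report: it implements IDocumentFilter only.
    static class PlainFilter implements IDocumentFilter {
        @Override
        public boolean acceptDocument(String reference) {
            return reference.endsWith(".html");
        }
    }

    // A filter that also declares its on-match behavior.
    static class IncludeFilter implements IDocumentFilter, IOnMatchFilter {
        @Override
        public boolean acceptDocument(String reference) {
            return true;
        }

        @Override
        public OnMatch getOnMatch() {
            return OnMatch.INCLUDE;
        }
    }

    // Guarded check: test the type first, cast only when the test passes.
    // A handler that is not an IOnMatchFilter is simply not an include filter.
    static boolean isMatchIncludeFilter(Object handler) {
        return handler instanceof IOnMatchFilter
                && ((IOnMatchFilter) handler).getOnMatch() == OnMatch.INCLUDE;
    }

    public static void main(String[] args) {
        System.out.println(isMatchIncludeFilter(new PlainFilter()));   // false
        System.out.println(isMatchIncludeFilter(new IncludeFilter())); // true
    }
}
```

With a guard of this shape, a filter that implements only IDocumentFilter is treated as a plain accept/reject filter instead of causing a ClassCastException.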
Can you please review this?
Thank you.