Closed mitchelljj closed 9 years ago
Can you please attach your config?
It may be a case where your DeleteTagger is set before parsing occurs. Make sure to configure the DeleteTagger in the <postParseHandlers> section of your Importer configuration.
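As a rough sketch of that placement, the fragment below assumes an Importer 2.x release in which DeleteTagger accepts a <fields> child element; some releases take a fields="..." attribute on the tagger instead, so check the documentation for your version. The field name is copied from the Solr error, and the handler sits under <postParseHandlers> so it runs after the parser has extracted the metadata.

```xml
<importer>
  <!-- Post-parse handlers run after the document has been parsed,
       i.e. once fields such as tiff_BitsPerSample exist. -->
  <postParseHandlers>
    <tagger class="com.norconex.importer.handler.tagger.impl.DeleteTagger">
      <!-- Assumption: comma-separated field list in a <fields> element;
           older releases may expect a fields="..." attribute instead. -->
      <fields>tiff_BitsPerSample</fields>
    </tagger>
  </postParseHandlers>
</importer>
```

If the same tagger is declared under pre-parse handlers, the field does not exist yet when the tagger runs, so nothing is deleted.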
Alternatively, it may be simpler (and safer) to use the KeepOnlyTagger instead (still as a post-parse handler). This way, if a web site decides to add new metadata fields to its pages, they will not make it through to Solr.
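A minimal sketch of the KeepOnlyTagger approach follows, again assuming an Importer 2.x release that accepts a <fields> child element (some releases use a fields="..." attribute instead). The field names listed are placeholders for illustration only; replace them with the fields your Solr schema actually defines.

```xml
<importer>
  <postParseHandlers>
    <tagger class="com.norconex.importer.handler.tagger.impl.KeepOnlyTagger">
      <!-- Placeholder field names (assumption): every field NOT listed
           here is dropped before the document is committed to Solr. -->
      <fields>title, description, keywords</fields>
    </tagger>
  </postParseHandlers>
</importer>
```

The trade-off is that any field you do want indexed has to be listed explicitly, but unexpected fields such as tiff_BitsPerSample can no longer reach Solr.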
Have you resolved your issue with the last suggestions I made?
Having received no feedback in a while on the latest suggestion, I am closing this, assuming it worked for you. You can reopen if need be.
I get the below Apache Solr log error:
ERROR - 2015-09-26 22:18:51.914; [c:gettingstarted s:shard2 r:core_node2 x:gettingstarted_shard2_replica1] org.apache.solr.common.SolrException; org.apache.solr.common.SolrException: ERROR: [doc=http://www2.ed.gov/programs/iegpsddrap/brochure-ddra.doc] Error adding field 'tiff_BitsPerSample'='8 8 8 8' msg=For input string: "8 8 8 8"
    at org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:176)
I tried to get rid of the tiff_BitsPerSample field by deleting it before it is sent to Apache Solr. To do that, I added a <tagger class="com.norconex.importer.handler.tagger.impl.DeleteTagger"> tag to the Norconex minimum-config.xml file. I tried the field name exactly as it is reported in the Solr error, and I also tried it in all lower case.
I even stopped Solr, started it again, and then restarted the Norconex crawl, but the Apache Solr log file still reports that the tiff_BitsPerSample field is causing errors. How can I prevent this tiff_BitsPerSample field from being imported to Solr and causing these errors? Do I need to start over from the very beginning and reset the Solr environment to its starting point, as listed below?
The following command line will stop Solr and remove the directories for each of the two nodes that the start script created:
    bin/solr stop -all ; rm -Rf example/cloud/
Then, to add back the initial cloud gettingstarted environment, launch Solr with:
    bin/solr start -e cloud -noprompt