Closed BorisGuenther closed 7 years ago
First: I tested your config and the importer section was invoked as it should be. If you are not getting the results you expect, maybe your regular expressions are not matching properly? Do you have a URL you can share with a specific use case to reproduce?
Second: Your error comes from Solr. You are submitting a value that is too long for a Solr field of type "string". Making it a text field instead may resolve the issue.
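As a rough sketch of that change in the Solr schema: the field name `content` below is only an assumption (use whichever field the Solr error names), and `text_general` is assumed to be defined in your schema, as it is in the stock Solr example configurations.

```xml
<!-- Before (hypothetical field name): a "string" field is indexed as one
     untokenized term, and Lucene limits how large a single term can be. -->
<!-- <field name="content" type="string" indexed="true" stored="true"/> -->

<!-- After: a tokenized text type splits the value into many small terms,
     so long page content no longer hits the single-term limit. -->
<field name="content" type="text_general" indexed="true" stored="true"/>
```

Changing the type of an existing field generally means the documents already indexed need to be reindexed.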
Good morning Pascal,
Thank you for your fast reply. One night shift later, I have solved about 95% of the problems.
I could reduce the error by stripping my content, but I'll also try your solution, as it seems to solve the problem better.
It was a misunderstanding on my part. The problem is that I wanted to strip content based on HTML comments. If I do it before parsing, the meta tags are also gone. If I do it after parsing, the comments I wanted to rely on are gone, because the content has been converted to plain text.
I fixed it with a two-step solution:
Strip around the plaintext comments
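The exact configuration is not reproduced here. A minimal sketch of such a two-step approach could look like the following, assuming the Norconex Importer 2.x handler syntax. The marker strings `TYPO3SEARCHBEGIN` and `TYPO3SEARCHEND` are made up for illustration: the idea is to turn the HTML comments into plain-text markers before parsing (so they survive the conversion to plain text while the meta tags stay intact), then strip around those markers after parsing.

```xml
<!-- Step 1 (pre-parse): replace the HTML comments with plain-text markers.
     The marker names are hypothetical. -->
<preParseHandlers>
  <transformer class="com.norconex.importer.handler.transformer.impl.ReplaceTransformer">
    <replace>
      <fromValue><![CDATA[<!--TYPO3SEARCH_begin-->]]></fromValue>
      <toValue>TYPO3SEARCHBEGIN</toValue>
    </replace>
    <replace>
      <fromValue><![CDATA[<!--TYPO3SEARCH_end-->]]></fromValue>
      <toValue>TYPO3SEARCHEND</toValue>
    </replace>
  </transformer>
</preParseHandlers>

<!-- Step 2 (post-parse): strip everything outside the plain-text markers. -->
<postParseHandlers>
  <transformer class="com.norconex.importer.handler.transformer.impl.StripBeforeTransformer" inclusive="true">
    <stripBeforeRegex>TYPO3SEARCHBEGIN</stripBeforeRegex>
  </transformer>
  <transformer class="com.norconex.importer.handler.transformer.impl.StripAfterTransformer" inclusive="true">
    <stripAfterRegex>TYPO3SEARCHEND</stripAfterRegex>
  </transformer>
</postParseHandlers>
```

Both blocks would sit inside the `<importer>` section of the crawler configuration. Markers placed inside the page body end up in the extracted text, which is why the second step removes them along with the surrounding content (`inclusive="true"`).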
Maybe you can give some feedback about my solution?
If only sleeping could resolve all problems! :-)
What if you strip what comes before your opening tag, but keep the header? Have you tried something like this?
<preParseHandlers>
  <transformer class="com.norconex.importer.handler.transformer.impl.StripBetweenTransformer" inclusive="true">
    <stripBetween>
      <start><![CDATA[<body ]]></start>
      <end><![CDATA[<!--TYPO3SEARCH_begin-->]]></end>
    </stripBetween>
  </transformer>
  <transformer class="com.norconex.importer.handler.transformer.impl.StripAfterTransformer" inclusive="true">
    <stripAfterRegex><![CDATA[TYPO3SEARCH_end]]></stripAfterRegex>
  </transformer>
</preParseHandlers>
Closing for lack of feedback.
I'd like to set up a crawler to feed my Solr instances.
This is my setup:
Configuration
Solr
https://github.com/Norconex/committer-solr/tree/master/norconex-committer-solr/src/test/java/com/norconex/committer/solr and Additional dynamic field:
Norconex
Issues:
First:
The importer settings are completely ignored. If I set up an unknown tagger class, it throws an error, so I guess it reads the config but does not really apply it.
Second:
The import into Solr fails with the message below.
Thank you in advance for your help.
BR Boris