Closed aleha84 closed 6 years ago
On your Elasticsearch committer, have you tried setting <fixBadIds>true</fixBadIds> (available since Committer version 4.1.0)?
It truncates the value that goes into the Elasticsearch ID field and appends a hash code to keep it unique. You can still store/copy your intact URL in another field beforehand (e.g. with the CopyTagger of the Importer module).
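For reference, here is a rough sketch of where that flag might sit in the committer section of the crawler configuration. The node URL, index name, and type name are placeholders, and the surrounding elements are written as an assumption based on the 4.x committer documentation, so check them against the version you actually run.

```xml
<!-- Sketch only: nodes, indexName and typeName are placeholder values. -->
<committer class="com.norconex.committer.elasticsearch.ElasticsearchCommitter">
  <nodes>http://localhost:9200</nodes>
  <indexName>my-index</indexName>
  <typeName>my-type</typeName>
  <!-- Truncate IDs that exceed the Elasticsearch limit and append a hash code. -->
  <fixBadIds>true</fixBadIds>
</committer>
```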
If you really want to eliminate them, you can use a RegexReferenceFilter and filter out long URLs with a regex like this (untested):
^.{512,}$
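In the collector configuration, that could look roughly like the following. This is a sketch, assuming the filter is declared under referenceFilters inside the crawler section and that the class lives in the collector-core package; adjust it to your Collector version.

```xml
<!-- Sketch only: assumed to go inside the <crawler> section of the HTTP Collector config. -->
<referenceFilters>
  <!-- Exclude any URL that is 512 or more characters long. -->
  <filter class="com.norconex.collector.core.filter.impl.RegexReferenceFilter"
          onMatch="exclude">^.{512,}$</filter>
</referenceFilters>
```

Note that {512,} also drops URLs that are exactly 512 characters long; {513,} would keep those. The regex counts characters, so a URL containing multi-byte characters could still exceed a byte-based ID limit.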
^.{512,}$ looks like it works
Thanks for confirming.
I'm using Norconex HTTP Collector + Elasticsearch committer,
and began receiving this exception:
This is a hardcoded ES 5.0 limitation: https://discuss.elastic.co/t/maximum-length-of-a-specified-document-id/4262/2
How can I ignore URLs longer than 512 characters?