Closed tmo-trustpilot closed 5 years ago
Pinging @elastic/es-search
Hi @tmo-trustpilot, thanks for opening an issue. The Elasticsearch PassageFormatter trims whitespace from the edges of snippets, to prevent results like " is <b>great</b> for search" (with a leading space) from being returned. We should probably reject pre-tag and post-tag values that would get removed by this. As a workaround, you can use different tags and then replace them in the client if you need to use \r.
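A minimal sketch of that workaround in Python, assuming the highlight request is sent with printable placeholder tags and the real delimiter is swapped in on the client. The placeholder strings and the helper names highlight_clause and restore_delimiter are hypothetical, not part of Elasticsearch or its client:

```python
# Hypothetical printable placeholders; assumed never to occur in the indexed text.
PRE_TAG = "@@hl-start@@"
POST_TAG = "@@hl-end@@"


def highlight_clause(field):
    """Build a highlight clause that uses the placeholder tags."""
    return {
        "pre_tags": [PRE_TAG],
        "post_tags": [POST_TAG],
        "fields": {field: {"number_of_fragments": 0}},
    }


def restore_delimiter(fragments, delimiter="\r"):
    """Swap the placeholders for the real delimiter after the response arrives."""
    return [
        fragment.replace(PRE_TAG, delimiter).replace(POST_TAG, delimiter)
        for fragment in fragments
    ]


# Stand-in for one highlighted fragment as it might come back from a search.
fragments = [PRE_TAG + "great" + POST_TAG + " for search"]
print([repr(f) for f in restore_delimiter(fragments)])
```

Because the placeholders are printable, edge trimming leaves them alone, and the delimiter only appears once the client has the response.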
Thanks, yeah that makes sense. I wouldn't expect that to apply when number_of_fragments is set to zero, which returns the whole field, or in the case of non-printing characters like '\x07'.
We will use another replacement string in the meantime.
Closing this issue, as there is a workaround for it.
Elasticsearch version: Version: 6.3.2, Build: default/tar/053779d/2018-07-20T05:20:23.451332Z, JVM: 10.0.2
Plugins installed: ingest-geoip:6.3.2 ingest-user-agent:6.3.2
JVM version (java -version): Not sure, I'm using the docker image docker.elastic.co/elasticsearch/elasticsearch:6.3.2 to reproduce this but java isn't in the $PATH on that.
OS version (uname -a if on a Unix-like system): Linux 77b0d7cbec64 4.9.93-linuxkit-aufs #1 SMP Wed Jun 6 16:55:56 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
Description of the problem including expected versus actual behavior:
Filtering for a search string and using highlight with either a non-printing character (e.g. '\a') or a whitespace character ('\r' or ' ') as the highlight tag will not include the first highlight tag if the matching text is at the start of the string. I believe it also occurs with the closing tag if the match is at the end of the string.
This leads to an unmatched closing tag in the search results. I expect that the starting tag of the highlighting should be included in the highlight result.
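The effect described above can be imitated in a few lines of Python. This is only a sketch, and it assumes the edge trimming behaves like Java's String.trim(), which removes every leading and trailing character at or below U+0020 (that range covers ' ', '\r' and '\a'). The thread does not show the formatter's code, so the function names java_like_trim and highlight are illustrative stand-ins:

```python
def java_like_trim(text):
    """Emulate Java's String.trim(): drop leading/trailing chars <= U+0020."""
    start, end = 0, len(text)
    while start < end and text[start] <= " ":
        start += 1
    while end > start and text[end - 1] <= " ":
        end -= 1
    return text[start:end]


def highlight(text, match, tag):
    """Wrap the first occurrence of `match` in `tag` (illustration only)."""
    return text.replace(match, tag + match + tag, 1)


for tag in ("\r", " ", "\a"):
    # The match sits at the start of the text, so the opening tag is on the edge.
    snippet = highlight("great for search", "great", tag)
    print(repr(snippet), "->", repr(java_like_trim(snippet)))
```

Under that assumption the opening tag is stripped because it sits on the edge of the snippet, while the closing tag survives in the middle, which matches the unmatched closing tag reported here.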
In the example below I have number_of_fragments of 0 but it also occurs with fragment_size set instead. In our use case we're using '\a' from the Python client as our delimiter, which has the same effect. I can't work out how to escape that properly in curl for reproducing, but the same thing is happening with '\r'.
Steps to reproduce:
This script will reproduce the issue. It will delete an index called sample_index if you run it. It shows:
Provide logs (if relevant): Nothing interesting shows up
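On the question of escaping '\a' for curl raised in the description: one option, sketched here with Python's standard json module, is to let a JSON encoder produce the request body, since JSON writes control characters as \uXXXX escapes. The field name text and the match query are hypothetical and are not taken from the original script:

```python
import json

# Hypothetical search body; only the highlight tags mirror the report.
body = {
    "query": {"match": {"text": "great"}},
    "highlight": {
        "pre_tags": ["\a"],   # BEL, U+0007
        "post_tags": ["\a"],
        "fields": {"text": {"number_of_fragments": 0}},
    },
}

# json.dumps escapes control characters, so the result is plain printable text.
encoded = json.dumps(body)
print(encoded)
```

The printed text contains no raw control characters, so it can be pasted into a single-quoted curl -d argument.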