Open timroes opened 3 years ago
Pinging @elastic/es-search (Team:Search)
@timroes this is indeed surprising behavior, that the `_ignored` and `ignored_field_values` sections can give conflicting information. I'm wondering if we should check the `_ignored` section when creating the `ignored_field_values` response: if the field is not present in `_ignored`, we can omit it from `ignored_field_values`. @markharwood do you have any thoughts on this idea?
In our team discussion, we also raised the question of whether we should really allow `ignore_above` to be updated. Elasticsearch's behavior around ignored values would be simpler if this parameter couldn't be changed. This would be a bigger and more long-term discussion than the idea above.
Thanks for the clarification. Just to make sure I understand the initial thought correctly: when you say that a field should not be in `ignored_field_values` when it's not also in `_ignored`, would that mean, for this specific case where `ignore_above` has changed, that it would still appear under `fields`, since the value actually is indexed?
Yes, in this case the value would be included in `fields` instead of `ignored_field_values`.
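To make the proposed rule concrete, here is a minimal Python sketch of it. This is a toy model and not Elasticsearch code: `split_values`, its parameters, and the character-count check are assumptions made for illustration, with `ignored_fields` standing in for the document's `_ignored` metadata.

```python
def split_values(values, field, ignored_fields, ignore_above):
    """Split source values into (fields, ignored_field_values).

    `ignored_fields` models the document's _ignored metadata, which was
    recorded at index time. All names here are hypothetical.
    """
    if field not in ignored_fields:
        # Nothing was ignored at index time, so every value is reported
        # under fields, whatever the current ignore_above says.
        return list(values), []
    # The field was ignored at index time: fall back to the current limit
    # to guess which of its values were the rejected ones.
    kept = [v for v in values if len(v) <= ignore_above]
    dropped = [v for v in values if len(v) > ignore_above]
    return kept, dropped
```

With this rule, a document indexed before `ignore_above` was lowered keeps its value under `fields`, because its `_ignored` metadata does not mention the field.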
> do you have any thoughts on this idea?
TL/DR: We can only work with the current rules
Reverse-engineering why certain values weren't ingested is hard. All we have is a list of ignored field names and the JSON source, with potentially many values in arrays (some good values, some bad). Figuring out which of the values were rejected, and why, is difficult, especially if the mapping rules can be changed after ingest. All we have to work with are the current validation rules and the source.
The support for `ignored_field_values` simply hooked into the existing `try...catch...ignore` section of code in the fields API, where it retrieves and parses values from source at query time. We just picked up the values that were otherwise being silently dropped in the `ignore` part of the exception handling and added the bad values to the results in the new `ignored_field_values` section.
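The mechanism described above can be modeled roughly as follows. This is a simplified Python sketch, not the actual Elasticsearch implementation: `IgnoredValue`, `parse_keyword`, and `fetch_field` are hypothetical names, and the only validation rule modeled is a character-length limit standing in for `ignore_above`.

```python
class IgnoredValue(Exception):
    """Raised by the hypothetical parser when a value breaks the current rules."""


def parse_keyword(value, ignore_above):
    # Only the CURRENT limit is known here; the limit that was in force
    # when the document was indexed is not available at query time.
    if ignore_above is not None and len(value) > ignore_above:
        raise IgnoredValue(value)
    return value


def fetch_field(source, field, ignore_above):
    """Re-parse the values of `field` from _source at query time."""
    raw = source.get(field, [])
    values = raw if isinstance(raw, list) else [raw]
    fields, ignored = [], []
    for value in values:
        try:
            fields.append(parse_keyword(value, ignore_above))
        except IgnoredValue:
            # Previously dropped silently; now reported as an ignored value.
            ignored.append(value)
    return {"fields": fields, "ignored_field_values": ignored}
```

Because the split depends only on the source and the current limit, lowering the limit after indexing moves an already indexed value from `fields` to `ignored_field_values` in this model.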
Pinging @elastic/es-search-foundations (Team:Search Foundations)
While testing something I found potentially weird behavior in the fields API, and I wanted to clarify whether it is intended or not.
Create a simple index & index a document:
If you request data from that index using the following query, the result looks as expected:
Now change the `ignore_above` setting of this field:
Executing the same `_search` as above will now return a different result:
It seems that the `name` value for this document, since its length is above 5, is no longer returned in the `fields` part but in `ignored_field_values` instead, which suggests that the value was not indexed. That is not true, though: you can search by it perfectly fine, since changing the `ignore_above` of a field does not change anything about already indexed documents, as far as I understand:
This will return the same document as above, even though that result suggests that the `name` field had no indexed value, only an ignored one. I personally found that behavior a bit confusing, since I thought one of the intentions of the `fields` API was to give us better insight into the actual "indexed" state of a document, which it does not do in this case.
Especially confusing is that this field, despite its value being returned not under `fields` but under `ignored_field_values`, is not listed under `_ignored` (since it's not actually ignored). If you index the same document a second time (after changing `ignore_above`, so that the value is now actually ignored when indexing), you will end up with two documents, where the first one has no `_ignored` part but still has a value under `ignored_field_values` (one that is not actually ignored):
Is this behavior intended and, if so, is it documented somewhere?
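For readers who want to retrace the steps described above, the following Python sketch builds a plausible sequence of requests and prints them. It sends nothing to a cluster. The index name, the document value, the original `ignore_above` of 20, and the use of `"_source": false` are all assumptions made for illustration; only the field name `name` and the new limit of 5 come from the description.

```python
import json

INDEX = "test-index"   # hypothetical index name
VALUE = "abcdefgh"     # hypothetical value, longer than 5 characters
OLD_LIMIT = 20         # hypothetical original ignore_above
NEW_LIMIT = 5          # the limit mentioned in the description


def keyword_mapping(limit):
    """Mapping for a keyword field `name` with the given ignore_above."""
    return {"properties": {"name": {"type": "keyword", "ignore_above": limit}}}


fields_query = {"_source": False, "fields": ["name"]}

steps = [
    # 1. Create the index and index one document.
    ("PUT", f"/{INDEX}", {"mappings": keyword_mapping(OLD_LIMIT)}),
    ("PUT", f"/{INDEX}/_doc/1", {"name": VALUE}),
    # 2. Fetch the field: the value is expected under "fields".
    ("POST", f"/{INDEX}/_search", fields_query),
    # 3. Lower ignore_above on the existing field.
    ("PUT", f"/{INDEX}/_mapping", keyword_mapping(NEW_LIMIT)),
    # 4. Same search again: the value is now reported under "ignored_field_values".
    ("POST", f"/{INDEX}/_search", fields_query),
    # 5. A term query on the value still matches the document.
    ("POST", f"/{INDEX}/_search", {"query": {"term": {"name": VALUE}}}),
]

for method, path, body in steps:
    print(method, path)
    print(json.dumps(body, indent=2))
```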