Open alexaryn opened 8 months ago
I don't think OpenSearch records are timestamped by default... but, weird. Do you have an example of how you loaded the index so I can repro?
This is an issue with all search response processors: "_source" is computed in the fetch phase, which occurs strictly before the response pipeline executes. There's not really a good way to 'request a document field' short of listing it in _source (or omitting _source entirely, which effectively does a select *).
For near-duplicate detection, the array of shingles attached to each document is necessary for the dedup processor to do its work. In reality, the shingles are an implementation detail, and users of NDD should not need to know that they exist. Nevertheless, if the _source part of the query doesn't list shingles, then the whole feature breaks down.

I'm not sure this is exactly a bug, but it would be great if search processors could see "everything" about each document in order to do their work. For instance, it might be useful to have modification date, too, as one possible way of deciding which near-duplicates remain in the results.
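
In the meantime, a client-side workaround might look roughly like the sketch below: before sending a search request, patch its _source filter so that the fields the processor depends on are still fetched. This is only an illustration, not anything OpenSearch provides. The helper name ensure_source_fields is made up, the field name shingles is taken from the description above, and the sketch assumes _source is given as a boolean, a string, a list, or an object with includes/excludes. It does not try to interpret wildcard patterns in excludes.

```python
from copy import deepcopy


def ensure_source_fields(query_body, required_fields):
    """Return a copy of query_body whose _source filter still fetches required_fields.

    Hypothetical client-side helper; the original query_body is not modified.
    """
    body = deepcopy(query_body)
    required = list(required_fields)
    source = body.get("_source", True)

    if source is True:
        # No source filtering at all: every field is already returned.
        return body

    if source is False:
        # _source was disabled: fetch only what the processor needs.
        body["_source"] = required
        return body

    if isinstance(source, str):
        source = [source]

    if isinstance(source, list):
        # Plain include list: append any required field that is missing.
        body["_source"] = source + [f for f in required if f not in source]
        return body

    if isinstance(source, dict):
        patched = dict(source)

        includes = source.get("includes")
        if isinstance(includes, str):
            includes = [includes]
        if includes:
            # Only extend a non-empty include list; an absent or empty one
            # already means "include everything".
            patched["includes"] = includes + [f for f in required if f not in includes]

        excludes = source.get("excludes")
        if isinstance(excludes, str):
            excludes = [excludes]
        if excludes:
            # Drop exact-name exclusions of required fields.
            # Wildcard patterns are left untouched (not handled here).
            patched["excludes"] = [f for f in excludes if f not in required]

        body["_source"] = patched
        return body

    raise TypeError(f"unsupported _source value: {source!r}")


query = {
    "query": {"match": {"title": "opensearch"}},
    "_source": ["title", "url"],
}
patched_query = ensure_source_fields(query, ["shingles"])
print(patched_query["_source"])
```

The obvious downside is the one described above: the caller now has to know about shingles, and the field comes back in the results unless something strips it afterwards.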