elastic / elasticsearch

Free and Open Source, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch

Completion Suggester not showing results if value is already in previous field #82432

Open b32196 opened 2 years ago

b32196 commented 2 years ago

Elasticsearch version (bin/elasticsearch --version): 6.7, 7.9.2, 7.16.2

Plugins installed: [none]

JVM version (java -version): 1.8

OS version (uname -a if on a Unix-like system): docker, ubuntu 18.04

I am seeing a problem with the completion suggester in Elasticsearch 7.9.2 (also present in 6.7 and 7.16.2).

If a document has more than one suggest entry with the same input value under different contexts, the suggester only finds a completion for the first entry, not the second.

The suggest part of the document looks like this:

"SUGGEST": [
    {
        "input": [ "Inzidenz", "Notbremse", "Kraft", "Mallorca" ],
        "contexts": { "FIELD": [ "DESKT", "DESKT_ABST" ] }
    },
    {
        "input": [ "Pandemie", "Notbremse", "Kraft" ],
        "contexts": { "FIELD": [ "TIT", "TITSPLABST", "TITSP", "TITTXTSP", "TIT_RHTI" ] }
    }
]

Both entries contain the input "Notbremse". If I search for "Notbre" with the FIELD context "TITSPLABST", I get no results, but if I search for "Notbre" with the FIELD context "DESKT", I get the expected result. The outcome depends on the order of the entries in the document: if I reorder them so that the entry containing "TITSPLABST" comes before the other one, then it works.
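To see which inputs fall into this situation, the following minimal Python sketch lists every input value that occurs in more than one SUGGEST entry. The entries are copied from the document above; the helper name `shared_inputs` is hypothetical and not part of any Elasticsearch client.

```python
from collections import defaultdict

# SUGGEST entries copied from the document in this report.
suggest_entries = [
    {
        "input": ["Inzidenz", "Notbremse", "Kraft", "Mallorca"],
        "contexts": {"FIELD": ["DESKT", "DESKT_ABST"]},
    },
    {
        "input": ["Pandemie", "Notbremse", "Kraft"],
        "contexts": {"FIELD": ["TIT", "TITSPLABST", "TITSP", "TITTXTSP", "TIT_RHTI"]},
    },
]


def shared_inputs(entries):
    """Map each input found in more than one entry to the indexes of those entries.

    Hypothetical helper for illustration only.
    """
    positions = defaultdict(list)
    for index, entry in enumerate(entries):
        for value in entry["input"]:
            positions[value].append(index)
    return {value: found for value, found in positions.items() if len(found) > 1}


print(shared_inputs(suggest_entries))
# prints {'Notbremse': [0, 1], 'Kraft': [0, 1]}
```

Note that "Kraft" is duplicated across both entries as well. The report only tests "Notbremse", so whether "Kraft" behaves the same way is not confirmed here.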

Steps to reproduce:


  1. Create the index

`PUT /suggest_test?pretty
{
"settings" : {
    "index.mapping.nested_fields.limit" : 60,
    "number_of_replicas" : 1,
    "number_of_shards" : 6,

    "index" : {
        "analysis" : {
            "char_filter" : {
                "umlaut_filter" : {
                    "type" : "mapping",
                    "mappings" : ["\u00c4=>ae", "\u00e4=>ae", "\u00d6=>oe", "\u00f6=>oe", "\u00dc=>ue", "\u00fc=>ue", "\u00df=>ss", "\u00c6=>Ae", "\u00e6=>ae", "\u00d8=>Oe", "\u00f8=>oe"]
                },
                "underscore_pattern" : {
                    "type" : "pattern_replace",
                    "pattern" : "\\_",
                    "replacement" : ""
                }
            },
            "tokenizer" : {
                "sd_tokenizer" : {
                    "type" : "pattern",
                    "pattern" : "[^\\p{N}\\p{L}#]+",
                    "group" : "-1"
                }
            },
            "filter" : {
                "prefixFilter" : {
                    "type" : "pattern_capture",
                    "preserve_original" : true,
                    "patterns" : [
                        "([a-zA-Z]+[^#])$"
                    ]
                }
            },
            "analyzer" : {
                 "default": {
                    "type" : "custom",
                    "tokenizer" : "sd_tokenizer",
                    "char_filter" : ["umlaut_filter", "underscore_pattern"],
                    "filter" : ["lowercase", "asciifolding"]
                },
                "prefixAnalyzer": {
                    "tokenizer" : "sd_tokenizer",
                    "char_filter" : ["umlaut_filter", "underscore_pattern"],
                    "filter" : ["prefixFilter","lowercase", "asciifolding"]
                },
                "same_analyzer": {
                    "type" : "custom",
                    "tokenizer" : "keyword",
                    "char_filter" : ["umlaut_filter", "underscore_pattern"],
                    "filter" : ["lowercase", "asciifolding"]
                },

                "online_rechte_analyzer": {
                    "type" : "custom",
                    "tokenizer" : "sd_tokenizer",
                    "filter": ["lowercase"]
                }
            }
        }
    }
},
"mappings": {
    "_source" : {
        "enabled" : true
    },
    "dynamic" : "strict",
    "properties" : {
        "AK" : {
            "properties" : {
                "TITLE" : {
                    "type" : "text"
                },
                "TEXT" : {
                    "type" : "text"
                }
            }
        },
        "SUGGEST": {
            "type": "completion",
            "analyzer": "default",
            "preserve_separators": true,
            "preserve_position_increments": true,
            "max_input_length": 25,
            "contexts": [
                {
                    "name": "FIELD",
                    "type": "category"
                }
            ]
        }
    }
}

}`

  2. Put a document into the index

POST /suggest_test/_doc?refresh=true
{
    "AK": [
        {
            "TITLE": "Die Notbremse tritt in Kraft",
            "TEXT": [
                "Inzidenz von 99,9, ab 100 soll die vereinbarte 'Notbremse' in Kraft treten.\n\n(O-Ton) Angela Merkel: \"Werden leider von der Notbremse Gebrauch machen muessen\" \n(O-Ton) Michael Kretschmer: Schnell umsteuern \n(O-Ton) Malu Dreyer: Oeffnung Aussengastronomie bei Kontaktbeeschraenkungen \n(O-Ton) Manuela Schwesig: Contro Reisen nach Mallorca, aber nicht im eigenen Bundesland \n(O-Ton) Angela Merkel: \"Devise lautet: Impfen, impfen, impfen \n(O-Ton) Ulrich Weigelt: Die Menschen moechten beim Hausarzt geimpft werden \n(O-Ton) Manuela Schwesig: (Impfstoff Sputnik V) Impfstoff muss sicher sein, Herkunft darf nicht entscheidend sein"
            ]
        }
    ],
    "SUGGEST": [
        {
            "input": [ "Inzidenz", "Notbremse", "Kraft", "Mallorca" ],
            "contexts": { "FIELD": [ "DESKT", "DESKT_ABST" ] }
        },
        {
            "input": [ "Pandemie", "Notbremse", "Kraft" ],
            "contexts": { "FIELD": [ "TIT", "TITSPLABST", "TITSP", "TITTXTSP", "TIT_RHTI" ] }
        }
    ]
}

  3. Search for completions of "Notbrem" in the FIELD context TITSPLABST (TITSPLABST is just a name from the Java context). This search produces no result.

POST suggest_test/_search?pretty
{
    "suggest": {
        "hfdb-suggest": {
            "prefix": "Notbrem",
            "completion": {
                "field": "SUGGEST",
                "skip_duplicates": false,
                "contexts": {
                    "FIELD": [ { "context": "TITSPLABST", "boost": 1 } ]
                }
            }
        }
    }
}

  4. Now search for "Notbrem" in the FIELD context DESKT. This search produces a result.

POST suggest_test/_search?pretty
{
    "suggest": {
        "hfdb-suggest": {
            "prefix": "Notbrem",
            "completion": {
                "field": "SUGGEST",
                "skip_duplicates": false,
                "contexts": {
                    "FIELD": [ { "context": "DESKT", "boost": 1 } ]
                }
            }
        }
    }
}

I expect a result from both queries, but only get one from the second.
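For anyone who prefers to script the comparison, here is a minimal Python sketch that sends the two suggest queries and prints the returned suggestion texts side by side. It assumes the index and document from the steps above already exist, and that a node is reachable at `http://localhost:9200` without authentication (both the URL and the lack of security are assumptions, so adjust them for your setup). The function names are hypothetical; only the standard library is used.

```python
import json
import urllib.request

# Assumption: a local node without authentication; change as needed.
BASE_URL = "http://localhost:9200"


def build_suggest_query(prefix, context):
    """Build the same suggest request body as in the steps above."""
    return {
        "suggest": {
            "hfdb-suggest": {
                "prefix": prefix,
                "completion": {
                    "field": "SUGGEST",
                    "skip_duplicates": False,
                    "contexts": {"FIELD": [{"context": context, "boost": 1}]},
                },
            }
        }
    }


def suggested_texts(base_url, index, body):
    """POST the body to <index>/_search and return the suggestion option texts."""
    request = urllib.request.Request(
        f"{base_url}/{index}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        result = json.load(response)
    options = result["suggest"]["hfdb-suggest"][0]["options"]
    return [option["text"] for option in options]


def main():
    # Same prefix, two different contexts, as in steps 3 and 4.
    for context in ("TITSPLABST", "DESKT"):
        body = build_suggest_query("Notbrem", context)
        print(context, suggested_texts(BASE_URL, "suggest_test", body))
```

Calling `main()` against a running node should make the difference visible: according to this report, the TITSPLABST line comes back with an empty list while the DESKT line contains a suggestion.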


elasticmachine commented 2 years ago

Pinging @elastic/es-search (Team:Search)

elasticsearchmachine commented 3 months ago

Pinging @elastic/es-search-relevance (Team:Search Relevance)