elastic / elasticsearch

Free and Open Source, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch

Highlight exceptions when field type is array #43412

Closed huazai023 closed 6 months ago

huazai023 commented 5 years ago

Elasticsearch version: 6.2.4

Plugins installed: [elasticsearch-analysis-ansj(6.2.4),elasticsearch-analysis-pinyin(6.2.4)]

JVM version: 1.8.0_172

OS version: Linux SZD-L0097945 2.6.32-696.18.7.el6.x86_64 #1 SMP Thu Jan 4 17:31:22 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

Description of the problem:

Highlighting throws exceptions when the field holds an array of values.

We first encountered the issue in 6.2.3, but I just tested 6.5.4 and the problem still exists.

Steps to reproduce:

Add index:

curl -X PUT "localhost:9200/fkg_pol/" -H 'Content-Type: application/json' -d'{"index":{"analysis":{"analyzer":{"index_ansj_pinyin":{"type":"custom","tokenizer":"index_ansj","filter":"pinyin_filter"},"query_ansj_pinyin":{"type":"custom","tokenizer":"query_ansj","filter":"pinyin_filter"},"standard_pinyin":{"type":"custom","tokenizer":"standard","filter":"pinyin_filter"},"search_pinyin":{"type":"custom","tokenizer":"my_pinyin","filter":"pinyin_filter"}},"filter":{"pinyin_filter":{"type":"pinyin","keep_full_pinyin":false,"keep_joined_full_pinyin":true,"keep_none_chinese_in_joined_full_pinyin":true,"limit_first_letter_length":10,"none_chinese_pinyin_tokenize":false,"lowercase":true,"remove_duplicated_term":true}},"tokenizer":{"my_pinyin":{"type":"pinyin","keep_first_letter":false,"keep_full_pinyin":false,"keep_joined_full_pinyin":true,"limit_first_letter_length":10,"lowercase":true,"remove_duplicated_term":true}}}}}'

curl -X POST "localhost:9200/fkg_pol/fkg_pol/_mapping" -H 'Content-Type: application/json' -d'{"fkg_pol":{"properties":{"keyword":{"search_analyzer":"query_ansj","analyzer":"index_ansj","type":"text","fields":{"single":{"analyzer":"standard","type":"text","fields":{"pinyin":{"search_analyzer":"search_pinyin","analyzer":"standard_pinyin","type":"text"}}},"pinyin":{"search_analyzer":"query_ansj_pinyin","analyzer":"index_ansj_pinyin","type":"text"}}}}}}'

Add some data:

curl -X PUT "localhost:9200/fkg_pol/fkg_pol/1" -H 'Content-Type: application/json' -d'{"keyword":["排气量","销售情况","监测数据"]}'

curl -X PUT "localhost:9200/fkg_pol/fkg_pol/2" -H 'Content-Type: application/json' -d'{"keyword":["有限责任公司","大众汽车","交通工程","三轮汽车"]}'

The query statement:

curl -X POST "localhost:9200/fkg_pol/_search" -H 'Content-Type: application/json' -d'{"query":{"bool":{"should":[{"match":{"keyword.pinyin":"汽车销售"}}]}},"highlight":{"require_field_match":"true","fields":{"keyword.pinyin":{}}}}'

It will fail with:

{"took":20,"timed_out":false,"_shards":{"total":5,"successful":3,"skipped":0,"failed":2,"failures":[{"shard":2,"index":"fkg_pol","node":"OK9m4NgIR2u9GOAvykMPkw","reason":{"type":"illegal_state_exception","reason":"last() should not be called in this context"}},{"shard":3,"index":"fkg_pol","node":"xiJgdXa9TSyz6DBxE9DdlQ","reason":{"type":"illegal_argument_exception","reason":"offset out of bounds"}}]},"hits":{"total":2,"max_score":0.53296894,"hits":[]}}
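Note that the response above reports `hits.total` of 2 but returns an empty `hits` array, so the problem only shows up under `_shards.failures`. A minimal Python sketch of a client-side check for this follows. The function name `summarize_shard_failures` is hypothetical, and the sample string is an abbreviated copy of the response above (the `node` fields are left out).

```python
import json

def summarize_shard_failures(response_text):
    """Return (failed_count, [(shard, type, reason), ...]) for a search response body."""
    body = json.loads(response_text)
    shards = body.get("_shards", {})
    failures = [
        (f.get("shard"), f["reason"].get("type"), f["reason"].get("reason"))
        for f in shards.get("failures", [])
    ]
    return shards.get("failed", 0), failures

# Abbreviated copy of the response shown above.
sample = (
    '{"_shards":{"total":5,"successful":3,"skipped":0,"failed":2,"failures":['
    '{"shard":2,"index":"fkg_pol","reason":{"type":"illegal_state_exception",'
    '"reason":"last() should not be called in this context"}},'
    '{"shard":3,"index":"fkg_pol","reason":{"type":"illegal_argument_exception",'
    '"reason":"offset out of bounds"}}]},'
    '"hits":{"total":2,"max_score":0.53296894,"hits":[]}}'
)

failed, details = summarize_shard_failures(sample)
for shard, err_type, reason in details:
    print(f"shard {shard}: {err_type}: {reason}")
```

Checking `failed` before trusting an empty `hits` array avoids mistaking a highlighting failure for "no results".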

elasticmachine commented 5 years ago

Pinging @elastic/es-search

JackYangzg commented 4 years ago

I ran into the same problem:

{
  "error" : {
    "root_cause" : [
      {
        "type" : "illegal_argument_exception",
        "reason" : "offset out of bounds"
      }
    ],
    "type" : "search_phase_execution_exception",
    "reason" : "all shards failed",
    "phase" : "fetch",
    "grouped" : true,
    "failed_shards" : [
      {
        "shard" : 0,
        "index" : "xxxxxxxxxxxxxxxxxxxxx",
        "node" : "xxxxxxxxxxxxxxxxxxxxxx",
        "reason" : {
          "type" : "illegal_argument_exception",
          "reason" : "offset out of bounds"
        }
      }
    ],
    "caused_by" : {
      "type" : "illegal_argument_exception",
      "reason" : "offset out of bounds",
      "caused_by" : {
        "type" : "illegal_argument_exception",
        "reason" : "offset out of bounds"
      }
    }
  },
  "status" : 400
}

huazai023 commented 4 years ago

I've worked around this problem in other ways, and I hope Elastic can fix this bug as soon as possible.


cbuescher commented 3 years ago

Since I think this issue depends on the two installed language plugins, I made some reproduction attempts on 7.9.3, where analysis-ansj is still available. Newer versions don't seem to be supported yet.

To speed up future reproduction attempts, here are the two plugins used:

./bin/elasticsearch-plugin install https://github.com/NLPchina/elasticsearch-analysis-ansj/releases/download/v7.9.3/elasticsearch-analysis-ansj-7.9.3.0-release.zip

./bin/elasticsearch-plugin install https://github.com/medcl/elasticsearch-analysis-pinyin/releases/download/v7.9.3/elasticsearch-analysis-pinyin-7.9.3.zip

The issue reproduces with only one document as well:

```
DELETE fkg_pol

PUT /fkg_pol
{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "index_ansj_pinyin": { "type": "custom", "tokenizer": "index_ansj", "filter": "pinyin_filter" },
          "query_ansj_pinyin": { "type": "custom", "tokenizer": "query_ansj", "filter": "pinyin_filter" },
          "standard_pinyin": { "type": "custom", "tokenizer": "standard", "filter": "pinyin_filter" },
          "search_pinyin": { "type": "custom", "tokenizer": "my_pinyin", "filter": "pinyin_filter" }
        },
        "filter": {
          "pinyin_filter": {
            "type": "pinyin",
            "keep_full_pinyin": false,
            "keep_joined_full_pinyin": true,
            "keep_none_chinese_in_joined_full_pinyin": true,
            "limit_first_letter_length": 10,
            "none_chinese_pinyin_tokenize": false,
            "lowercase": true,
            "remove_duplicated_term": true
          }
        },
        "tokenizer": {
          "my_pinyin": {
            "type": "pinyin",
            "keep_first_letter": false,
            "keep_full_pinyin": false,
            "keep_joined_full_pinyin": true,
            "limit_first_letter_length": 10,
            "lowercase": true,
            "remove_duplicated_term": true
          }
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "keyword": {
        "search_analyzer": "query_ansj",
        "analyzer": "index_ansj",
        "type": "text",
        "fields": {
          "single": {
            "analyzer": "standard",
            "type": "text",
            "fields": {
              "pinyin": { "search_analyzer": "search_pinyin", "analyzer": "standard_pinyin", "type": "text" }
            }
          },
          "pinyin": { "search_analyzer": "query_ansj_pinyin", "analyzer": "index_ansj_pinyin", "type": "text" }
        }
      }
    }
  }
}

PUT /fkg_pol/_doc/1
{"keyword":["排气量","销售情况","监测数据"]}

POST /fkg_pol/_search
{"query":{"bool":{"should":[{"match":{"keyword.pinyin":"汽车销售"}}]}},"highlight":{"require_field_match":"true","fields":{"keyword.pinyin":{}}}}
```

Some details from the stacktrace I see on the ES side:

Caused by: java.lang.IllegalStateException: last() should not be called in this context
    at org.apache.lucene.search.uhighlight.BoundedBreakIteratorScanner.last(BoundedBreakIteratorScanner.java:175) ~[elasticsearch-7.9.3.jar:8.6.2 016993b65e393b58246d54e8ddda9f56a453eb0e - ivera - 2020-08-26 10:54:38]
    at org.apache.lucene.search.uhighlight.SplittingBreakIterator.preceding(SplittingBreakIterator.java:218) ~[lucene-highlighter-8.6.2.jar:8.6.2 016993b65e393b58246d54e8ddda9f56a453eb0e - ivera - 2020-08-26 10:54:38]
    at org.apache.lucene.search.uhighlight.CustomFieldHighlighter.highlightOffsetsEnums(CustomFieldHighlighter.java:126) ~[elasticsearch-7.9.3.jar:8.6.2 016993b65e393b58246d54e8ddda9f56a453eb0e - ivera - 2020-08-26 10:54:38]
    at org.apache.lucene.search.uhighlight.FieldHighlighter.highlightFieldForDoc(FieldHighlighter.java:79) ~[lucene-highlighter-8.6.2.jar:8.6.2 016993b65e393b58246d54e8ddda9f56a453eb0e - ivera - 2020-08-26 10:54:38]
    at org.apache.lucene.search.uhighlight.UnifiedHighlighter.highlightFieldsAsObjects(UnifiedHighlighter.java:641) ~[lucene-highlighter-8.6.2.jar:8.6.2 016993b65e393b58246d54e8ddda9f56a453eb0e - ivera - 2020-08-26 10:54:38]
    at org.apache.lucene.search.uhighlight.CustomUnifiedHighlighter.highlightField(CustomUnifiedHighlighter.java:101) ~[elasticsearch-7.9.3.jar:8.6.2 016993b65e393b58246d54e8ddda9f56a453eb0e - ivera - 2020-08-26 10:54:38]
    at org.elasticsearch.search.fetch.subphase.highlight.UnifiedHighlighter.highlight(UnifiedHighlighter.java:131) ~[elasticsearch-7.9.3.jar:7.9.3]
    at org.elasticsearch.search.fetch.subphase.highlight.HighlightPhase.hitExecute(HighlightPhase.java:117) ~[elasticsearch-7.9.3.jar:7.9.3]
    at org.elasticsearch.search.fetch.subphase.highlight.HighlightPhase.hitExecute(HighlightPhase.java:50) ~[elasticsearch-7.9.3.jar:7.9.3]
    at org.elasticsearch.search.fetch.FetchPhase.execute(FetchPhase.java:180) ~[elasticsearch-7.9.3.jar:7.9.3]
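One way to narrow this down, assuming the custom analyzers emit offsets that do not line up with the source text (an assumption, not something confirmed in this thread), is to run a single value through the `_analyze` API with `"field": "keyword.pinyin"` and inspect the `start_offset` and `end_offset` of each returned token. The Python sketch below only shows the checking step. The helper names are hypothetical, and the token list is made up to exercise the check; it is not real plugin output.

```python
def utf16_length(text):
    """Length in UTF-16 code units, which is how Java measures string offsets."""
    return len(text.encode("utf-16-le")) // 2

def find_suspicious_tokens(tokens, text):
    """Return tokens whose offsets are inverted, go backwards, or exceed the text."""
    limit = utf16_length(text)
    suspicious = []
    last_start = 0
    for tok in tokens:
        start, end = tok["start_offset"], tok["end_offset"]
        if start < 0 or end < start or end > limit or start < last_start:
            suspicious.append(tok)
        last_start = max(last_start, start)
    return suspicious

# Hypothetical tokens in the shape of an _analyze response; NOT real plugin output.
text = "销售情况"
tokens = [
    {"token": "xiaoshou", "start_offset": 0, "end_offset": 2, "position": 0},
    {"token": "qingkuang", "start_offset": 2, "end_offset": 6, "position": 1},
]

for tok in find_suspicious_tokens(tokens, text):
    print("suspicious:", tok["token"], tok["start_offset"], tok["end_offset"])
```

If real `_analyze` output for one of the array values produced a non-empty result here, that would be a concrete lead to report against the plugin in question.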

Interestingly, when swapping the order of the three entries in the document, the search seems to work:

PUT /fkg_pol/_doc/1
{"keyword":["监测数据","排气量","销售情况"]}

and the same search returns:

"hits" : [
      {
        "_index" : "fkg_pol",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 0.29171452,
        "_source" : {
          "keyword" : [
            "监测数据",
            "排气量",
            "销售情况"
          ]
        },
        "highlight" : {
          "keyword.pinyin" : [
            "<em>销售</em>情况"
          ]
        }
      }
    ]

As I don't understand the content of the document, I don't know whether the results and the highlighting are okay, but it is strange that the order of entries in the field influences whether the outcome is an error or not.
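To find out which orderings fail, each permutation of the three entries could be indexed and searched in turn. The Python sketch below only generates the six document bodies. Sending each one with `PUT /fkg_pol/_doc/1` and re-running the same search is left out, and the helper name `ordering_bodies` is made up for illustration.

```python
import itertools
import json

values = ["排气量", "销售情况", "监测数据"]

def ordering_bodies(entries, field="keyword"):
    """Yield one JSON document body per ordering of the array entries."""
    for perm in itertools.permutations(entries):
        # ensure_ascii=False keeps the Chinese text readable in the output.
        yield json.dumps({field: list(perm)}, ensure_ascii=False)

bodies = list(ordering_bodies(values))
for body in bodies:
    print(body)
```

Recording which of the six bodies produce an exception would show whether the failure depends on a specific entry coming first or last.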

mayya-sharipova commented 6 months ago

Closing this in favour of https://github.com/elastic/elasticsearch/issues/91495

Gandalf-z commented 2 months ago

huazai023 wrote: "I've avoided this problem in other ways, and elastic can fix this bug as soon as possible"

Can you share how you solved it? Thanks