elastic / elasticsearch

FVH + hunspell breaks when searching multiple fields #18692

Closed rpedela closed 7 years ago

rpedela commented 8 years ago

Elasticsearch version: 2.3.2
JVM version: 1.8.0_91
OS version: Ubuntu 14.04
Hunspell dictionary: http://extensions.openoffice.org/en/project/english-dictionaries-apache-openoffice

When using the fast vector highlighter (FVH) together with the hunspell stemmer, the highlights for a given field change depending on which fields are searched. If I disable stemming, switch to kstem, or use the plain highlighter, everything works as expected. The problem appears to be an interaction between FVH, hunspell, and this particular search query.

Index:

curl -XPUT 'http://localhost:9200/highlight_test' -d '{
    "settings": {
        "analysis": {
            "filter": {
                "en_US": {
                    "type": "hunspell",
                    "language": "en_US"
                }
            },
            "analyzer": {
                "en_US": {
                    "tokenizer": "standard",
                    "filter": [
                        "lowercase",
                        "en_US"
                    ]
                }
            }
        }
    },
    "mappings": {
        "default": {
            "_all": {
                "enabled": false
            },
            "properties": {
                "raw_text": {
                    "analyzer": "en_US",
                    "type": "string",
                    "term_vector": "with_positions_offsets"
                },
                "title": {
                    "analyzer": "en_US",
                    "type": "string"
                }
            }
        }
    }
}'

curl -XPUT 'http://localhost:9200/highlight_test/default/1' -d '{
    "raw_text": "EX-99.2\n10\nf8k072915ex99ii_globalpart.htm\nGLOBAL PARTNER ACQUISITION CORP. ANNOUNCES CLOSING OF INITIAL PUBLIC OFFERING\n\n\n\nExhibit 99.2\n\n\n\n\n\n\n\n\nGlobal Partner\nAcquisition Corp. Announces Closing of Initial Public Offering\n\n\n\n\n\nNEW YORK, August 4, 2015 /PRNewswire/\n-- Global Partner Acquisition Corporation (NASDAQ:GPACU) (the \"Company\") announced today that it closed its initial\npublic offering of 15,525,000 units, including 2,025,000 units issued pursuant to the full exercise by the underwriters of their\nover-allotment option. The offering was priced at $10.00 per unit, resulting in gross proceeds of $155,250,000. The Company is\na newly organized blank check company formed for the purpose of effecting a merger or other business combination with a target\ncompany. The proceeds of the offering will be used to fund such business combination.\n\n\n\n\n\nThe Companys units began trading\non the NASDAQ Capital Market under the ticker symbol “GPACU” on July 30, 2015. Each unit consists of one share of the\nCompanys common stock and one warrant. Each warrant will entitle the holder thereof to purchase one-half of one share of the Companys\ncommon stock at $5.75 per half share. Once the securities comprising the units begin separate trading, the common stock and warrants\nare expected to be listed on the NASDAQ Stock Market under the ticker symbols “GPAC” and “GPACW,” respectively.\n\n\n\n\n\nDeutsche Bank Securities Inc. acted\nas sole book-running manager for the offering.\n\n\n\n\n\nThe offering is being made only\nby means of a prospectus, copies of which may be obtained from Deutsche Bank Securities Inc., 60 Wall Street, New York, NY 10005-2836,\nAttention: Prospectus Group, Telephone: (800) 503-4611, Email: prospectus.cpdg@db.com.\n\n\n\n\n\nA registration statement relating\nto these securities has been filed with, and declared effective by, the Securities and Exchange Commission on July 29, 2015.\n\n\n\n\n\nThis press release shall not constitute\nan offer to sell or the solicitation of an offer to buy, nor shall there be any sale of these securities in any state or jurisdiction\nin which such an offer, solicitation or sale would be unlawful prior to registration or qualification under the securities laws\nof any such state or jurisdiction.\n\n\n\n\n\nFor more information, please contact: pzepf@globalpartnerac.com.\n\n\n\n\n\nFORWARD-LOOKING STATEMENTS\n\n\n\n\n\nThis press release contains statements\nthat constitute “forward-looking statements,” including with respect to the anticipated use of the net proceeds. No\nassurance can be given that the net proceeds of the offering will be used as indicated. Forward-looking statements are subject\nto numerous conditions, many of which are beyond the control of the Company, including those set forth in the Risk Factors section\nof the Companys registration statement and preliminary prospectus for the offering filed with the Securities and Exchange Commission\n(“SEC”). Copies are available on the SECs website, www.sec.gov. The Company undertakes no obligation to update\nthese statements for revisions or changes after the date of this release, except as required by law.\n\n\n\n\n\nContact:\n\n\nPaul Zepf\n\n\nChief Executive Officer\n\n\nGlobal Partner Acquisition Corporation\n\n\npzepf@globalpartnerac.com",
    "title": "EX-99.2 - Global Partner Acquisition Corp. Announces Closing Of Initial Public Offering"
}'
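
For comparison, the kstem setup mentioned in the description differs from the index above only in the stemming filter. The following is a minimal sketch rather than the exact settings used in the report: the index name highlight_test_kstem, the analyzer name en_kstem, and the helper create_kstem_index are assumptions made for illustration, while "kstem" is the built-in token filter.

```shell
# Sketch: index settings that use the built-in kstem filter for stemming.
# The index name (highlight_test_kstem) and the analyzer name (en_kstem)
# are made up for this example.
KSTEM_SETTINGS='{
    "settings": {
        "analysis": {
            "analyzer": {
                "en_kstem": {
                    "tokenizer": "standard",
                    "filter": [
                        "lowercase",
                        "kstem"
                    ]
                }
            }
        }
    },
    "mappings": {
        "default": {
            "_all": {
                "enabled": false
            },
            "properties": {
                "raw_text": {
                    "analyzer": "en_kstem",
                    "type": "string",
                    "term_vector": "with_positions_offsets"
                },
                "title": {
                    "analyzer": "en_kstem",
                    "type": "string"
                }
            }
        }
    }
}'

# Hypothetical helper; needs a node listening on localhost:9200.
create_kstem_index() {
    curl -XPUT 'http://localhost:9200/highlight_test_kstem' -d "$KSTEM_SETTINGS"
}
```

Calling create_kstem_index and then indexing the same document into highlight_test_kstem should give a side-by-side index for checking whether the highlight still changes with the searched fields.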

Search:

curl -XGET 'http://localhost:9200/highlight_test/default/_search' -d '{
    "fields": [],
    "query": {
        "bool": {
            "must": {
                "multi_match": {
                    "query": "initial public offering",
                    "type": "best_fields",
                    "operator": "and",
                    "fields": [
                        "raw_text",
                        "title"
                    ]
                }
            },
            "should": [
                {
                    "multi_match": {
                        "query": "initial public offering",
                        "type": "phrase",
                        "boost": 100,
                        "fields": [
                            "raw_text",
                            "title"
                        ]
                    }
                },
                {
                    "multi_match": {
                        "query": "initial public offering",
                        "type": "phrase",
                        "slop": 10,
                        "boost": 10,
                        "fields": [
                            "raw_text",
                            "title"
                        ]
                    }
                }
            ]
        }
    },
    "highlight": {
        "order": "score",
        "fields": {
            "raw_text": {
                "number_of_fragments": 1
            }
        }
    }
}'
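
The plain-highlighter workaround from the description is a per-field setting in the highlight section. Below is a minimal sketch of it: the query is shortened to the "must" clause to keep the example small, and the variable PLAIN_BODY and helper search_plain are names introduced here, not part of the report. To apply it to the full search above, only the "type" line needs to be added.

```shell
# Sketch: force the plain highlighter for raw_text instead of the FVH.
# The query is shortened to the "must" clause of the search in the report.
PLAIN_BODY='{
    "fields": [],
    "query": {
        "multi_match": {
            "query": "initial public offering",
            "type": "best_fields",
            "operator": "and",
            "fields": [
                "raw_text",
                "title"
            ]
        }
    },
    "highlight": {
        "order": "score",
        "fields": {
            "raw_text": {
                "type": "plain",
                "number_of_fragments": 1
            }
        }
    }
}'

# Hypothetical helper; needs a node on localhost:9200 and the index above.
search_plain() {
    curl -XGET 'http://localhost:9200/highlight_test/default/_search' -d "$PLAIN_BODY"
}
```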

Expected Result: If I search only raw_text in the above query, rather than both title and raw_text, the highlight covers the whole phrase:

{
    "took": 1,
    "timed_out": false,
    "_shards": {
        "total": 5,
        "successful": 5,
        "failed": 0
    },
    "hits": {
        "total": 1,
        "max_score": 0.09119049,
        "hits": [
            {
                "_index": "highlight_test",
                "_type": "default",
                "_id": "1",
                "_score": 0.09119049,
                "highlight": {
                    "raw_text": [
                        "ACQUISITION CORP. ANNOUNCES CLOSING OF <em>INITIAL PUBLIC OFFERING</em>\n\n\n\nExhibit 99.2\n\n\n\n\n\n\n\n\nGlobal Partner"
                    ]
                }
            }
        ]
    }
}
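
The two variants of the search differ only in the "fields" arrays of the three multi_match clauses. As a sketch of how to produce both bodies from one template, assuming a hypothetical helper named build_body (not part of the original report):

```shell
# Build the search body from the report for a given JSON array of fields.
# build_body is a hypothetical helper introduced for this example.
build_body() {
    fields="$1"
    cat <<EOF
{
    "fields": [],
    "query": {
        "bool": {
            "must": {
                "multi_match": {
                    "query": "initial public offering",
                    "type": "best_fields",
                    "operator": "and",
                    "fields": $fields
                }
            },
            "should": [
                {
                    "multi_match": {
                        "query": "initial public offering",
                        "type": "phrase",
                        "boost": 100,
                        "fields": $fields
                    }
                },
                {
                    "multi_match": {
                        "query": "initial public offering",
                        "type": "phrase",
                        "slop": 10,
                        "boost": 10,
                        "fields": $fields
                    }
                }
            ]
        }
    },
    "highlight": {
        "order": "score",
        "fields": {
            "raw_text": {
                "number_of_fragments": 1
            }
        }
    }
}
EOF
}

# Variant that searches raw_text only.
RAW_ONLY_BODY=$(build_body '["raw_text"]')
# Variant that searches both fields, as in the search request above.
BOTH_FIELDS_BODY=$(build_body '["raw_text", "title"]')

# To run either one against a local node:
# curl -XGET 'http://localhost:9200/highlight_test/default/_search' -d "$RAW_ONLY_BODY"
```

Keeping the highlight section identical in both bodies makes it easier to see that any difference in the returned fragment comes from the searched fields alone.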

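Actual Result: With both title and raw_text searched, the highlight for raw_text is different and worse than the expected result. Only the single term "offering" is highlighted, and the returned fragment does not contain the phrase at all.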

{
    "took": 2,
    "timed_out": false,
    "_shards": {
        "total": 5,
        "successful": 5,
        "failed": 0
    },
    "hits": {
        "total": 1,
        "max_score": 0.33662215,
        "hits": [
            {
                "_index": "highlight_test",
                "_type": "default",
                "_id": "1",
                "_score": 0.33662215,
                "highlight": {
                    "raw_text": [
                        "as sole book-running manager for the <em>offering</em>.\n\n\n\n\n\nThe <em>offering</em> is being made only\nby means of a prospectus"
                    ]
                }
            }
        ]
    }
}

clintongormley commented 7 years ago

Closing in favour of #21621

rpedela commented 7 years ago

@clintongormley How does integrating the UnifiedHighlighter fix a bug with FVH?

clintongormley commented 7 years ago

@rpedela by replacing the FVH with the UH
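
In releases that ship the unified highlighter, it can be requested per field through the same "type" option that selects "plain" or "fvh". The fragment below is only a sketch of that highlight section, meant to be merged into a search body. The variable name is an assumption made here, and other parts of the 2.x request and mapping shown earlier may need changes on newer versions.

```shell
# Sketch: highlight section that asks for the unified highlighter on raw_text.
# This is a fragment; it has to be combined with a query to be useful.
UNIFIED_HIGHLIGHT='{
    "highlight": {
        "order": "score",
        "fields": {
            "raw_text": {
                "type": "unified",
                "number_of_fragments": 1
            }
        }
    }
}'

# Print the fragment so it can be inspected or pasted into a request body.
printf '%s\n' "$UNIFIED_HIGHLIGHT"
```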