elastic / elasticsearch

Free and Open Source, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch

Puzzled about highlighter error: String index out of range #23237

Closed imranazad closed 7 years ago

imranazad commented 7 years ago

According to this Lucene issue, the problem with the FVH (fast vector highlighter) has been fixed: https://issues.apache.org/jira/browse/LUCENE-4899

However, I've downloaded Elasticsearch 5.2.1 and I'm still getting the same "String index out of range" error.

Can anyone shed light on this? Has this issue been fixed yet?

jasontedor commented 7 years ago

Please provide steps to reproduce and more details, such as log messages. Maybe you're encountering the same bug, maybe not, but we have no way of deducing that without more information.

clintongormley commented 7 years ago

Duplicate of https://github.com/elastic/elasticsearch/issues/22997

imranazad commented 7 years ago

Steps to reproduce:

PUT /test
{
  "settings": {
    "index": {
      "analysis": {
        "filter": {
          "english_stemmer": {
            "type": "stemmer",
            "name": "minimal_english"
          },
          "word_delimiter": {
            "type": "word_delimiter",
            "catenate_words": "true",
            "generate_number_parts": "true",
            "generate_word_parts": "true",
            "split_on_numerics": "false"
          }
        },
        "analyzer": {
          "fulltext": {
            "type": "custom",
            "filter": [
              "word_delimiter",
              "english_stemmer"
            ],
            "tokenizer": "whitespace"
          }
        }
      },
      "number_of_replicas": "1",
      "number_of_shards": "1"
    }
  },
  "mappings": {
    "document": {
      "_all": {
        "enabled": false
      },
      "properties": {
        "title": {
          "type": "text",
          "analyzer": "fulltext"
        },
        "content": {
          "type": "text",
          "term_vector": "with_positions_offsets",
          "analyzer": "fulltext",
          "search_analyzer": "fulltext"
        },
        "metadescription": {
          "type": "text",
          "index": "not_analyzed",
          "copy_to": "content"
        }
      }
    }
  }
}
POST /test/document/1
{"title": "IPG53: Computed tomography-guided thermocoagulation of osteoid osteoma", "content": "IPG53 <p><em><!--   <label>Interventional procedures, IPG53 - Issued: March 2004</label> --></em></p>\n<p>The National Institute for Health and Clinical Excellence (NICE) has issued full guidance to the NHS in England, Wales, Scotland and Northern Ireland on computed tomography-guided thermocoagulation of osteoid osteoma.</p>\n<h2>Description</h2>\n<p>This procedure can be performed under intravenous sedation or general anesthesia and involves the use of CT guidance.</p>\n<p>Step one is to localise the lesion with CT. A trephine bone biopsy needle is then introduced into the lesion. The needle (or sometimes a drill) is then used to create an entry hole through the bone. CT is used to monitor the progress of the needle to ensure placement near the tumour.</p>\n<p>The core of the lesion is then removed with the inner trephine needle for biopsy, and a radiofrequency electrode probe is introduced into the centre of the nidus. The probe is then heated to 85-90 degrees&nbsp;centrigrade for 4-6 minutes. The whole procedure takes around 90 minutes. 
After removal of the electrode, patients are scanned by CT to assess the outcome of the procedure.</p>\n<p><a href=\"http://www.nice.org.uk/about/what-we-do/our-programmes/nice-guidance/nice-interventional-procedures-guidance/coding-recommendations\">Coding</a> and <a href=\"http://www.nice.org.uk/guidance/IPG53/resources\">clinical classification codes</a> for this guidance</p> <div class=\"book\" title=\"Computed tomography-guided thermocoagulation of osteoid osteoma\" xmlns=\"http://www.w3.org/1999/xhtml\">\n  <h1 class=\"title\">\n    <a id=\"ID0EC\"></a>Computed tomography-guided thermocoagulation of osteoid osteoma</h1>\n  <div class=\"chapter\" title=\"1 Guidance\">\n    <h2 class=\"title\">\n      <a id=\"guidance\"></a>1 Guidance</h2>\n    <p class=\"numbered-paragraph\">\n      <span class=\"paragraph-number\">1.1 </span>Current evidence on the safety and efficacy of computed tomography (CT)-guided thermocoagulation of osteoid osteoma appears adequate to support its use, provided that the normal arrangements are in place for consent, audit and clinical governance. </p>\n  </div>\n  <div class=\"chapter\" title=\"2 The procedure\">\n    <h2 class=\"title\">\n      <a id=\"the-procedure\"></a>2 The procedure</h2>\n    <div class=\"section\" title=\"2.1 Indications\">\n      <h3 class=\"title\">\n        <a id=\"indications\"></a>2.1 Indications</h3>\n      <p class=\"numbered-paragraph\">\n        <span class=\"paragraph-number\">2.1.1 </span>Osteoid osteomas are benign, bone-forming tumours that occur most frequently in the legs, especially in the femur and tibia. </p>\n      <p class=\"numbered-paragraph\">\n        <span class=\"paragraph-number\">2.1.2 </span>Almost all patients have pain as a result of the tumour. Other symptoms include growth disturbances, bony deformity, scoliosis and, if located within a joint, swelling, synovitis, restricted movement and contracture. 
This condition may regress spontaneously, but the resolution of symptoms is unpredictable and may take months or years. </p>\n      <p class=\"numbered-paragraph\">\n        <span class=\"paragraph-number\">2.1.3 </span>Standard treatment initially focuses on pain management using non-steroidal anti-inflammatory drugs. Patients who continue to have pain or who experience other tumour-related complications are offered surgical excision. Surgery requires a hospital stay of several days and the patient cannot undertake weight-bearing activity for a substantial period of time. Aggressive resection carries the risk of postoperative fracture, infection and haematoma. </p>\n      <p class=\"numbered-paragraph\">\n        <span class=\"paragraph-number\">2.1.4 </span>In recent years several minimally invasive techniques using imaging, such as percutaneous resection and radiofrequency ablation, have been introduced in patients with osteoid osteoma in order to achieve removal or destruction of the tumour without the subsequent morbidity of standard surgical treatment.</p>\n    </div>\n    <div class=\"section\" title=\"2.2 Outline of the procedure\">\n      <h3 class=\"title\">\n        <a id=\"outline-of-the-procedure\"></a>2.2 Outline of the procedure</h3>\n      <p class=\"numbered-paragraph\">\n        <span class=\"paragraph-number\">2.2.1 </span>In this procedure, the lesion is located using computed tomography (CT). Under general anaesthetic, an entry hole is created through the bone using a fine drill. A radiofrequency electrode probe is introduced into the centre of the osteoma and heated. 
The electrode is then removed and a CT scan is done later to assess the outcome of the procedure</p>\n    </div>\n    <div class=\"section\" title=\"2.3 Efficacy\">\n      <h3 class=\"title\">\n        <a id=\"efficacy\"></a>2.3 Efficacy</h3>\n      <p class=\"numbered-paragraph\">\n        <span class=\"paragraph-number\">2.3.1 </span>Resolution of pain was the main outcome reported in the studies. In a case series of 97 consecutive patients with a mean follow up of 41 months, 76% (74/97) of patients reported a good response after one treatment session and 92% (89/97) reported a good response after one or two sessions. In the smaller studies, resolution of symptoms was reported by 82–95% of patients at final follow up. For more details, refer to the Sources of evidence section.</p>\n      <p class=\"numbered-paragraph\">\n        <span class=\"paragraph-number\">2.3.2 </span>The Specialist Advisors considered that this was an established procedure with no concerns or uncertainties about its efficacy. One Advisor stated that the procedure was better than open surgery as there is less risk of recurrence.</p>\n    </div>\n    <div class=\"section\" title=\"2.4 Safety\">\n      <h3 class=\"title\">\n        <a id=\"safety\"></a>2.4 Safety</h3>\n      <p class=\"numbered-paragraph\">\n        <span class=\"paragraph-number\">2.4.1 </span>Few complications were observed in the studies. Five out of 239 patients (2%) experienced complications, including three who experienced superficial burns. For more details, refer to the Sources of evidence section.</p>\n      <p class=\"numbered-paragraph\">\n        <span class=\"paragraph-number\">2.4.2 </span>The Specialist Advisors noted transient pain as the most common complication of the procedure. Infection was also listed, but described as a rare adverse event. 
It was noted that if the tumour is in a difficult area, adjacent structures may be at risk from inappropriate positioning of the electrode, but Advisors commented that the procedure is still safer than surgery in similar situations.  </p>\n    </div>\n    <div class=\"section\" title=\"2.5 Other comments\">\n      <h3 class=\"title\">\n        <a id=\"other-comments\"></a>2.5 Other comments</h3>\n      <p class=\"numbered-paragraph\">\n        <span class=\"paragraph-number\">2.5.1 </span>Particular care is required in selecting and treating patients with osteoid osteoma in the spine because of the proximity of nerve roots and the potential risk of neurological complications.</p>\n      <p>Andrew Dillon<br />Chief Executive<br />March 2004</p>\n    </div>\n  </div>\n  <div class=\"chapter\" title=\"3 Further information\">\n    <h2 class=\"title\">\n      <a id=\"further-information\"></a>3 Further information</h2>\n    <div class=\"section\" title=\"Sources of evidence\">\n      <h3 class=\"title\">\n        <a id=\"sources-of-evidence\"></a>Sources of evidence</h3>\n      <p>The evidence considered by the Interventional Procedures Advisory Committee is described in the following document.</p>\n      <p>\n        <a class=\"link\" href=\"http://www.nice.org.uk/proxy/?sourceUrl=http%3a%2f%2fwww.nice.org.uk%2fIP221overview\" target=\"_top\" data-original-url=\"http://www.nice.org.uk/IP221overview\">'Interventional procedure overview of CT-guided thermocoagulation of osteoid osteoma'</a>, May 2003.</p>\n    </div>\n    <div class=\"section\" title=\"Information for patients\">\n      <h3 class=\"title\">\n        <a id=\"information-for-patients\"></a>Information for patients</h3>\n      <p>NICE has produced <a class=\"link\" href=\"http://www.nice.org.uk/guidance/ipg53/informationforpublic\" target=\"_top\" data-original-url=\"http://www.nice.org.uk/guidance/IPG53/PublicInfo/pdf/English\">information on this procedure for patients and carers</a>. 
It explains the nature of the procedure and the guidance issued by NICE, and has been written with patient consent in mind. </p>\n    </div>\n  </div>\n  <div class=\"chapter\" title=\"4 About this guidance\">\n    <h2 class=\"title\">\n      <a id=\"about-this-guidance\"></a>4 About this guidance</h2>\n    <p>NICE interventional procedure guidance makes recommendations on the safety and efficacy of the procedure. It does not cover whether or not the NHS should fund a procedure. Funding decisions are taken by local NHS bodies after considering the clinical effectiveness of the procedure and whether it represents value for money for the NHS. It is for healthcare professionals and people using the NHS in England, Wales, Scotland and Northern Ireland, and is endorsed by Healthcare Improvement Scotland for implementation by NHSScotland.</p>\n    <p>This guidance was developed using the NICE <a class=\"link\" href=\"http://www.nice.org.uk/about/what-we-do/our-programmes/nice-guidance/nice-interventional-procedures-guidance\" target=\"_top\" data-original-url=\"http://www.nice.org.uk/aboutnice/howwework/developingniceinterventionalprocedures/developing_nice_interventional_procedures.jsp\">interventional procedure guidance</a> process.</p>\n    <p>We have produced a <a class=\"link\" href=\"http://www.nice.org.uk/guidance/ipg53/informationforpublic\" target=\"_top\" data-original-url=\"http://guidance.nice.org.uk/IPG53/PublicInfo/pdf/English\">summary of this guidance for patients and carers</a>. Information about the evidence it is based on is also <a class=\"link\" href=\"http://www.nice.org.uk/guidance/ipg53\" target=\"_top\" data-original-url=\"http://guidance.nice.org.uk/IPG53\">available</a>. 
</p>\n    <p>\n      <strong>Changes since publication</strong>\n    </p>\n    <p>28 January 2012: minor maintenance.</p>\n    <p>\n      <strong>Your responsibility</strong>\n    </p>\n    <p>This guidance represents the views of NICE and was arrived at after careful consideration of the available evidence. Healthcare professionals are expected to take it fully into account when exercising their clinical judgement. This guidance does not, however, override the individual responsibility of healthcare professionals to make appropriate decisions in the circumstances of the individual patient, in consultation with the patient and/or guardian or carer.</p>\n    <p>Implementation of this guidance is the responsibility of local commissioners and/or providers. Commissioners and providers are reminded that it is their responsibility to implement the guidance, in their local context, in light of their duties to avoid unlawful discrimination and to have regard to promoting equality of opportunity. Nothing in this guidance should be interpreted in a way which would be inconsistent with compliance with those duties. </p>\n    <p>\n      <strong>Copyright</strong>\n    </p>\n    <p>© National Institute for Health and Clinical Excellence 2004.<a id=\"_GoBack\"></a> All rights reserved. NICE copyright material can be downloaded for private research and study, and may be reproduced for educational and not-for-profit purposes. 
No reproduction by or for commercial organisations, or for commercial purposes, is allowed without the written permission of NICE.</p>\n    <p>\n      <strong>Contact NICE</strong>\n    </p>\n    <p>National Institute for Health and Clinical Excellence<br />Level 1A, City Tower, Piccadilly Plaza, Manchester M1 4BT<br /></p>\n    <p>\n      <a class=\"link\" href=\"http://www.nice.org.uk/\" target=\"_top\" data-original-url=\"http://www.nice.org.uk/\">www.nice.org.uk</a>\n      <br />\n      <a class=\"link\" href=\"mailto:nice@nice.org.uk\" target=\"_top\">nice@nice.org.uk</a>\n      <br />0845 033 7780</p>\n  </div>\n</div>", "metadescription": "Evidence-based recommendations on CT-guided thermocoagulation for osteoid osteoma (non-cancerous tumours/growths of bone tissue)"}
GET /test/_search
{
  "from": 0,
  "size": 10, 
  "highlight": {
    "order": "score",
    "fields": {
      "content": {
        "pre_tags": [
          "<mark>"
        ],
        "post_tags": [
          "<\/mark>"
        ],
        "fragment_size": 150,
        "number_of_fragments": 1,
        "matched_fields": [
          "content"
        ]
      }
    }
  },
  "query": {
    "query_string": {
      "query": "(cancerous tumour)",
      "fields": [
        "content",
        "title"
      ]
    }
  }
}

It seems that the problem occurs when both word_delimiter and english_stemmer are used together.
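
One way to test this hunch, assuming the test index from the steps above has been created, is to run a sample through the fulltext analyzer with the _analyze API and look at the start_offset and end_offset reported for each token. The sample text is taken from the metadescription value above. This is a diagnostic sketch only, not a confirmed explanation of the error.

```
GET /test/_analyze
{
  "analyzer": "fulltext",
  "text": "non-cancerous tumours/growths of bone tissue"
}
```

Repeating the request against an analyzer that leaves out one of the two filters should show which filter changes the tokens and offsets.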

imranazad commented 7 years ago

I just noticed in another issue that the unified highlighter resolves this problem. How do I actually use the unified highlighter? Is it included in the latest downloadable version of ES? I tried specifying the type as unified, but I get back this error: "unknown highlighter type [unified] for the field [content]".

clintongormley commented 7 years ago

The unified highlighter is coming in 5.3.0. Your request works with the unified highlighter, and also with the plain highlighter.
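
A minimal sketch of the plain-highlighter workaround on 5.2.1, based on the search request from the steps to reproduce. The only changes are the added "type": "plain" and the removal of matched_fields, which, as far as the documentation indicates, applies only to the FVH. On 5.3.0 and later, "unified" should be accepted as the type in the same place.

```
GET /test/_search
{
  "highlight": {
    "order": "score",
    "fields": {
      "content": {
        "type": "plain",
        "pre_tags": ["<mark>"],
        "post_tags": ["</mark>"],
        "fragment_size": 150,
        "number_of_fragments": 1
      }
    }
  },
  "query": {
    "query_string": {
      "query": "(cancerous tumour)",
      "fields": ["content", "title"]
    }
  }
}
```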

imranazad commented 7 years ago

@clintongormley Ah I see, thanks for the prompt response.