elastic / elasticsearch

Free and Open, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch

Reenact API #22246

Closed: bjorn-ali-goransson closed this issue 7 years ago

bjorn-ali-goransson commented 7 years ago

With the _analyze API, we can submit text for analysis using either custom parameters or the analyzers configured on an index.
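For reference, a minimal sketch of the two ways of calling _analyze mentioned above. It only builds the request (method, path, JSON body) and sends nothing; the helper name `build_analyze_request` is hypothetical, and the index and field names are taken from the example later in this issue.

```python
import json


def build_analyze_request(index, text, analyzer=None, field=None):
    """Return (method, path, body) for an _analyze call against an index.

    Exactly one of `analyzer` (a named analyzer) or `field` (derive the
    analyzer from that field's mapping) must be given.
    """
    if (analyzer is None) == (field is None):
        raise ValueError("pass exactly one of analyzer or field")
    body = {"text": text}
    if analyzer is not None:
        body["analyzer"] = analyzer
    else:
        body["field"] = field
    return "GET", f"/{index}/_analyze", body


# Analyze "Jones" the way the (assumed) Swedish sub-field would analyze it.
method, path, body = build_analyze_request(
    "my_index", "Jones", field="name.with_custom_swedish_analyzer"
)
print(method, path)
print(json.dumps(body, indent=2))
```

This covers one field and one piece of text per call, which is the limitation the rest of this issue is about.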

But when troubleshooting issues such as "When doing a prefix query (for multiple fields), some documents appear only when typing a part of the query, then disappear", it would be beneficial to see a complete document as indexed by ES.

Here, it would be nice to be able to "reenact" the indexing of a document using the index settings, for example by issuing a POST /my_index/my_content/1/_reenact to get a reconstruction of what would be indexed: the _all field, custom fields, aggregated fields, and multiple fields that map to a single property but use different analyzers.

Because the indexed content is extremely difficult, if not impossible, to reverse-engineer, a reenactment would be the next best thing.

Example document:

GET /my_index/my_content/1

{
  "name": "Jones",
  "description": "A big guy"
}

Example reenact request:

POST /my_index/my_content/1/_reenact

Response:

{
  "_all": [
    "jones",
    "a",
    "big",
    "guy"
  ],
  "_all_swedish_analyzer": [
    "jon",
    "a",
    "big",
    "guy"
  ],
  "name.with_standard_analyzer": [{
    "token": "jones",
    "start_offset": 0,
    "end_offset": 5,
    "type": "<ALPHANUM>",
    "position": 0
  }],
  "name.with_custom_swedish_analyzer": [{
    "token": "jon",
    "start_offset": 0,
    "end_offset": 5,
    "type": "<ALPHANUM>",
    "position": 0
  }],
  ... etc ...
}

This assumes two custom fields for name, one called with_standard_analyzer and another called with_custom_swedish_analyzer, where the Swedish one has a copy_to setting that targets the field _all_swedish_analyzer.

This would of course come with the disclaimer that it is a REENACTMENT, not the actual, canonical content as indexed by ES/Lucene.

bjorn-ali-goransson commented 7 years ago

Oh, and this would only work if the _source is still there.

jpountz commented 7 years ago

This looks similar to the term vectors API with artificial documents? https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-termvectors.html
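As a rough illustration of that suggestion, the sketch below builds a term vectors request for an artificial document, using the example document from this issue. It only constructs the request and sends nothing. The helper name `build_artificial_termvectors_request` is hypothetical, and the typeless path is an assumption: versions that still use mapping types would put the type in the path (for example /my_index/my_content/_termvectors).

```python
import json


def build_artificial_termvectors_request(index, doc, fields):
    """Return (method, path, body) asking for term vectors of a document
    that is supplied inline ("artificial") rather than fetched by id."""
    body = {
        "doc": doc,                 # the artificial document to analyze
        "fields": fields,           # which fields to return terms for
        "offsets": True,            # include start/end character offsets
        "positions": True,          # include token positions
        "term_statistics": False,   # skip index-wide term statistics
        "field_statistics": False,  # skip index-wide field statistics
    }
    return "GET", f"/{index}/_termvectors", body


method, path, body = build_artificial_termvectors_request(
    "my_index",
    {"name": "Jones", "description": "A big guy"},
    ["name.with_standard_analyzer", "name.with_custom_swedish_analyzer"],
)
print(method, path)
print(json.dumps(body, indent=2))
```

Because the document is passed in the body, this does not depend on the document already being indexed.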

clintongormley commented 7 years ago

The term vectors API can also be used to get back the terms of an existing document, so I think we're covered.
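To show how a term vectors response could be reshaped into the per-field token lists proposed in the issue, here is a small sketch. The function name `flatten_term_vectors` is hypothetical, and `sample_response` is a hand-written illustration of the response layout (term_vectors, then field, then terms, then tokens), not output captured from a real cluster. It assumes positions were requested.

```python
def flatten_term_vectors(response):
    """Turn a term vectors response into {field: [token dicts]},
    with each field's tokens ordered by position."""
    flattened = {}
    for field, vector in response.get("term_vectors", {}).items():
        tokens = []
        for term, info in vector.get("terms", {}).items():
            # A term can occur several times; emit one entry per occurrence.
            for occurrence in info.get("tokens", []):
                tokens.append({
                    "token": term,
                    "start_offset": occurrence.get("start_offset"),
                    "end_offset": occurrence.get("end_offset"),
                    "position": occurrence.get("position"),
                })
        # Tokens without a position (if any) sort last.
        tokens.sort(key=lambda t: (t["position"] is None, t["position"] or 0))
        flattened[field] = tokens
    return flattened


# Hand-written sample in the shape of a term vectors response (illustrative only).
sample_response = {
    "_index": "my_index",
    "_id": "1",
    "found": True,
    "term_vectors": {
        "description": {
            "terms": {
                "big": {"term_freq": 1, "tokens": [
                    {"position": 1, "start_offset": 2, "end_offset": 5}]},
                "a": {"term_freq": 1, "tokens": [
                    {"position": 0, "start_offset": 0, "end_offset": 1}]},
                "guy": {"term_freq": 1, "tokens": [
                    {"position": 2, "start_offset": 6, "end_offset": 9}]},
            }
        }
    },
}

result = flatten_term_vectors(sample_response)
for token in result["description"]:
    print(token)
```

Note that term vectors for a stored document reflect what the API returns for the requested fields, so whether a given sub-field or copy_to target shows up depends on the mapping.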