couchbaselabs / cbft

*THIS PROJECT HAS MOVED* from couchbaselabs TO: https://github.com/couchbase/cbft -- no further development will be done here on couchbaselabs/cbft

index alias results should include which original index the doc came from #170

Open steveyen opened 9 years ago

steveyen commented 9 years ago

An index alias improvement idea (raised originally by @mschoch)

The response from a search on an index alias should indicate which index each search hit came from.

E.g., when searching an index alias that includes an index of a wiki, an index of dropbox, and an index of jira docs: which doc set did this hit come from?

One application workaround is for the JSON docs from all the data sources to include another field that gives the original source (or bucket) of the doc. The commonly used "type" field, for example, might provide this info, but it is less useful when the index alias points to indexes like "wiki1", "wiki2", "wiki3".
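A minimal sketch of that workaround, assuming the application controls the docs before they are written to the data source. The `tag_with_source` helper and the `source` field name are hypothetical, chosen only for illustration; they are not part of cbft.

```python
def tag_with_source(doc, source, field="source"):
    """Return a copy of `doc` with an extra field naming where it came from.

    `field` is a hypothetical field name; pick one that does not clash
    with fields already present in the docs.
    """
    tagged = dict(doc)      # shallow copy so the caller's doc is not mutated
    tagged[field] = source  # e.g. "wiki1", "wiki2", "jira"
    return tagged


wiki_doc = {"type": "wiki", "title": "Home"}
tagged = tag_with_source(wiki_doc, "wiki2")
```

Unlike "type", a dedicated field like this can still tell "wiki1" apart from "wiki2", at the cost of changing every stored doc.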

Nasty test case idea: beware of index alias "trees", where an index alias may point to more index aliases, and consider how this feature might interact with them.
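To make the alias tree case concrete, here is a rough sketch that models aliases as a plain dict and walks the tree down to the real indexes, recording the chain of aliases that led to each one. The `resolve_leaves` function and the dict layout are assumptions for illustration, not how cbft stores or resolves aliases.

```python
def resolve_leaves(name, aliases, path=()):
    """Map each real index reachable from `name` to the alias path leading to it.

    `aliases` maps an alias name to the list of names it targets; any name
    that is not a key in `aliases` is treated as a real index.
    """
    if name in path:
        chain = " -> ".join(path + (name,))
        raise ValueError(f"alias cycle detected: {chain}")
    targets = aliases.get(name)
    if targets is None:
        # Not an alias, so this is a leaf (a real index).
        return {name: list(path)}
    leaves = {}
    for target in targets:
        sub = resolve_leaves(target, aliases, path + (name,))
        for leaf, leaf_path in sub.items():
            # Keep the first path found if a leaf is reachable twice.
            leaves.setdefault(leaf, leaf_path)
    return leaves


aliases = {"all": ["wikis", "jira"], "wikis": ["wiki1", "wiki2"]}
leaves = resolve_leaves("all", aliases)
# leaves["wiki1"] is ["all", "wikis"]; leaves["jira"] is ["all"]
```

A test for this feature would probably want to decide which of these names a hit should report: the leaf index, the nearest alias, or the whole path.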

steveyen commented 8 years ago

this commit looks related: https://github.com/blevesearch/bleve/commit/d73beac3b983964945617485275605a9ca1d5eec

mschoch commented 8 years ago

This is partially done. The request:

curl -H "Content-Type: application/json" -XPOST http://localhost:9200/api/index/bs/query -d '{"query":{"match": "water"}, "size":1}'

Now produces the result:

{
  "request": {
    "query": {
      "match": "water",
      "boost": 1,
      "prefix_length": 0,
      "fuzziness": 0
    },
    "size": 1,
    "from": 0,
    "highlight": null,
    "fields": null,
    "facets": null,
    "explain": false
  },
  "hits": [
    {
      "index": "bs_a53cc133b262d7e4_722629d9",
      "id": "water_street_brewery",
      "score": 0.8367325262553875,
      "locations": {
        "address": {
          "water": [
            {
              "pos": 3,
              "start": 11,
              "end": 16,
              "array_positions": [
                0
              ]
            }
          ]
        },
        "name": {
          "water": [
            {
              "pos": 1,
              "start": 0,
              "end": 5,
              "array_positions": null
            }
          ]
        }
      }
    }
  ],
  "total_hits": 91,
  "max_score": 0.8367325262553875,
  "took": 47208935,
  "facets": {}
}

NOTE that the hit contains "index": "bs_a53cc133b262d7e4_722629d9", which identifies the pindex from which the result came. If we created a second level of aliasing, however, we would not see that in the response.
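For a client consuming this response, a small sketch of grouping hits by their "index" value might look like the following. It relies only on the "hits", "index", and "id" fields shown in the result above; the `hits_by_index` helper is hypothetical.

```python
import json
from collections import defaultdict


def hits_by_index(response):
    """Group hit ids by the "index" value reported on each hit."""
    grouped = defaultdict(list)
    for hit in response.get("hits", []):
        # Hits without an "index" field are collected under None.
        grouped[hit.get("index")].append(hit["id"])
    return dict(grouped)


# A trimmed copy of the response above, keeping only the fields used here.
body = """
{
  "hits": [
    {"index": "bs_a53cc133b262d7e4_722629d9", "id": "water_street_brewery"}
  ],
  "total_hits": 91
}
"""
grouped = hits_by_index(json.loads(body))
```

As noted, the value is a pindex name, so with nested aliases the client still cannot tell which intermediate alias a hit passed through.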