dyweb / blog

Dongyue Tech Blog
https://blog.dongyueweb.com

[post] How Elasticsearch uses Lucene's index time join to handle nested objects #58

Closed at15 closed 1 year ago

at15 commented 1 year ago

Type

Related

None so far

Description

When indexing an object that contains a nested array of objects, the default behavior in ES is often not what you expect, because it flattens the data. Take the T-shirt search example from https://blog.mikemccandless.com/2012/01/searching-relational-content-with.html

{
   "name": "dyweb",
   "specs": [
      { "size": "xl", "color": "blue" },
      { "size": "xxl", "color": "red" }
   ]
}

If you index it directly in ES, it is flattened into:

{
   "name": "dyweb",
   "specs.size": ["xl", "xxl"],
   "specs.color": ["blue", "red"]
}

The flattened form matches a query such as name = dyweb & size = xl & color = red, which is a false positive: the xl shirt is blue, while the red shirt is xxl. Flattening loses the association between the size and color that belonged to the same object.
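To make the false match concrete, here is a minimal Python sketch of the flattening and a naive term match. It is only a simulation of the behavior described above, not ES code; `flatten` and `matches` are hypothetical helper names made up for this illustration.

```python
def flatten(doc, array_field):
    """Collapse an array of objects into parallel multi-valued fields."""
    flat = {k: v for k, v in doc.items() if k != array_field}
    for obj in doc[array_field]:
        for key, value in obj.items():
            flat.setdefault(f"{array_field}.{key}", []).append(value)
    return flat


def matches(flat_doc, terms):
    """True if every queried field contains the wanted value."""
    for field, wanted in terms.items():
        value = flat_doc.get(field)
        values = value if isinstance(value, list) else [value]
        if wanted not in values:
            return False
    return True


shirt = {
    "name": "dyweb",
    "specs": [
        {"size": "xl", "color": "blue"},
        {"size": "xxl", "color": "red"},
    ],
}

flat = flatten(shirt, "specs")
print(flat)

# Matches even though no single spec is both xl and red.
query = {"name": "dyweb", "specs.size": "xl", "specs.color": "red"}
print(matches(flat, query))  # True (a false positive)
```

Each field is checked independently against its own list of values, so nothing ties `xl` to `blue` any more.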

The solution in ES is to map the field as nested: https://www.elastic.co/guide/en/elasticsearch/reference/current/nested.html Under the hood it uses Lucene's join module, which provides both an index-time join (documents are indexed together as a block) and a query-time join.
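A rough Python sketch of the index-time (block) join idea, assuming the usual layout where the child documents are written immediately before their parent in one contiguous block. This is a toy simulation, not the Lucene API; `index_block`, `block_join_search` and the `is_parent` flag are hypothetical names used only for illustration.

```python
def index_block(index, parent, children):
    """Append the children first, then the parent, as one contiguous block."""
    for child in children:
        index.append({"is_parent": False, **child})
    index.append({"is_parent": True, **parent})


def block_join_search(index, parent_terms, child_terms):
    """Return names of parents that have one child matching ALL child_terms."""
    hits = []
    child_matched = False
    for doc in index:
        if not doc["is_parent"]:
            # Each child is tested on its own, so sibling values never mix.
            if all(doc.get(f) == v for f, v in child_terms.items()):
                child_matched = True
        else:
            # The parent closes the block: join the child result to it.
            parent_ok = all(doc.get(f) == v for f, v in parent_terms.items())
            if child_matched and parent_ok:
                hits.append(doc["name"])
            child_matched = False  # reset for the next block
    return hits


index = []
index_block(
    index,
    {"name": "dyweb"},
    [{"size": "xl", "color": "blue"}, {"size": "xxl", "color": "red"}],
)

print(block_join_search(index, {"name": "dyweb"}, {"size": "xl", "color": "red"}))
print(block_join_search(index, {"name": "dyweb"}, {"size": "xl", "color": "blue"}))
```

The first search returns no hits and the second returns `dyweb`, because the size and color conditions now have to hold on the same child document. On the ES side this corresponds to mapping `specs` with `"type": "nested"` and searching it with a `nested` query whose `path` is `specs`.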

If time allows, I plan to cover

Update

at15 commented 1 year ago

Fixed in #59. I kind of forgot how the website is deployed though ... are we using gh or netlify? I didn't see it in the netlify account ... @gaocegege

at15 commented 1 year ago

https://blog.dongyueweb.com/how_elasticsearch_uses_lucene_index_time_join_to_handle_nested_objects.html