elastic / elasticsearch

Free and Open Source, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch
Other
993 stars 24.82k forks source link

Doc values support for geo shapes. #37206

Closed imotov closed 4 years ago

imotov commented 5 years ago

There are several features that we would like to add that require fast retrieval of geo shapes and can greatly benefit from having docvalues support for geo shapes.

This issue has gone through a lot of evolution, so this description is edited to reflect the latest reality

The work for adding geo_shape support for geo_centroid, geo_bounds, geotile_grid, geohash_grid aggregations was accomplished in x-pack by way of the following PRs:

This is achieved by implementing by migrating geo_shape mapper registration to a module (https://github.com/elastic/elasticsearch/pull/53562) and then defining the geo_shape with doc values in the spatial plugin (https://github.com/elastic/elasticsearch/pull/55037)

elasticmachine commented 5 years ago

Pinging @elastic/es-analytics-geo

babadofar commented 5 years ago

What is the status on this? I really need this for my maps. Anything I could help you out with? the @elastic/es-analytics-geo repo is not public, I guess.

polyfractal commented 5 years ago

No concrete status update right now. We're still discussing how to go about encoding shapes into doc values (there are several limitations/considerations that we need to work around). When we have some news moving forward -- a plan or something similar -- we'll make an update here :)

the elastic/es-analytics-geo repo is not public, I guess.

The @elastic/es-analytics-geo team is just an alias that we use to ping internal email lists, so that different sub-teams can get notified about new issues in their area. There's no additional repo or anything like that :)

thomasneirynck commented 5 years ago

This would be a prerequisite for enabling layer-fitting in the Kibana Maps-app. This is now only possible for document-layers backed by a geo_point field. cf. https://github.com/elastic/kibana/issues/33509

jpountz commented 5 years ago

I had a look at the in-progress branch, this looks great. I also have some thoughts, which I'm logging here for the record:

talevy commented 5 years ago

thank you for taking a look, Adrien! I generally agree with all of your bullet points. responses inline...

I had a look at the in-progress branch, this looks great. I also have some thoughts, which I'm logging here for the record:

  • EdgeTree only indexes the minY values, should it also index the maxY, minX and maxX values by alternatively splitting on each of these fields like we do for multi-points?

I will discuss this with Ignacio and Nick, It follows the existing edgetree implementation in Lucene, and it does not do this, most likely for more efficient point-in-polygon calculation using the infinite ray algorithm.

  • Serialization is a bit wasteful in terms of space: we don't need to serialize separately minY, maxY, y1 and y2: we could just recompute minY and maxY at deserialization time? Furthermore we could move from a fixed-length encoding to variable-length. My understanding is that we only leverage fixed length encoding in order to know how many bytes serializing the left child requires before actually serializing it. We could do it by serializing into a BytesStreamOutput first. It makes indexing a bit more memory-intensive, but this sounds worth the space savings to me.

I will explore this! I wanted to first get something to benchmark, and then we can iterate to see the tradeoffs in more quantitative terms. I do agree, though, it would be great to reduce the storage size.

  • GeometryTreeReader#intersects iterates over sub geometries linearly and returns true as soon as one matches. Should we build a tree there as well to avoid this linear scan?

I forgot to comment this as a TODO, but this is very much a TODO item to make this into a tree. it was left as a linear scan for now to focus on the sub-trees first.

  • This is the kind of functionality that it would be nice to have microbenchmarks for (the benchmarks sub folder in the root directory).

I wasn't too familiar with this module, I will add a TODO in the meta-issue to add micro-benchmarks here! thank you for the suggestion.

talevy commented 4 years ago

This issue has gone through a lot of evolution, so this comment is meant to give an update on where things landed.

The work for adding geo_shape support for geo_centroid, geo_bounds, geotile_grid, geohash_grid aggregations was accomplished in x-pack by way of the following PRs:

This is achieved by implementing by migrating geo_shape mapper registration to a module (https://github.com/elastic/elasticsearch/pull/53562) and then defining the geo_shape with doc values in the spatial plugin (https://github.com/elastic/elasticsearch/pull/55037)

talevy commented 4 years ago

update from the last comment above:

geo_bounds, geo_centroid, geotile_grid, geohash_grid support for geo_shape is now merged and supported in 7.8 and master (8.0).

Closing this issue to reflect that. at the time of closing Documentation and the topic of geo_distance agg support are still TODOs. documentation is happening shortly, while the geo_distance agg support will require more discussion and future development