Closed imotov closed 4 years ago
Pinging @elastic/es-analytics-geo
What is the status on this? I really need this for my maps. Anything I could help you out with? the @elastic/es-analytics-geo repo is not public, I guess.
No concrete status update right now. We're still discussing how to go about encoding shapes into doc values (there are several limitations/considerations that we need to work around). When we have some news moving forward -- a plan or something similar -- we'll make an update here :)
the elastic/es-analytics-geo repo is not public, I guess.
The @elastic/es-analytics-geo
team is just an alias that we use to ping internal email lists, so that different sub-teams can get notified about new issues in their area. There's no additional repo or anything like that :)
This would be a prerequisite for enabling layer-fitting in the Kibana Maps-app. This is now only possible for document-layers backed by a geo_point
field. cf. https://github.com/elastic/kibana/issues/33509
I had a look at the in-progress branch, this looks great. I also have some thoughts, which I'm logging here for the record:
benchmarks
sub folder in the root directory).thank you for taking a look, Adrien! I generally agree with all of your bullet points. responses inline...
I had a look at the in-progress branch, this looks great. I also have some thoughts, which I'm logging here for the record:
- EdgeTree only indexes the minY values, should it also index the maxY, minX and maxX values by alternatively splitting on each of these fields like we do for multi-points?
I will discuss this with Ignacio and Nick, It follows the existing edgetree implementation in Lucene, and it does not do this, most likely for more efficient point-in-polygon calculation using the infinite ray algorithm.
- Serialization is a bit wasteful in terms of space: we don't need to serialize separately minY, maxY, y1 and y2: we could just recompute minY and maxY at deserialization time? Furthermore we could move from a fixed-length encoding to variable-length. My understanding is that we only leverage fixed length encoding in order to know how many bytes serializing the left child requires before actually serializing it. We could do it by serializing into a BytesStreamOutput first. It makes indexing a bit more memory-intensive, but this sounds worth the space savings to me.
I will explore this! I wanted to first get something to benchmark, and then we can iterate to see the tradeoffs in more quantitative terms. I do agree, though, it would be great to reduce the storage size.
- GeometryTreeReader#intersects iterates over sub geometries linearly and returns true as soon as one matches. Should we build a tree there as well to avoid this linear scan?
I forgot to comment this as a TODO, but this is very much a TODO item to make this into a tree. it was left as a linear scan for now to focus on the sub-trees first.
- This is the kind of functionality that it would be nice to have microbenchmarks for (the
benchmarks
sub folder in the root directory).
I wasn't too familiar with this module, I will add a TODO in the meta-issue to add micro-benchmarks here! thank you for the suggestion.
This issue has gone through a lot of evolution, so this comment is meant to give an update on where things landed.
The work for adding geo_shape support for geo_centroid, geo_bounds, geotile_grid, geohash_grid aggregations was accomplished in x-pack by way of the following PRs:
This is achieved by implementing by migrating geo_shape mapper registration to a module (https://github.com/elastic/elasticsearch/pull/53562) and then defining the geo_shape with doc values in the spatial plugin (https://github.com/elastic/elasticsearch/pull/55037)
update from the last comment above:
geo_bounds, geo_centroid, geotile_grid, geohash_grid support for geo_shape is now merged and supported in 7.8 and master (8.0).
Closing this issue to reflect that. at the time of closing Documentation and the topic of geo_distance agg support are still TODOs. documentation is happening shortly, while the geo_distance agg support will require more discussion and future development
There are several features that we would like to add that require fast retrieval of geo shapes and can greatly benefit from having docvalues support for geo shapes.
This issue has gone through a lot of evolution, so this description is edited to reflect the latest reality
The work for adding geo_shape support for geo_centroid, geo_bounds, geotile_grid, geohash_grid aggregations was accomplished in x-pack by way of the following PRs:
This is achieved by implementing by migrating geo_shape mapper registration to a module (https://github.com/elastic/elasticsearch/pull/53562) and then defining the geo_shape with doc values in the spatial plugin (https://github.com/elastic/elasticsearch/pull/55037)