databrickslabs / mosaic

An extension to the Apache Spark framework that allows easy and fast processing of very large geospatial datasets.
https://databrickslabs.github.io/mosaic/

Requesting support for Mercator tile index #219

Open abhilshitsoni-tomtom opened 2 years ago

abhilshitsoni-tomtom commented 2 years ago

Is your feature request related to a problem? Please describe.
Spatial vector data often needs to be converted to rasters for processing. Rasters are usually square or rectangular. In such cases it is desirable to fetch data using a rectangular or square tiling scheme, such as Mercator tiles, and then convert each tile into a georeferenced raster for further processing. Because the current H3 index is hexagonal, it is difficult to create a raster from each hexagonal cell: one has to pad each hexagon to a square before rasterizing and maintain the transformation information separately.

Describe the solution you'd like
Provide support for storing data using the Mercator tiling scheme. Libraries such as Mapbox Mercantile (https://github.com/mapbox/mercantile) have a very nice implementation of it.

Describe alternatives you've considered
Using the H3 index and padding each cell to a rectangular envelope, while keeping track of the corner coordinates derived after padding. This is very tedious to achieve.


edurdevic commented 2 years ago

Yes, we are certainly going to look into raster and more grid types in the next few months. Rectangular grid systems can also have better tessellation performance since you can recursively split the original geometry over multiple resolutions. If you had to choose between S2, Geohash and XYZ, which one would you pick?

abhilshitsoni-tomtom commented 2 years ago

Thank you for confirming that. I would definitely pick XYZ. Apart from the use case mentioned in my original question, a lot of the existing data that we want to process is already partitioned using this tiling scheme. Many satellite and aerial imagery vendors also provide us with data tiled this way. When we process such datasets, we want to retain the reference to the original tiles for geo-registration purposes.
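For readers unfamiliar with the XYZ scheme discussed here, the following is a minimal sketch of the standard Web Mercator tile arithmetic (the same convention used by slippy-map tile servers). The function names `lonlat_to_tile` and `tile_bounds` are illustrative only and are not part of Mosaic or Mercantile.

```python
import math


def lonlat_to_tile(lon, lat, zoom):
    """Return the (x, y) index of the XYZ tile containing a lon/lat point."""
    n = 2 ** zoom  # number of tiles along each axis at this zoom level
    x = int((lon + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp so that points on the extent boundary (or beyond the Web
    # Mercator latitude limit) still map to a valid tile.
    x = min(max(x, 0), n - 1)
    y = min(max(y, 0), n - 1)
    return x, y


def tile_bounds(x, y, zoom):
    """Return (west, south, east, north) of a tile in degrees."""
    n = 2 ** zoom

    def lon_of(col):
        return col / n * 360.0 - 180.0

    def lat_of(row):
        return math.degrees(math.atan(math.sinh(math.pi * (1.0 - 2.0 * row / n))))

    # Row indices grow southwards, so row y is the northern edge.
    return lon_of(x), lat_of(y + 1), lon_of(x + 1), lat_of(y)
```

Because each tile is an axis-aligned rectangle with known corner coordinates, the bounds returned by `tile_bounds` can serve directly as the georeference of a raster produced from that tile.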

edurdevic commented 1 year ago

That makes sense. We will prioritise XYZ over the other two then.

sebdiem commented 1 year ago

I am very interested in geohash on my side. And thanks again for the awesome work!

PadenZach commented 1 year ago

Out of curiosity, how complex would it be to add something like geohash? Mosaic seems to have many useful features, but it is essentially unusable for us because we have several algorithms that rely on geohash (and specifically on properties of geohash that H3 doesn't satisfy).

edurdevic commented 1 year ago

@PadenZach we are currently working on implementing a generic rectangular grid system that would enable using any CRS for an arbitrary map extent. This is quite useful for local national projections, but it can also be used on plain lon-lat coordinates (as geohash does).

The custom grid system works by specifying a set of parameters (CUSTOM(minX,maxX,minY,maxY,splits,rootCellSizeX,rootCellSizeY) ) that define

Given those parameters (e.g. CUSTOM(-180,180,-180,180,2,30,30)), you can run any operation that Mosaic supports (polyfill, point-to-index, k-ring, tessellation, etc.).

We are still working on finding a great bin-packing for the cell IDs, but this is the concept we have in mind. It would be similar to geohash, except that the index would be a long type (it makes the joins much faster) and the x-y split would be symmetrical (geohash uses 4 on X and 8 on Y).
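To make the idea concrete, here is a rough sketch of how point-to-index could work on such a rectangular grid. It is not Mosaic's implementation. It assumes, from the parameter names alone, that the extent is covered by root cells of size rootCellSizeX by rootCellSizeY and that each resolution step divides a cell into `splits` parts per axis. The class `CustomGrid` and the bit layout of the 64-bit ID (8 bits of resolution, 28 bits per axis) are purely hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class CustomGrid:
    """Hypothetical rectangular grid; parameter meanings are assumed."""
    min_x: float
    max_x: float
    min_y: float
    max_y: float
    splits: int
    root_cell_size_x: float
    root_cell_size_y: float

    def cell_size(self, resolution):
        # Each resolution step divides the cell edge by `splits`.
        factor = self.splits ** resolution
        return self.root_cell_size_x / factor, self.root_cell_size_y / factor

    def point_to_cell(self, x, y, resolution):
        if not (self.min_x <= x <= self.max_x and self.min_y <= y <= self.max_y):
            raise ValueError("point outside grid extent")
        size_x, size_y = self.cell_size(resolution)
        cols = math.ceil((self.max_x - self.min_x) / size_x)
        rows = math.ceil((self.max_y - self.min_y) / size_y)
        # Points on the upper boundary are assigned to the last cell.
        ix = min(int((x - self.min_x) / size_x), cols - 1)
        iy = min(int((y - self.min_y) / size_y), rows - 1)
        return ix, iy

    def point_to_id(self, x, y, resolution):
        ix, iy = self.point_to_cell(x, y, resolution)
        if resolution >= 128 or ix >= (1 << 28) or iy >= (1 << 28):
            raise ValueError("cell does not fit the assumed 64-bit layout")
        # Assumed layout: [resolution: 8 bits][ix: 28 bits][iy: 28 bits]
        return (resolution << 56) | (ix << 28) | iy


def unpack_id(cell_id):
    """Invert the assumed packing into (resolution, ix, iy)."""
    mask = (1 << 28) - 1
    return cell_id >> 56, (cell_id >> 28) & mask, cell_id & mask


grid = CustomGrid(-180, 180, -180, 180, 2, 30, 30)
```

With this layout the ID fits in a signed 64-bit long, which is the property that makes joins on the index column cheap. The actual packing chosen by Mosaic may differ.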

PadenZach commented 1 year ago

Thanks for the details, Erni! That sounds like a great feature. Unfortunately, I am still worried that this may not be sufficient for us.

For context, we use geohashes in several internal processes that make it quite hard for us to replace them with other indexing systems. All of our internal processing is done in databricks, however, several downstream systems after batch processing interact with the produced data via geohashes. So, if we were to use this custom grid solution we'd still need to be able to convert the custom mosaic index to a geohash in one-to-one fashion.

Given that many popular geohash libraries already use long/int or bit representations internally, if the indexing function itself were equivalent to the geohash algorithm for certain parameters, then the conversion should be an additional function we could create. However, I'm not sure how the long column produced by this indexing algorithm would be implemented, or whether the longs it produces would be equivalent.
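As an illustration of the integer representation mentioned above, this sketch computes a geohash as an integer by interleaving longitude and latitude bits (longitude first, five bits per character) and converts it to and from the usual base-32 string. The function names are illustrative. Nothing here reflects how Mosaic encodes its cell IDs.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard geohash alphabet


def geohash_int(lon, lat, precision):
    """Return the geohash of a point as an integer of 5 * precision bits."""
    lon_lo, lon_hi = -180.0, 180.0
    lat_lo, lat_hi = -90.0, 90.0
    value = 0
    for i in range(5 * precision):
        if i % 2 == 0:  # even bit positions refine longitude
            mid = (lon_lo + lon_hi) / 2.0
            bit = 1 if lon >= mid else 0
            if bit:
                lon_lo = mid
            else:
                lon_hi = mid
        else:  # odd bit positions refine latitude
            mid = (lat_lo + lat_hi) / 2.0
            bit = 1 if lat >= mid else 0
            if bit:
                lat_lo = mid
            else:
                lat_hi = mid
        value = (value << 1) | bit
    return value


def int_to_geohash(value, precision):
    """Render the integer form as a base-32 geohash string."""
    chars = []
    for i in range(precision):
        shift = 5 * (precision - 1 - i)
        chars.append(BASE32[(value >> shift) & 31])
    return "".join(chars)


def geohash_to_int(geohash):
    """Parse a base-32 geohash string back into its integer form."""
    value = 0
    for ch in geohash:
        value = (value << 5) | BASE32.index(ch)
    return value
```

The mapping between string and integer is one-to-one only for a fixed precision, because leading zero bits are not visible in the integer. A long-based index would therefore need to carry the precision alongside the bits (or encode it in the value) to round-trip to a geohash string.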

edurdevic commented 1 year ago

We are evaluating the option of making it compatible with geohash. We will have an update on this in the next few weeks.

milos-colic commented 1 year ago

@PadenZach We will be adding geohash in the coming 0.4.x releases. I will link this issue to the PR once it is opened. It should not be too far in the future.