clr / riak_geo

Geospatial indexing built on the Riak Distributed Database
0 stars 2 forks source link

Determine the index data structure to be stored in Riak Object #4

Open nathanaschbacher opened 11 years ago

nathanaschbacher commented 11 years ago

The indices will be initially partitioned and distributed throughout the cluster based on their respective HEALPix pixel ID's. However that only gets us all the entries in the index at the location and resolution of the pixel. Essentially a first-order analysis.

A second-order analysis will need to be performed on the entries actually stored in the index to satisfy both range and nearest neighbor queries.

A naive approach would be to store the entries with their position as a list and then iterate through them checking against a reference point and maintaining a resulting list of matches.

That will become extremely inefficient with larger index sizes. This could be improved upon through either creating a tree structure that pre-orders the indexed positions to make filtering faster, or by partitioning the list and iterating over sections of it in parallel recursively, or both.

Another challenge will be how to implement the index such that entries can be updated and/or deleted.

dankerrigan commented 11 years ago

First thought: Leaving distributed aside, R-Trees are probably the most common index type for geospatial data. Quad-trees are an option if we're just dealing with handling points.

Links from HipChat: http://informix-spatial-technology.blogspot.com/2012/01/comparison-on-b-tree-r-tree-quad-tree.html http://arxiv.org/pdf/1201.1784v1.pdf

Second thought: If we decide to index at different resolutions of grids (wrt Healpix), we can either 1. Store separate indexes for each resolution, 2. Store resolutions in a tree structure so that lower resolution nodes link to higher resolution nodes (maybe store metadata such as count of leaf nodes)