Closed danlooo closed 1 year ago
The current hierarchical index and U3 encode the child index as a 3bit number. This makes 8 possible values. However, there are only 7 child hexagons and only 4 child diamonds. If we use the integer representation of the bits directly as numeric indices in YAXArrays, we'd let every 8th cell empty (there is no 8th children). This is unused space in the chunks scattered evenly acrosss the datacube. We can also Just sort the cells by this binary index and then just use the row number as the id. But then we'd lose the direct encoding of parents, children / neighbors...
Adding a meta data table with bits for each resolution and row rumber as the new id would be nice. A complete DGGS can have > 50*10^9 cells (i.e. ISEA4H). We need 12 bytes to encode the hierarchical index. This would be 600 GB just for the index. But can be massively reduced with run length encoding and lazy cartesian products, because the data is sorted.
TLDR; Cache geo lon/lat -> cell id in a ball tree, no meta data table, and write a closed formula to go from int64 running number -> res0 digit. res1 digit . ... Need to handle pentagons as well...
There is a closed formula to get hierarchical cell index tuple from the linear cell id in memory space:
# get hierarchical index: int to tuple at resolutions
# n is the row number of the sorted table of cartesean product of cells
function hindex(vectors, n)
vectors_lengths = length.(vectors)
n_sets = length(vectors)
res = map(1:4) do set
if set == n_sets
n_reps = 1
else
n_reps = *(vectors_lengths[(set+1):n_sets]...)
end
pos = n |> n -> n - 1 |> n -> (n / n_reps) % vectors_lengths[set] |> floor |> x -> x + 1 |> Int
vectors[set][pos]
end
res
end
vectors = [
[1, 2, 3, 4], # resolution 3
[1, 2, 3], # resolution 2
[1, 2], # resolution 1
[1] # resolution 0 (base cells)
]
hindex(vectors, 5)
hindex.(Ref(vectors), 1:24)
Indeed, (1,3,1,1) is the 5th tuple. Storing the binarized tuple as int instead would result in 1/8 of cells being empty (3 bits vs 7 children)
Definition:
An index is a bijective function $f: R^2 \rightarrow R, f(lat,lon)=cellID$ mapping locations x = (lon, lat) into a single dimensional memory space.
Requirements for the index:
Options to encode a hierarchical spatial cell index:
Start again using lookup tables
This PR addresses issue #36, #28, #25, #13. This package aims to provide a data structure to allow convolutional neural networks over equal area cells.
The closed existing solution seems to be Uber H3. However, U3 uses a gnomonic projection which comes with high area distortions. On the other side, DGGRID grids solve this issue by using a Snyder Equal Area projection.
This PR aims to add an Uber H3 like index to DGGRID grids combining the advantages of both packages. Cell center points were stored in a ball tree at all resolutions. Parent cells were calculated based on nearest neighbor search using geodesical Vincenty's distance. Cell indicies were retrieved by combining the cell index of the parent with the bearing of the child center to its parent. This is the hierarchical index will be used as a primary key for the cell ids in storing and parent/neighbor retrieval.
We can not use the original Uber H3 index, because the polyhedron orientation to the earth is different in DGGRID. Furthermore, cells in DGGRID are centered arround polyhedron verticies, whereas cells in Uber H3 are centered across icosahedron face centers.
Cells being a child of either one hexagon or pentagon cell at DGGRID resolution 3. Child cell index parts in bold are calculated based on the child cells ordered by azimuth from the child cell to its parent. Indicies were composed by concatenating the index of each parent cell at each coarser resolution (Here: DGGRID resolutions 0 bis 3).