gbif / pipelines

Pipelines for data processing (GBIF and LivingAtlases)
Apache License 2.0

Add Uber h3 codes to Elastic index #450

Open timrobertson100 opened 3 years ago

timrobertson100 commented 3 years ago

The Uber H3 spatial grid system provides hexagons of approximately equal area.

iDigBio have demonstrated this, which shows as:

image

We need to calculate and index in Elasticsearch the 14 H3 codes for the coordinate, one for each resolution from 2 to 15. This can be done with the Java library simply by:

// for resolution 2..15 store the code
String hexAddr = h3.geoToH3Address(lat, lng, resolution);

The result should be something like this in ES, with the ability to aggregate by any level:

{
...
  uber_h3_res_2: ...,
  uber_h3_res_3: ...,
  ...
  uber_h3_res_15: ..., 
...
}
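A minimal sketch of how those fields could be assembled per record, assuming the `uber_h3_res_N` naming shown above. The `H3Fields` class and the `CellIndexer` interface are hypothetical names used only for illustration; `CellIndexer` stands in for the library call quoted earlier so the example stays free of external dependencies.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class H3Fields {
  static final int MIN_RES = 2;   // lowest resolution indexed, per the issue
  static final int MAX_RES = 15;  // finest H3 resolution

  /** Hypothetical stand-in for a call such as h3.geoToH3Address(lat, lng, resolution). */
  @FunctionalInterface
  interface CellIndexer {
    String index(double lat, double lng, int resolution);
  }

  /** Builds one field per resolution, keyed uber_h3_res_2 .. uber_h3_res_15. */
  static Map<String, String> build(double lat, double lng, CellIndexer indexer) {
    Map<String, String> fields = new LinkedHashMap<>();  // keeps resolution order
    for (int res = MIN_RES; res <= MAX_RES; res++) {
      fields.put("uber_h3_res_" + res, indexer.index(lat, lng, res));
    }
    return fields;
  }
}
```

In the pipeline the indexer would be a lambda wrapping the real H3 call, and the resulting map would be merged into the document sent to Elasticsearch.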

Edited above, thanks to @wilsotc for pointing out:

Don't go below an H3 resolution of 2. Resolution 2 yields 5882 hexagons globally, each of which covers around 86,700 square kilometers. See here: https://h3geo.org/docs/core-library/restable

MattBlissett commented 3 years ago

Since it is not possible to tile the icosahedron with only hexagons, we chose to introduce twelve pentagons, one at each of the icosahedron vertices. These vertices were positioned using the spherical icosahedron orientation by R. Buckminster Fuller, which places all the vertices in the water. This helps avoid pentagons surfacing in our work.

I'm sure we'll have some occurrences on those pentagons, so that's something to check.

timrobertson100 commented 3 years ago

We surely will. I've watched the video, and I understand that those pentagons will exist in the grid mesh and will render as pentagons, but the distance calculations across those grid cells may be slightly off (not an issue for a density map). Something to check, as you say...

wilsotc commented 3 years ago

I set the minimum H3 resolution to 2 in our H3 prototype, but if there were a need to use lower resolutions, I think the odd-shaped cells could be accounted for in the density calculation. I didn't use the lower resolutions because the 5882 hexagons of resolution 2 fit well within the Elasticsearch default aggregation bucket limit of 2^16, and because of the non-hexagonal cells issue.
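To make the bucket-limit reasoning concrete, here is a small sketch that compares the global cell count at each resolution with the 2^16 limit quoted above. It relies on the H3 cell-count formula 2 + 120 * 7^r (total cells at resolution r, including the 12 pentagons); the `H3CellCounts` class and its method names are hypothetical.

```java
final class H3CellCounts {
  /** Aggregation bucket limit quoted in the thread (2^16 = 65536). */
  static final long BUCKET_LIMIT = 1L << 16;

  /** Total number of H3 cells at a resolution: 2 + 120 * 7^resolution. */
  static long cellCount(int resolution) {
    long pow = 1;
    for (int i = 0; i < resolution; i++) {
      pow *= 7;
    }
    return 2 + 120 * pow;
  }

  /** True if a global aggregation at this resolution cannot exceed the limit. */
  static boolean fitsInBuckets(int resolution) {
    return cellCount(resolution) <= BUCKET_LIMIT;
  }
}
```

By this count resolution 2 has 5882 cells and resolution 3 has 41162, both under the limit, while resolution 4 has 288122. A whole-world aggregation at resolution 4 or finer could therefore exceed the limit unless the query is restricted to a smaller area.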