NIEHS / beethoven

BEETHOVEN is: Building an Extensible, rEproducible, Test-driven, Harmonized, Open-source, Versioned, ENsemble model for air quality
https://niehs.github.io/beethoven/

geo hashing (e.g. s2, h3, R-trees) #371

Open kyle-messier opened 1 month ago

kyle-messier commented 1 month ago

@mitchellmanware

We were discussing how predictions could be run over grids of locations, with no need to merge everything into one massive data frame. We could build this grouping on a standardized spatial gridding system such as s2.

Prediction

We can easily use s2 to assign or create a prediction grid based on the established s2 cells. Groups of prediction locations can then be accessed quickly via their s2 hashes.
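To make the grouping idea concrete, here is a minimal sketch in Python. It does not use s2 itself: `cell_id` is a hypothetical stand-in that keys each point to a fixed latitude/longitude grid cell, and `group_by_cell`, `cell_deg`, and the sample coordinates are all assumptions made for illustration. A real s2 or h3 cell token would take the place of the key.

```python
import math
from collections import defaultdict


def cell_id(lat, lon, cell_deg=1.0):
    """Hypothetical stand-in for an s2/h3 cell token.

    Keys a point to a fixed lat/lon grid cell of size cell_deg degrees.
    """
    row = math.floor((lat + 90.0) / cell_deg)
    col = math.floor((lon + 180.0) / cell_deg)
    return f"r{row}_c{col}"


def group_by_cell(locations, cell_deg=1.0):
    """Group (lat, lon) prediction locations by their cell key."""
    groups = defaultdict(list)
    for lat, lon in locations:
        groups[cell_id(lat, lon, cell_deg)].append((lat, lon))
    return dict(groups)


# Illustrative coordinates only, not project data.
pred_locations = [(35.9, -78.9), (35.2, -78.1), (36.1, -78.9), (40.7, -74.0)]
groups = group_by_cell(pred_locations)

# Each group can now be processed on its own, without one merged table.
for key, points in groups.items():
    print(key, len(points))
```

Each key then identifies one batch of prediction locations that can be handled independently of the others.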

Estimation

We could also consider using s2 cells to group any covariate calculation if we plan on using spatial grids as part of a cross pattern in targets.

mitchellmanware commented 1 month ago

@kyle-messier I'll have to do some more reading on s2, but as I understand it: once the final model is developed, the model is applied to each s2 prediction grid cell (i.e., each group of pre-defined prediction locations) individually. And the s2 grid is a standard way to divide a large spatial region?

kyle-messier commented 1 month ago

@mitchellmanware Yes, it is a standard. Another option is Uber's H3 grid system, which is based on hexagonal cells; from the look of it, its R functions are a bit easier to use.

The general idea is that our prediction data is pre-tagged with an h3 [or s2] hash. When coordinates are requested to look up model results, the geo hash for each coordinate is computed, and those hashes are then used to retrieve the matching predictions. Basically it is all text matching, which is way faster than spatial operations.
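A rough sketch of that lookup pattern, in Python, assuming a simple stand-in hash rather than real h3 or s2 tokens. The function names (`geo_key`, `lookup`), the cell size, and the prediction values are all hypothetical and only illustrate the flow: tag once, then retrieve by string key.

```python
import math


def geo_key(lat, lon, cell_deg=0.5):
    """Hypothetical stand-in for an h3/s2 hash: a fixed-grid string key."""
    row = math.floor((lat + 90.0) / cell_deg)
    col = math.floor((lon + 180.0) / cell_deg)
    return f"{row}:{col}"


# Pre-tag the prediction data once. The values are made-up placeholders.
predictions = [
    (35.99, -78.90, 8.1),
    (35.20, -80.84, 7.4),
]
by_key = {}
for lat, lon, value in predictions:
    by_key.setdefault(geo_key(lat, lon), []).append(value)


def lookup(lat, lon):
    """Return predictions stored under the requested point's key.

    This is a dictionary lookup on a string, not a spatial join.
    """
    return by_key.get(geo_key(lat, lon), [])


print(lookup(35.9, -78.8))   # falls in the same cell as the first record
print(lookup(10.0, 10.0))    # no predictions tagged with this key
```

The cost of the spatial work is paid once, at tagging time; each later request is a constant-time key match.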