PSU-CSAR / django-snodas

Project to store SNODAS daily rasters and serve as tiles, with some analytical capabilities.
1 stars 0 forks source link

snodas - enhancement spec #6

Open jdduh opened 1 year ago

jdduh commented 1 year ago

NWCC would like to summarize the SNODAS SWE data in the different elevation zones in the AOI. The statistics to be calculated for these zones and the complete AOIs are: mean/max/min and quartiles (1st, 2nd, and 3rd) for the SWE pixels within each zone/AOI.

The intervals of the elevation zones are pre-determined: 0-1000 ft, 1000-2000 ft, 2000-3000 ft, ... 13000-14000 ft, and 14000 ft and above. Most, if not all, AOIs have only a subset of these zones.

Given that the SNDAS SWE data's format is fairly static, I'm thinking of having a different database solution to solve this problem (and some other problems). We could save the value of each daily SWE pixel as a record in the database and link that record to its associated stationTriplet and elevation zone. We can bypass any "spatial analysis" and just perform database queries to get the statistics.

One advantage of this approach is that the SWE pixels are pre-assigned to the AOI/Zone, we could apply more appropriate rules to determine which SWE pixel is in which AOI/Zone (i.e., areal weighted if we also keep track of the area proportions of the pixel in each AOI/zone).

jdduh commented 1 year ago

SNODAS has a spatial resolution of 923 x 923 meters. The area covered by the AOIs is around (less then) 2.5M square KM. One additional consideration is that SWE pixels and stationTriplet have a many-to-many relationship.

jkeifer commented 1 year ago

This is an interesting idea (and in the past I've thought about trying to solve more generally with a sql interface for raster data; it seems like it should be possible with a postgres extension, but alas I don't have time for such a project).

We could easily build a set of quadkey hashes to represent the cell coordinates covered by a given AOI in fewer rows, and allow selection of all covered cells by a join on those quadkey coordinates. As long as we don't have to do any geometry operations at query time this should be relatively performant. I've done something like this before and it is pretty straightforward, though we'll have to modify the web mercator tiling functions for the SNODAS grid geotransform (or find some such equivalent to translate between lat/long and pixel coordinates).

The bigger concern I have is the size of the database. Even if all the data we need to store in the database from the SNODAS rasters are the cell rows, I believe the overall database size will increase significantly. We'll probably need to be careful to ensure the tables are fully normalized to reduce the cell row tuple sizes, and partitioning may become necessary to help with non-indexed queries.

But one big problem I see is that we are currently storing the rasters in the database, and we probably can't store them and cell rows in the same database without making it too large for the currently available storage on the server. I don't think we need to store the rasters in the database, except that is currently how we are generating tiles for the web map. We could store the rasters on disk as COGs and use on-demanding tiling from the COG sources to get away from having to store the rasters in the database, but that's more work to consider.