dhardestylewis / terrain_aggregator

Workflow to aggregate terrain imagery at scale into a single seamless image dataset

NOAA semi-annual `terrain_aggregator` DB details #81

Open: dhardestylewis opened this issue 1 year ago

dhardestylewis commented 1 year ago

TNRIS high-resolution terrain database details

terrain_aggregator provides a back-to-front approach to aggregating and serving source Lidar DEM tiles from a high-performance computing environment.

Context

Processing terrain data at scale requires relying on

DB preparation

tl;dr: 100% of TNRIS Lidar DEM tiles with GDAL-incompatible metadata were successfully "corrected" so that these tiles could be included in scalable pre-processing with GDAL. With the corrections applied, 100% of the tiles ran successfully against common GDAL routines.

In order to prepare source terrain imagery tiles for use at scale, terrain_aggregator gathers all desired terrain tiles into a central PostgreSQL database and records basic but necessary metadata from each tile. Current TNRIS best practices require that DEM tile metadata be FGDC-compliant, but they do not require that this metadata be produced in a way that supports essential DEM processing libraries such as GDAL. As a result, at least 10% of TNRIS's ~350,000 DEM tiles (roughly 35,000 or more) cannot be used with GDAL by default, usually because of one of the three issues described below as GDAL-incompatible #1, #2, and #3.

GDAL-incompatible #1 usually occurs in newer TNRIS Lidar DEM tilesets, which provide more detailed projection information: they record the provenance of the projection using a BOUNDCRS WKT2 key. Common GDAL operations do not yet support the BOUNDCRS WKT2 key, so these tiles cannot be processed at scale with GDAL unless the correct projection code is named explicitly. terrain_aggregator stores the "corrected" projection code as an attribute of each affected tile in the PostgreSQL database, so that bulk processing can include these tiles.
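A minimal sketch of how a stored "corrected" projection code could feed bulk processing. It assumes a hypothetical `tiles` table with `path` and `corrected_srs` columns (the real schema is not given in this issue), uses an in-memory SQLite database as a stand-in for the PostgreSQL catalog, and only builds `gdalwarp` command lines: `-s_srs` overrides the SRS embedded in a tile and `-t_srs` sets the output SRS. The target `EPSG:26914` (NAD83 / UTM zone 14N), the example rows, and the `_warped.tif` naming are illustrative choices.

```python
import sqlite3


def warp_args(src, dst, target_srs, corrected_srs=None):
    """Build a gdalwarp command line for one tile.

    When a corrected projection code is recorded, pass it with -s_srs so
    gdalwarp uses it instead of the (e.g. BOUNDCRS) SRS embedded in the tile.
    """
    args = ["gdalwarp"]
    if corrected_srs:
        args += ["-s_srs", corrected_srs]
    args += ["-t_srs", target_srs, src, dst]
    return args


def build_jobs(conn, target_srs):
    """Yield one gdalwarp argument list per catalogued tile (hypothetical schema)."""
    rows = conn.execute("SELECT path, corrected_srs FROM tiles ORDER BY path")
    for path, corrected in rows:
        dst = path.rsplit(".", 1)[0] + "_warped.tif"
        yield warp_args(path, dst, target_srs, corrected)


# In-memory SQLite stands in for the PostgreSQL catalog.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tiles (path TEXT PRIMARY KEY, corrected_srs TEXT)")
conn.executemany(
    "INSERT INTO tiles VALUES (?, ?)",
    [("a.tif", None), ("b.tif", "EPSG:26915")],  # b.tif carries a stored override
)
jobs = list(build_jobs(conn, "EPSG:26914"))
```

Each argument list could then be passed to `subprocess.run(args, check=True)` on a node where GDAL is installed. In this sketch the override lives only in the catalog, so the source tiles themselves are never rewritten.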

A handful of tiles are affected by GDAL-incompatible #1 because they include no projection information at all. Currently, these tiles, or the tilesets containing them, require manual intervention to determine and assign the correct projection code.
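A sketch of how tiles could be triaged into the two #1 cases, assuming each tile's projection has already been extracted as a WKT2 string (for example with GDAL). The bucket names `ok`, `override`, and `manual` are hypothetical; the check only looks for a top-level `BOUNDCRS` keyword, which WKT2 uses for a CRS bound to a transformation.

```python
def classify_projection(wkt):
    """Sort a tile's projection WKT into one of three hypothetical buckets."""
    if wkt is None or not wkt.strip():
        return "manual"    # no projection at all: a person must assign one
    # WKT keywords are case-insensitive, so normalise before matching.
    if wkt.lstrip().upper().startswith("BOUNDCRS"):
        return "override"  # use the corrected code stored in the DB instead
    return "ok"            # plain WKT that GDAL routines can use as-is
```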

GDAL-incompatible #2 usually occurs in some older tiles and tilesets. In the vast majority of these cases, the tiles are labelled with a UTM zone adjacent to the one they actually fall in. Currently, these tiles require manual intervention to correct their projections.
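One hedged way to flag candidate #2 tiles for manual review. It assumes the declared projection is NAD83 / UTM (EPSG:26901 to EPSG:26923, where the last two digits are the zone number) and that an approximate longitude for each tile is available from an independent source such as a tile index. Both are assumptions made for this sketch; the issue does not say how the true zone is determined. The zone formula is the standard 6-degree UTM numbering.

```python
import math


def utm_zone(lon):
    """Standard 6-degree UTM zone number (1-60) for a longitude in degrees."""
    return int(math.floor((lon + 180.0) / 6.0)) % 60 + 1


def nad83_utm_zone(epsg):
    """Zone number for NAD83 / UTM zone NN N codes (EPSG:26901-26923), else None."""
    if 26901 <= epsg <= 26923:
        return epsg - 26900
    return None


def looks_like_adjacent_zone(declared_epsg, lon):
    """True when the declared zone is exactly one zone off from the expected one."""
    declared = nad83_utm_zone(declared_epsg)
    if declared is None:
        return False  # not a NAD83 / UTM code; this sketch cannot judge it
    return abs(declared - utm_zone(lon)) == 1
```

For a tile near Austin (longitude about -97.74, zone 14), a declared `EPSG:26913` would be flagged, while `EPSG:26914` would not.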

GDAL-incompatible #3 refers to a few tiles with an integer pixel data type and palette color interpretation. Some common GDAL routines will break if either the data type or the color interpretation is inconsistent across the input tiles. References to these tiles are kept in the terrain_aggregator PostgreSQL DB, but the tiles themselves are dropped from any further processing. Beyond the fact that these tiles have highly suspect elevation data, we can safely drop them:
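A minimal sketch of the #3 drop rule, assuming per-tile metadata rows with hypothetical `dtype` and `color_interp` fields that hold GDAL's data type and color interpretation names (e.g. `Int16`, `Palette`). Dropped tiles are returned rather than deleted, in line with keeping a reference to them in the DB.

```python
# GDAL integer raster data type names (Int8/Int64/UInt64 exist in newer GDAL).
INTEGER_TYPES = {"Byte", "Int8", "Int16", "UInt16", "Int32", "UInt32", "Int64", "UInt64"}


def is_droppable(tile):
    """True for tiles with an integer data type AND palette color interpretation."""
    return tile["dtype"] in INTEGER_TYPES and tile["color_interp"] == "Palette"


def split_tiles(tiles):
    """Partition tile records into (keep, drop) without discarding any record."""
    keep, drop = [], []
    for tile in tiles:
        (drop if is_droppable(tile) else keep).append(tile)
    return keep, drop
```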

dhardestylewis commented 1 year ago

https://github.com/dhardestylewis/terrain_aggregator/issues/79#issue-1340924769

Here is how to do it: https://github.com/dhardestylewis/terrain_aggregator/issues/28#issuecomment-1221602757