NOAA-OWP / ngen

Next Generation Water Modeling Engine and Framework Prototype
Other
84 stars 62 forks source link

Hydrofabric GeoPackage support via GDAL #496

Closed aaraney closed 2 months ago

aaraney commented 1 year ago

While investigating the hydrofabric’s schema to inform work related to model specific BMI configuration files, a few cross-cutting questions bubbled up that will likely impact NGEN development. The purpose of this issue is to document these questions, discover other questions, and hopefully find resolutions to these questions that assist in adding GeoPackage support to NGEN (related to #295).

In the expandable section below, ive included the hydrofabric geopackage schema dumped from the v1.2 HUC 01 NGEN hydrofabric.

Hydrofabric 1.2 GeoPackage SQL schema Note, ive only included the tables relevant to the hydrofabric. Also, this schema deviates from the schema documented on the `hydrofabric`’s [GitHub.io](http://github.io/) page([source](https://github.com/NOAA-OWP/hydrofabric/blob/dfe0b30e1097a4b532a75285d4413743dc2e8d32/docs/schema.Rmd), [live](https://noaa-owp.github.io/hydrofabric/schema.html)). ```SQL CREATE TABLE IF NOT EXISTS "flowpaths" ( "fid" INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL, "geom" LINESTRING, "id" TEXT, "lengthkm" REAL, "main_id" MEDIUMINT, "member_comid" TEXT, "tot_drainage_areasqkm" REAL, "order" REAL, "realized_catchment" TEXT, "toid" TEXT ); CREATE TABLE IF NOT EXISTS "divides" ( "fid" INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL, "geom" MULTIPOLYGON, "id" TEXT, "areasqkm" REAL, "type" TEXT, "toid" TEXT ); CREATE TABLE IF NOT EXISTS "flowpath_attributes" ( "fid" INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL, "id" TEXT, "rl_gages" TEXT, "rl_NHDWaterbodyComID" TEXT, "Qi" REAL, "MusK" REAL, "MusX" REAL, "n" REAL, "So" REAL, "ChSlp" REAL, "BtmWdth" REAL, "time" REAL, "Kchan" REAL, "nCC" REAL, "TopWdthCC" REAL, "TopWdth" REAL, "length_m" REAL ); CREATE TABLE IF NOT EXISTS "nexus" ( "fid" INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL, "geom" POINT, "id" TEXT, "type" TEXT, "toid" TEXT ); CREATE TABLE IF NOT EXISTS "flowpath_edge_list" ( "fid" INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL, "id" TEXT, "toid" TEXT ); CREATE TABLE IF NOT EXISTS "crosswalk" ( "fid" INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL, "id" TEXT, "toid" TEXT, "NHDPlusV2_COMID" REAL, "NHDPlusV2_COMID_part" MEDIUMINT, "reconciled_ID" REAL, "mainstem" MEDIUMINT, "POI_ID" TEXT, "POI_TYPE" TEXT, "POI_VALUE" TEXT ); CREATE TABLE IF NOT EXISTS "cfe_noahowp_attributes" ( "fid" INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL, "id" TEXT, "gw_Coeff" REAL, "gw_Zmax" REAL, "gw_Expon" REAL, "ISLTYP" MEDIUMINT, "IVGTYP" MEDIUMINT, "bexp_soil_layers_stag=1" REAL, "bexp_soil_layers_stag=2" REAL, "bexp_soil_layers_stag=3" REAL, "bexp_soil_layers_stag=4" REAL, "dksat_soil_layers_stag=1" REAL, "dksat_soil_layers_stag=2" REAL, "dksat_soil_layers_stag=3" REAL, "dksat_soil_layers_stag=4" REAL, "psisat_soil_layers_stag=1" REAL, "psisat_soil_layers_stag=2" REAL, "psisat_soil_layers_stag=3" REAL, "psisat_soil_layers_stag=4" REAL, "cwpvt" REAL, "mfsno" REAL, "mp" REAL, "refkdt" REAL, "slope" REAL, "smcmax_soil_layers_stag=1" REAL, "smcmax_soil_layers_stag=2" REAL, "smcmax_soil_layers_stag=3" REAL, "smcmax_soil_layers_stag=4" REAL, "smcwlt_soil_layers_stag=1" REAL, "smcwlt_soil_layers_stag=2" REAL, "smcwlt_soil_layers_stag=3" REAL, "smcwlt_soil_layers_stag=4" REAL, "vcmx25" REAL ); CREATE TABLE IF NOT EXISTS "forcing_metadata" ( "fid" INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL, "id" TEXT, "areasqkm" REAL, "cetroid_lon" REAL, "centroid_lat" REAL, "elevation" REAL, "slope_m_km" REAL, "aspect" REAL ); ```
aaraney commented 1 year ago

Ive answered the above questions in-line below:

  • Do GeoPackage’s support relationships? It appears several fields in the Hydrofabric could benefit from classic foreign key like constraints.

Sort of, but only fully supported through a related-tables GeoPackage extension. In short this extension establishes several new tables and a convention for creating a "mapping table" which maps a primary key entry from a table T to some other primary key in table U. It is worth mentioning that all primary keys in a geopackage respected table are type INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL (typically this is the "fid" field). This implies that the relationship between "fid"'s is meaningful although generally these fields, in our context, are just the order in which rows were generated and have no guarantees version to version or domain to domain. Moreover, the related-tables extension does not use a foreign key constraint to accomplish this mapping. This makes sense to support many-to-many relationships so, it appears that key constraints would have to be defined and enforced carefully using triggers (i.e. ON DELETE CASCADE).

Notably, GDAL 3.6 added a GDALRelationship class for capturing these relationship and in the upcoming 3.7 release creating, updating, and deleting these relationships will be added.

However, it does not seem that anything is stopping you from adding foreign key constraints to a feature table's schema, but sqlite3 also does not check foreign key constraints by default. So, it would be up to the tooling being used to turn on this pragma by default. As I cover later, geopackage does require issuing a foreign_key_check, but this only checks if constraints are still valid. More on this later.

Assuming the library is compiled with foreign key constraints enabled, it must still be enabled by the application at runtime, using the PRAGMA foreign_keys command. For example:

sqlite> PRAGMA foreign_keys = ON;

foreign_keys = ON; seems to be the case for QGIS>=3.20. Original QGIS PR is here. However, I can't find the same behavior in the GDAL codebase (hopefully im missing something).

  • Can you specify foreign key constraints when creating a GeoPackage using standard tooling like GDAL and QGIS?

Deducing from the above information and from reading the 1.3.1 standard, it doesnt seem like anything is stopping you. Additionally, per Requirement 7 of the standard:

Requirement 7 The SQLite PRAGMA foreign_key_check SQL with no parameter value SHALL return an empty result set indicating no invalid foreign key values for a GeoPackage file.

See PRAGMA foreign_key_check. However, this does not enforce that implementations of geopackage enable PRAGMA foreign_keys = ON;. Im taking this to mean that additional foreign key constraints, like ON UPDATE CASCADE would not complete and I would assume the database would return something if issued a foreign_key_check. Meaning, the existence of the foreign key constraint would appear to leave the db in a "corrupted" state. However, ive not checked this.

  • Can you specify that a column has a UNIQUE NOT NULL constrain when creating a GeoPackage using standard tooling like GDAL and QGIS?

Using QGIS 3.18, I was not given the option to specify that a field was UNIQUE NOT NULL.

However, looking through the GDAL C++ bindings, it looks like you can accomplish this using the OGRFieldDefn class. Specifically, the SetNullable and SetUnique methods. However, ive not verified that the GPKG driver respects these methods.

  • Can you constrain that GeoPackage table schema’s are STRICT? GeoPackage’s are sqlite3 databases after all and without something like a STRICT table constraint sqlite3 entries are duck typed.

It seems there is a technical limitation to doing this. Since GeoPackage defines it own field types (simple feature types). See https://github.com/opengeospatial/geopackage/issues/644. It does seem like it is legal to put a check(typeof(<field> = "T")) however ive not read up on the sqlite3 docs regarding check nor typeof.

  • Can GDAL query tables in a GeoPackage that do not contain a geom field and are not in the gpkg_contents table?

Yes, as long as the tables are correctly registered as non spatial tables:

The core GeoPackage specification of GeoPackage 1.0 and 1.1 did not support non-spatial tables. This was added in GeoPackage 1.2 as the “attributes” data type.

The driver allows creating and reading non-spatial tables with the GeoPackage aspatial extension.

Starting with GDAL 2.2, the driver will also, by default, list non spatial tables that are not registered through the gdal_aspatial extension, and support the GeoPackage 1.2 “attributes” data type as well. Starting with GDAL 2.2, non spatial tables are by default created following the GeoPackage 1.2 “attributes” data type (can be controlled with the ASPATIAL_VARIANT layer creation option).

aaraney commented 1 year ago

This is slightly an aside, but below is an example of opening a geopackage using GDAL's python swig bindings. The swig bindings are generated from (not exactly, but close enough) GDAL's cpp API and thus mirror it pretty closely.

from osgeo import ogr, gdal

def main():
    hydrofabric_path = "/vsicurl/https://nextgen-hydrofabric.s3.amazonaws.com/v1.2/nextgen_09.gpkg"

    driver: Optional[ogr.Driver] = ogr.GetDriverByName("GPKG")
    data_source: Optional[ogr.DataSource] = driver.Open(
        hydrofabric_path, gdal.GA_ReadOnly
    )

    nexus: ogr.Layer = data_source.GetLayerByName("nexus")
    # OGRFieldType: GetType()
    # NOTE: this will not include the geometry column
    field_names = [(f.GetName(), f.GetTypeName()) for f in nexus.schema]
    n_features = nexus.GetFeatureCount()

    features = []
    for _ in range(nexus.GetFeatureCount()):
        feature: ogr.Feature = nexus.GetNextFeature()
        # feature.items()
        # dictionary of {col: value}; only in python
        geometry: ogr.Geometry = feature.GetGeometryRef()

        feature_fields = [feature.GetField(idx) for idx in range(len(field_names))]
        features.append(feature_fields)

    # divides: Optional[ogr.Layer] = data_source.GetLayerByName("divides")

    # per the docs, this is how we close a dataset using the python bindings
    # https://gdal.org/api/python_gotchas.html#saving-and-closing-datasets-datasources
    nexus = None

    return field_names, features

if __name__ == "__main__":
    field_names, features = main()