Closed aaraney closed 2 months ago
Ive answered the above questions in-line below:
- Do GeoPackage’s support relationships? It appears several fields in the Hydrofabric could benefit from classic foreign key like constraints.
Sort of, but only fully supported through a related-tables GeoPackage extension. In short this extension establishes several new tables and a convention for creating a "mapping table" which maps a primary key entry from a table T to some other primary key in table U. It is worth mentioning that all primary keys in a geopackage respected table are type INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL
(typically this is the "fid" field). This implies that the relationship between "fid"'s is meaningful although generally these fields, in our context, are just the order in which rows were generated and have no guarantees version to version or domain to domain. Moreover, the related-tables extension does not use a foreign key constraint to accomplish this mapping. This makes sense to support many-to-many relationships so, it appears that key constraints would have to be defined and enforced carefully using triggers (i.e. ON DELETE CASCADE
).
Notably, GDAL 3.6 added a GDALRelationship
class for capturing these relationship and in the upcoming 3.7 release creating, updating, and deleting these relationships will be added.
However, it does not seem that anything is stopping you from adding foreign key constraints to a feature table's schema, but sqlite3
also does not check foreign key constraints by default. So, it would be up to the tooling being used to turn on this pragma by default. As I cover later, geopackage does require issuing a foreign_key_check
, but this only checks if constraints are still valid. More on this later.
Assuming the library is compiled with foreign key constraints enabled, it must still be enabled by the application at runtime, using the PRAGMA foreign_keys command. For example:
sqlite> PRAGMA foreign_keys = ON;
foreign_keys = ON;
seems to be the case for QGIS>=3.20
. Original QGIS PR is here. However, I can't find the same behavior in the GDAL codebase (hopefully im missing something).
- Can you specify foreign key constraints when creating a GeoPackage using standard tooling like GDAL and QGIS?
Deducing from the above information and from reading the 1.3.1 standard, it doesnt seem like anything is stopping you. Additionally, per Requirement 7
of the standard:
Requirement 7 The SQLite PRAGMA foreign_key_check SQL with no parameter value SHALL return an empty result set indicating no invalid foreign key values for a GeoPackage file.
See PRAGMA foreign_key_check
. However, this does not enforce that implementations of geopackage enable PRAGMA foreign_keys = ON;
. Im taking this to mean that additional foreign key constraints, like ON UPDATE CASCADE
would not complete and I would assume the database would return something if issued a foreign_key_check
. Meaning, the existence of the foreign key constraint would appear to leave the db in a "corrupted" state. However, ive not checked this.
- Can you specify that a column has a
UNIQUE NOT NULL
constrain when creating a GeoPackage using standard tooling like GDAL and QGIS?
Using QGIS 3.18, I was not given the option to specify that a field was UNIQUE NOT NULL
.
However, looking through the GDAL C++ bindings, it looks like you can accomplish this using the OGRFieldDefn
class. Specifically, the SetNullable
and SetUnique
methods. However, ive not verified that the GPKG
driver respects these methods.
- Can you constrain that GeoPackage table schema’s are
STRICT
? GeoPackage’s aresqlite3
databases after all and without something like aSTRICT
table constraintsqlite3
entries are duck typed.
It seems there is a technical limitation to doing this. Since GeoPackage defines it own field types (simple feature types). See https://github.com/opengeospatial/geopackage/issues/644. It does seem like it is legal to put a check(typeof(<field> = "T"))
however ive not read up on the sqlite3
docs regarding check
nor typeof
.
- Can GDAL query tables in a GeoPackage that do not contain a
geom
field and are not in thegpkg_contents
table?
Yes, as long as the tables are correctly registered as non spatial tables:
The core GeoPackage specification of GeoPackage 1.0 and 1.1 did not support non-spatial tables. This was added in GeoPackage 1.2 as the “attributes” data type.
The driver allows creating and reading non-spatial tables with the GeoPackage aspatial extension.
Starting with GDAL 2.2, the driver will also, by default, list non spatial tables that are not registered through the gdal_aspatial extension, and support the GeoPackage 1.2 “attributes” data type as well. Starting with GDAL 2.2, non spatial tables are by default created following the GeoPackage 1.2 “attributes” data type (can be controlled with the ASPATIAL_VARIANT layer creation option).
This is slightly an aside, but below is an example of opening a geopackage using GDAL's python swig bindings. The swig bindings are generated from (not exactly, but close enough) GDAL's cpp API and thus mirror it pretty closely.
from osgeo import ogr, gdal
def main():
hydrofabric_path = "/vsicurl/https://nextgen-hydrofabric.s3.amazonaws.com/v1.2/nextgen_09.gpkg"
driver: Optional[ogr.Driver] = ogr.GetDriverByName("GPKG")
data_source: Optional[ogr.DataSource] = driver.Open(
hydrofabric_path, gdal.GA_ReadOnly
)
nexus: ogr.Layer = data_source.GetLayerByName("nexus")
# OGRFieldType: GetType()
# NOTE: this will not include the geometry column
field_names = [(f.GetName(), f.GetTypeName()) for f in nexus.schema]
n_features = nexus.GetFeatureCount()
features = []
for _ in range(nexus.GetFeatureCount()):
feature: ogr.Feature = nexus.GetNextFeature()
# feature.items()
# dictionary of {col: value}; only in python
geometry: ogr.Geometry = feature.GetGeometryRef()
feature_fields = [feature.GetField(idx) for idx in range(len(field_names))]
features.append(feature_fields)
# divides: Optional[ogr.Layer] = data_source.GetLayerByName("divides")
# per the docs, this is how we close a dataset using the python bindings
# https://gdal.org/api/python_gotchas.html#saving-and-closing-datasets-datasources
nexus = None
return field_names, features
if __name__ == "__main__":
field_names, features = main()
While investigating the hydrofabric’s schema to inform work related to model specific BMI configuration files, a few cross-cutting questions bubbled up that will likely impact NGEN development. The purpose of this issue is to document these questions, discover other questions, and hopefully find resolutions to these questions that assist in adding GeoPackage support to NGEN (related to #295).
UNIQUE NOT NULL
constrain when creating a GeoPackage using standard tooling like GDAL and QGIS?STRICT
? GeoPackage’s aresqlite3
databases after all and without something like aSTRICT
table constraintsqlite3
entries are duck typed.geom
field and are not in thegpkg_contents
table?In the expandable section below, ive included the hydrofabric geopackage schema dumped from the v1.2 HUC 01 NGEN hydrofabric.
Hydrofabric 1.2 GeoPackage SQL schema
Note, ive only included the tables relevant to the hydrofabric. Also, this schema deviates from the schema documented on the `hydrofabric`’s [GitHub.io](http://github.io/) page([source](https://github.com/NOAA-OWP/hydrofabric/blob/dfe0b30e1097a4b532a75285d4413743dc2e8d32/docs/schema.Rmd), [live](https://noaa-owp.github.io/hydrofabric/schema.html)). ```SQL CREATE TABLE IF NOT EXISTS "flowpaths" ( "fid" INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL, "geom" LINESTRING, "id" TEXT, "lengthkm" REAL, "main_id" MEDIUMINT, "member_comid" TEXT, "tot_drainage_areasqkm" REAL, "order" REAL, "realized_catchment" TEXT, "toid" TEXT ); CREATE TABLE IF NOT EXISTS "divides" ( "fid" INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL, "geom" MULTIPOLYGON, "id" TEXT, "areasqkm" REAL, "type" TEXT, "toid" TEXT ); CREATE TABLE IF NOT EXISTS "flowpath_attributes" ( "fid" INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL, "id" TEXT, "rl_gages" TEXT, "rl_NHDWaterbodyComID" TEXT, "Qi" REAL, "MusK" REAL, "MusX" REAL, "n" REAL, "So" REAL, "ChSlp" REAL, "BtmWdth" REAL, "time" REAL, "Kchan" REAL, "nCC" REAL, "TopWdthCC" REAL, "TopWdth" REAL, "length_m" REAL ); CREATE TABLE IF NOT EXISTS "nexus" ( "fid" INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL, "geom" POINT, "id" TEXT, "type" TEXT, "toid" TEXT ); CREATE TABLE IF NOT EXISTS "flowpath_edge_list" ( "fid" INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL, "id" TEXT, "toid" TEXT ); CREATE TABLE IF NOT EXISTS "crosswalk" ( "fid" INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL, "id" TEXT, "toid" TEXT, "NHDPlusV2_COMID" REAL, "NHDPlusV2_COMID_part" MEDIUMINT, "reconciled_ID" REAL, "mainstem" MEDIUMINT, "POI_ID" TEXT, "POI_TYPE" TEXT, "POI_VALUE" TEXT ); CREATE TABLE IF NOT EXISTS "cfe_noahowp_attributes" ( "fid" INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL, "id" TEXT, "gw_Coeff" REAL, "gw_Zmax" REAL, "gw_Expon" REAL, "ISLTYP" MEDIUMINT, "IVGTYP" MEDIUMINT, "bexp_soil_layers_stag=1" REAL, "bexp_soil_layers_stag=2" REAL, "bexp_soil_layers_stag=3" REAL, "bexp_soil_layers_stag=4" REAL, "dksat_soil_layers_stag=1" REAL, "dksat_soil_layers_stag=2" REAL, "dksat_soil_layers_stag=3" REAL, "dksat_soil_layers_stag=4" REAL, "psisat_soil_layers_stag=1" REAL, "psisat_soil_layers_stag=2" REAL, "psisat_soil_layers_stag=3" REAL, "psisat_soil_layers_stag=4" REAL, "cwpvt" REAL, "mfsno" REAL, "mp" REAL, "refkdt" REAL, "slope" REAL, "smcmax_soil_layers_stag=1" REAL, "smcmax_soil_layers_stag=2" REAL, "smcmax_soil_layers_stag=3" REAL, "smcmax_soil_layers_stag=4" REAL, "smcwlt_soil_layers_stag=1" REAL, "smcwlt_soil_layers_stag=2" REAL, "smcwlt_soil_layers_stag=3" REAL, "smcwlt_soil_layers_stag=4" REAL, "vcmx25" REAL ); CREATE TABLE IF NOT EXISTS "forcing_metadata" ( "fid" INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL, "id" TEXT, "areasqkm" REAL, "cetroid_lon" REAL, "centroid_lat" REAL, "elevation" REAL, "slope_m_km" REAL, "aspect" REAL ); ```