NOAA-OWP / ngen

Next Generation Water Modeling Engine and Framework Prototype

GeoPkg Input Support #295

Closed hellkite500 closed 1 year ago

hellkite500 commented 2 years ago

Enhance hydrofabric reading by supporting geopkg inputs directly.

Current behavior

NGen currently supports only one hydrofabric input format: GeoJSON.

Expected behavior

As the size of the hydrofabric increases, it will be worth supporting other formats that are less disk and memory intensive. Geopkg may be a good format to support, as this is generated natively by the hydrofabric generation in the first place.

Notes

A GeoPackage is an SQLite database file, so supporting it brings in, at minimum, an SQLite dependency. There are a few possible avenues to explore.

mattw-nws commented 2 years ago

Interesting. libgpkg's last commit was 5 years ago, which might be important (or might not). I like the idea of SpatiaLite very much... it is MPL-licensed (that took a while to find) and does not appear to use git... so bringing in that dependency could be a little more difficult.

program-- commented 1 year ago

An initial (non-production) reference implementation for reading GeoPackages via SQLite can be found here: https://github.com/program--/gpkg. It currently supports read-only access to SQLite and GeoPackage databases, so it is not as extensive as libgpkg or SpatiaLite, but something like it should work well enough for our use case. As expected, it only requires SQLite as a new dependency.

Since ngen::geojson already has Feature classes, we can adapt the reference implementation to output ngen::geojson::FeatureCollection objects and related types. This should avoid any initial refactoring or modification of code outside of reading the catchment/nexus files, i.e. only this spot should need to change once read support is implemented (as far as I'm aware): https://github.com/NOAA-OWP/ngen/blob/7551590a415b89026559c1c570d4154e4746161b/src/NGen.cpp#L262-L266

Based on the reference implementation, I think an initial ngen implementation will look like:

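// sqlite.hpp
#include <string>

class SQLiteRowIterator {
public:
    bool done() const;           // whether the iterator is finished
    SQLiteRowIterator& next();   // go to next row (returns *this for chaining)
    SQLiteRowIterator& reset();  // start from beginning of query

    template <typename T> T get(int index) const;                // get column value by index
    template <typename T> T get(const std::string& name) const;  // get column value by name
};

class SQLiteDatabase {
public:
    // Core query function
    SQLiteRowIterator query(const std::string& statement);

    // Core parametrized query function
    template <typename... Args>
    SQLiteRowIterator query(const std::string& statement, Args&&... params);
};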

The row iterator could be made nicer by implementing "true" iterator access, which also would let us use range-based for loops to iterate...
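
To make the "true" iterator idea concrete, here is a rough sketch of what range-based for support could look like. It is not the reference implementation: `Row`, `RowRange`, and `first_column` are hypothetical names, and the rows live in a plain `std::vector` as a stand-in for a prepared SQLite statement, so only the iterator plumbing is shown.

```cpp
#include <cstddef>
#include <iterator>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for one result row; a real version would read from SQLite.
struct Row {
    std::vector<std::string> columns;
    const std::string& get(std::size_t i) const { return columns.at(i); }
};

// Hypothetical result set exposing begin()/end() so range-based for works.
class RowRange {
public:
    class iterator {
    public:
        using iterator_category = std::input_iterator_tag;
        using value_type = Row;
        using difference_type = std::ptrdiff_t;
        using pointer = const Row*;
        using reference = const Row&;

        iterator(const std::vector<Row>* rows, std::size_t pos)
            : rows_(rows), pos_(pos) {}

        reference operator*() const { return (*rows_)[pos_]; }
        pointer operator->() const { return &(*rows_)[pos_]; }
        iterator& operator++() { ++pos_; return *this; }
        iterator operator++(int) { iterator old = *this; ++pos_; return old; }
        bool operator==(const iterator& o) const {
            return rows_ == o.rows_ && pos_ == o.pos_;
        }
        bool operator!=(const iterator& o) const { return !(*this == o); }

    private:
        const std::vector<Row>* rows_;
        std::size_t pos_;
    };

    explicit RowRange(std::vector<Row> rows) : rows_(std::move(rows)) {}
    iterator begin() const { return iterator(&rows_, 0); }
    iterator end() const { return iterator(&rows_, rows_.size()); }

private:
    std::vector<Row> rows_;
};

// Collect the first column of every row using a range-based for loop.
std::vector<std::string> first_column(const RowRange& result) {
    std::vector<std::string> out;
    for (const Row& row : result) {
        out.push_back(row.get(0));
    }
    return out;
}
```

With a real SQLite backend the iterator would probably step the statement in `operator++` and compare against a sentinel "done" state instead of an index, but the calling code would look the same.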

// gpkg.hpp
#include <string>
#include <vector>

// Either inherit from SQLiteDatabase or hold a smart pointer to one;
// either works. The reference implementation uses a smart pointer,
// but inheritance might be better.
class GeoPackage : public SQLiteDatabase {
public:
    std::vector<std::string> layers();                     // names of layers available
    FeatureCollection features(const std::string& layer);  // features from a layer
};
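
One detail that features() would have to handle: GeoPackage stores each geometry as a blob with a small "GP" header in front of the standard WKB payload, so the header has to be skipped before WKB parsing. Below is a minimal sketch of decoding that header, based on my reading of the GeoPackage encoding standard. `GpkgBlobHeader` and `parse_gpkg_header` are hypothetical names, not part of the reference implementation or ngen.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Fields of the GeoPackage geometry header that precedes the WKB payload.
struct GpkgBlobHeader {
    std::uint8_t version;
    bool little_endian;      // byte order of srs_id and envelope values
    bool empty_geometry;
    std::int32_t srs_id;
    std::size_t wkb_offset;  // index of the first WKB byte in the blob
};

// Hypothetical helper: decode the header and report where the WKB starts.
GpkgBlobHeader parse_gpkg_header(const std::vector<std::uint8_t>& blob) {
    // Magic bytes are ASCII "GP".
    if (blob.size() < 8 || blob[0] != 0x47 || blob[1] != 0x50) {
        throw std::runtime_error("not a GeoPackage geometry blob");
    }
    const std::uint8_t flags = blob[3];

    GpkgBlobHeader h{};
    h.version = blob[2];
    h.little_endian = (flags & 0x01) != 0;   // flag bit 0: byte order
    h.empty_geometry = (flags & 0x10) != 0;  // flag bit 4: empty geometry

    // Flag bits 1-3: envelope indicator, which fixes the envelope size in bytes.
    const unsigned envelope_code = (flags >> 1) & 0x07;
    static const std::size_t envelope_bytes[] = {0, 32, 48, 48, 64};
    if (envelope_code > 4) {
        throw std::runtime_error("invalid envelope indicator");
    }

    // Assemble srs_id byte by byte so the result is independent of host endianness.
    std::uint32_t raw = 0;
    for (int i = 0; i < 4; ++i) {
        const int shift = h.little_endian ? 8 * i : 8 * (3 - i);
        raw |= static_cast<std::uint32_t>(blob[4 + i]) << shift;
    }
    h.srs_id = static_cast<std::int32_t>(raw);

    h.wkb_offset = 8 + envelope_bytes[envelope_code];
    if (h.wkb_offset > blob.size()) {
        throw std::runtime_error("truncated geometry blob");
    }
    return h;
}
```

Everything from `wkb_offset` onward is ordinary WKB, which is what would get converted into the existing geojson geometry types.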

Additionally, we could extend the above specifically for hydrofabric data, to support accessing that data directly. For example:

class Hydrofabric : public GeoPackage {
public:
    std::vector<HY_Catchment> catchments();
    std::vector<HY_HydroLocation> nexi();
    // ...
};

Or something similar, but that starts to introduce scope creep.

Any suggestions, concerns, or something I missed? @hellkite500 @mattw-nws