Open jameshalgren opened 1 year ago
Tested this by using the old function and the new function to read a geopackage for the 03w VPU:
wget -P 03w -c https://nextgen-hydrofabric.s3.amazonaws.com/v1.2/nextgen_03W.gpkg
Then compare the outputs:
d = read_geopkg("03w/nextgen_03W.gpkg")
d2 = read_geopkg_sqlite3("03w/nextgen_03W.gpkg")
d.drop(["geometry"],axis=1) == d2.drop(["geom", "fid_x", "fid_y"],axis=1)
## Returns all True
The only difference is the geometry.
Useful comments on datacarpentry.org here.
ping @ZacharyWills
Your sql statement is a special case that has its own convenience function, pd.read_sql_table
.
pd.read_sql_table('flowpaths', con, index_col='id')
Your sql statement is a special case that has its own convenience function,
pd.read_sql_table
.pd.read_sql_table('flowpaths', con, index_col='id')
Python 3.10+
https://github.com/NOAA-OWP/t-route/blob/01c713acb02e03e1d4222b62c398c30981485030/src/troute-network/troute/HYFeaturesNetwork.py#L16-L20
In most cases (in every case?) for t-route, we are discarding the geometry data and only use the geo-enabled packages for convenience in reading the files where the network connectivity information is stored.
To avoid unnecessary (see note) dependencies, we could update this function to look something like the following:
Note: More accurately, "troublesome" dependencies. Gdal has been causing some trouble on the arm64 builds and if we can circumvent those issues, we gain (potentially) quite a bit of performance. Note2: The geopandas read is (I believe) lazy, so this direct sql read with the connection closing will carry a performance hit, potentially, especially for very large flowpath tables.