WikiWatershed / model-my-watershed

The web application front end for Model My Watershed.
https://modelmywatershed.org
Apache License 2.0
57 stars 31 forks source link

AWS 2-5: Ingest Global Basins #3647

Closed rajadain closed 10 hours ago

rajadain commented 2 months ago

Ingest the TDX Hydro Global Basins dataset into Postgres. This should match the additional fields added to the streams in #3646.

The source data will be made available by @ptomasula at s3://data.mmw.azavea.com/. Once it is:

rajadain commented 4 days ago

Should use the following script to ingest them:

ogr2ogr -progress -if "Parquet" \
          -f "PostgreSQL" PG:"host=localhost user=mmw dbname=mmw password=XXX" \
          -nln tdxbasins \
          -lco GEOMETRY_NAME=geom \
          -lco FID=LINKNO \
          -append -skipfailures \
          TDX_streamreach_basins_mnsi_\*.parquet

Once imported, we should create this index to help with delineation:

CREATE INDEX idx_tdxbasins_root_discover_finish
ON tdxbasins (root_id, discover_time, finish_time);
rajadain commented 4 days ago

Once that is in place, we can use the following query to delineate a watershed, given a basin clicked by the user:

WITH target AS (SELECT *
FROM tdxbasins
WHERE linkno = ?)

SELECT ST_Union(geom)
FROM tdxbasins
WHERE root_id = (SELECT root_id FROM target)
  AND discover_time <= (SELECT discover_time FROM target)
  AND finish_time >= (SELECT finish_time FROM target);
rajadain commented 2 days ago

Data has been ingested on Staging successfully. Now exporting for reuse using:

env PGPASSWORD=xxx pg_dump --clean --if-exists --no-owner --dbname=modelmywatershed --username=modelmywatershed --host=database.service.mmw.internal --table=tdxbasins --compress=9 --verbose > tdxbasins.sql.gz
rajadain commented 10 hours ago

Data exported to s3://data.mmw.azavea.com/tdxbasins.sql.gz