OpenDRR / opendrr-api

REST API for OpenDRR data / API REST pour les données OpenDRR
MIT License
4 stars 7 forks source link

Move database and schema creation from postgis container to add_data.sh? #128

Open anthonyfok opened 3 years ago

anthonyfok commented 3 years ago

Background: Earlier this week (Mon 2021-07-26), Will (@wkhchow), using Docker Desktop on Windows 10, ran into a somewhat obscure issue where the database core schemas were missing when the previous run was OK:

python-opendrr_1  | 2021-07-26T22:17:11.320636256Z psql:Create_table_canada_site_exposure_ste.sql:90: ERROR:  schema "exposure" does not exist
...
python-opendrr_1  | 2021-07-26T22:18:25.234458138Z psql:Create_all_tables_update.sql:109: ERROR:  schema "vs30" does not exist
...

python-opendrr_1  | 2021-07-26T23:12:14.312723750Z psycopg2.errors.InvalidSchemaName: schema "dsra" does not exist
python-opendrr_1  | 2021-07-26T23:12:14.318808185Z sqlalchemy.exc.ProgrammingError: (psycopg2.errors.InvalidSchemaName) schema "dsra" does not exist

Will suspected that the following code from postgis/create_db.sh (called from postgis/Dockerfile) was not run:

echo "Creating schema..."
psql -d "${DB_NAME}" -a -U"${POSTGRES_USER}" -f /schema.sql

Fortunately, a fresh docker compose run proceeded correctly with all the core schemas intact. What happened in the previous failed schema-missing run is still a mystery.

That made me wonder: should move the database-creation and schema-creation commands from postgis/create_db.sh (Docker image creation) into add_data.sh instead?

And, from the copyright name "MartinHeinz" in that file, probably Joost started with ideas from e.g. https://martinheinz.dev/blog/3 . And from commit logs: AWS Lambda service... and probably before he started creating the "python" container? The more recent postgis/Dockerfile.aws and postgis/extensions.sql added an extra step to enable PostGIS extensions:

At this point: I know too little about it. "If it ain't broke, don't fix it." Better leave it the way it is. Especially when there are many other more urgent issues to fix.

However, the fact that the "missing core schema" incident happened, even only once thus far, seems to point to a potential issue. A race condition, perhaps?

jvanulde commented 3 years ago

Isn't the schema already created in the dump file?

wkhchow commented 3 years ago

the opendrr-boundaries.sql just restores the associated geopackages from our boundaries repo into the boundaries schema given the existing schema already exists in the opendrr_db and postgis extension already enabled. We could create a db dump that restores all the schema and just get rid of schema.sql as an alternative.

anthonyfok commented 3 years ago

Isn't the schema already created in the dump file?

If you mean opendrr-boundaries.sql (and my copycat opendrr-boundaries.dump), there is only

CREATE SCHEMA IF NOT EXISTS boundaries;

whereas OpenDRR/opendrr-api postgis/schema.sql contains:

-- create initial source data schemas
CREATE SCHEMA IF NOT EXISTS boundaries;
CREATE SCHEMA IF NOT EXISTS census;
CREATE SCHEMA IF NOT EXISTS dsra;
CREATE SCHEMA IF NOT EXISTS exposure;
CREATE SCHEMA IF NOT EXISTS ghsl;
CREATE SCHEMA IF NOT EXISTS lut;
CREATE SCHEMA IF NOT EXISTS mh;
CREATE SCHEMA IF NOT EXISTS psra;
CREATE SCHEMA IF NOT EXISTS ruptures;
CREATE SCHEMA IF NOT EXISTS sitemesh;
CREATE SCHEMA IF NOT EXISTS sovi;
CREATE SCHEMA IF NOT EXISTS vs30;
CREATE SCHEMA IF NOT EXISTS gmf;

I would suggest that we keep the status quo, as in we create only the boundaries SCHEMA in opendrr-boundaries.sql, but leave the other SCHEMAs (census, dsra, exposure, ghsl, etc.) out of it, especially how it is presented as just boundaries data, and not as "opendrr-data-part-1".