deepayan / nhanes-postgres

A docker container for NHANES data inside a PostgreSQL DBMS

another metadata table #8

Open rgentlem opened 1 month ago

rgentlem commented 1 month ago

I think that we probably need to have some metadata in the container about the bioc stuff https://ccb-hms.github.io/phonto/vignettes/nhanes-local.html

EPICONDUCTOR_CONTAINER_VERSION (e.g., v0.12.0)
EPICONDUCTOR_COLLECTION_DATE (e.g., 2023-11-21)
EPICONDUCTOR_DB_DRIVER (e.g., FreeTDS on Linux)
EPICONDUCTOR_DB_SERVER (e.g., localhost)
EPICONDUCTOR_DB_PORT (e.g., 1433)

At least the first two should be in the DB; I'm not sure what else. But it won't really work to define them as whatever you want at the command line. Somehow the DB itself needs to know its version and collection date, otherwise bad things could happen.

deepayan commented 1 month ago

It probably makes most sense to record the collection date whenever updating the snapshot in https://github.com/deepayan/nhanes-snapshot.

The container version is tied to Docker releases, so we can keep it in a file that is manually updated for every release.

These two we can definitely add to the metadata.
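As a rough illustration of what "adding these two to the metadata" could look like, here is a minimal sketch of a key-value table holding the container version and collection date. The table name `version_info`, its column names, and the key names are hypothetical (nothing in this thread fixes a schema), and `sqlite3` is used only as a stand-in for PostgreSQL so the example runs without a server.

```python
import sqlite3

# Hypothetical schema: the thread does not specify table or column names.
CREATE_SQL = """
CREATE TABLE IF NOT EXISTS version_info (
    variable TEXT PRIMARY KEY,
    value    TEXT NOT NULL
)
"""


def record_version_info(conn, container_version, collection_date):
    """Store the two values in a key-value table, replacing any old rows."""
    cur = conn.cursor()
    cur.execute(CREATE_SQL)
    cur.execute("DELETE FROM version_info")
    cur.executemany(
        "INSERT INTO version_info (variable, value) VALUES (?, ?)",
        [
            ("CONTAINER_VERSION", container_version),
            ("COLLECTION_DATE", collection_date),
        ],
    )
    conn.commit()


# sqlite3 stands in for PostgreSQL here; the example values come from the
# first comment in this thread.
conn = sqlite3.connect(":memory:")
record_version_info(conn, "v0.12.0", "2023-11-21")
rows = dict(conn.execute("SELECT variable, value FROM version_info"))
print(rows)
```

With a PostgreSQL driver such as psycopg2 the parameter placeholder would be `%s` rather than `?`; the rest of the idea carries over unchanged.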

We can drop DB_DRIVER for the postgres version (it was only needed for MS SQL Server via ODBC).

DB_SERVER and DB_PORT are related to accessing the database, but are not intrinsic to the database itself (they can be changed by the user). So it is probably OK to just define these as environment variables inside the Docker container. Anyone using the database from outside will need to define them when connecting from R.
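To make the environment-variable idea concrete, here is a small sketch of how a client might pick up the server and port, falling back to defaults when the variables are unset. The variable names are the ones listed at the top of this issue; the function name `connection_settings` and the fallback values (`localhost`, and 5432 as PostgreSQL's standard port) are assumptions for illustration.

```python
import os


def connection_settings(env=None):
    """Read host and port from environment variables, with fallbacks.

    `env` can be any mapping; it defaults to the process environment.
    The fallback values are assumptions, not something this thread fixes.
    """
    env = os.environ if env is None else env
    return {
        "host": env.get("EPICONDUCTOR_DB_SERVER", "localhost"),
        "port": int(env.get("EPICONDUCTOR_DB_PORT", "5432")),
    }


# Inside the container the variables would already be set; a user connecting
# from outside has to supply them (or accept the fallbacks).
print(connection_settings({"EPICONDUCTOR_DB_SERVER": "localhost",
                           "EPICONDUCTOR_DB_PORT": "5432"}))
```

Passing the mapping explicitly keeps the function easy to test and makes it obvious that these settings live outside the database.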

rgentlem commented 1 month ago

OK, just to be clear: my use case is running nhanes from a local machine and accessing the DB in some fashion (it could be running on some local host, or it could be running in Docker). This is for when the user is not running R inside the same container as the DB, so there is no access to anything like an environment variable. We will still want to get the version info for the data.
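For this use case, the client only has a database connection to work with, so the version info would have to come from a query. Below is a minimal sketch of that lookup, written against the generic Python DB-API so it does not depend on where the DB runs. The table `version_info` and its columns are hypothetical, and the in-memory `sqlite3` database is only a stand-in for the real PostgreSQL instance.

```python
import sqlite3


def fetch_version_info(conn):
    """Return the metadata rows as a dict, using nothing but the connection."""
    cur = conn.cursor()
    # Hypothetical table and column names; no schema is fixed in this thread.
    cur.execute("SELECT variable, value FROM version_info")
    return dict(cur.fetchall())


# Stand-in database so the sketch runs without a server.
demo = sqlite3.connect(":memory:")
demo.execute(
    "CREATE TABLE version_info (variable TEXT PRIMARY KEY, value TEXT NOT NULL)"
)
demo.executemany(
    "INSERT INTO version_info (variable, value) VALUES (?, ?)",
    [("CONTAINER_VERSION", "v0.12.0"), ("COLLECTION_DATE", "2023-11-21")],
)

info = fetch_version_info(demo)
print(info["CONTAINER_VERSION"], info["COLLECTION_DATE"])
```

Because the values travel with the data, the same call would work whether the DB is on a local host or in Docker, and the user cannot accidentally override them from the command line.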