SageRx is a medication ontology and medication-related data aggregator created from many different public sources of data, including DailyMed, FDA, RxNorm, Orange Book, and more!
SageRx uses Airflow to schedule jobs to extract, load, and transform (using dbt) open drug data.
Data ends up in a PostgreSQL database and can be queried using pgAdmin (included with SageRx) or via any SQL editor of your choice.
We will be moving documentation over to GitHub, but additional documentation exists on the SageRx website.
Subscribe to our newsletter to stay on top of updates.
We would love to see you contribute to SageRx. Join our Slack channel to get involved.
Style Guide: How we think about the structure and naming conventions of SageRx.
1. Create a `.env` file at the root of the repo.
2. Add the following to the `.env` file (see the example below):
   - `AIRFLOW_UID=<uid>`: the UID can be found by running `id -u` on Linux systems; typically the first user on the system is `1000` or `1001`.
     - On Windows, run `wsl` and then, within WSL 2, enter `id -u` to see your UID.
   - `UMLS_API=<umls_api_key>`: if you want to use RxNorm, you need an API key from UMLS.
3. Run `docker-compose up airflow-init`.
4. Run `docker-compose up`.
   - NOTE: if you have an M1 Mac, `export DOCKER_DEFAULT_PLATFORM=linux/amd64` and re-build your images.
   - NOTE 2: if you're running WSL 1/2 you may need to use `docker compose` rather than `docker-compose`, per this issue.
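For reference, a minimal `.env` might look like the following sketch (the values shown are placeholders; substitute your own UID and UMLS API key):

```bash
# Run Airflow as your host user so mounted folders stay writable
AIRFLOW_UID=1000

# UMLS API key, needed if you want to use RxNorm
UMLS_API=<umls_api_key>
```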
- Airflow UI: `localhost:8001` or `0.0.0.0:8001`, log in with `airflow` / `airflow`.
- pgAdmin: `localhost:8002` or `0.0.0.0:8002`, log in with `sagerx` / `sagerx`.
On `docker-compose up`, a dbt container will be created for use with CLI commands. To enter the container, run `docker exec -it dbt /bin/bash`. This will place you into a bash session in the dbt container, where you can run dbt commands as you normally would.
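As a rough sketch, a typical session in the dbt container might look like the following; the `--select` model name is only an example, borrowed from the marts tables described later in this README:

```bash
# open a bash session inside the running dbt container
docker exec -it dbt /bin/bash

# inside the container, run dbt as usual
dbt deps                                  # install dbt package dependencies
dbt run                                   # build all models
dbt test                                  # run schema and data tests
dbt run --select all_ndc_descriptions     # build a single model, as an example
```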
To serve dbt documentation locally, run `dbt docs generate` and then `dbt docs serve --port 8081` inside the dbt container. The docs will be served at http://localhost:8081.
The `export_marts` DAG is implemented to allow users to push .csv versions of the marts-layer tables to an AWS S3 bucket of their choosing. The DAG is currently configured to export 2 tables from the `sagerx_dev` schema: `all_ndc_descriptions` and `atc_codes_to_rxnorm_products`. Future iterations may allow for more schemas/tables as demand dictates. If a user wishes to get .csv copies of those tables pushed to an AWS S3 bucket, they will need to add 3 additional variables to their `.env` file (continuing from the Installation instructions):
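The exact variable names are defined by the `export_marts` DAG, so treat the following as an illustrative sketch only and check the DAG source for the names it actually reads:

```bash
# hypothetical variable names, shown for illustration; match them to what export_marts expects
AWS_ACCESS_KEY_ID=<your_access_key>
AWS_SECRET_ACCESS_KEY=<your_secret_access_key>
AWS_S3_BUCKET=<your_bucket_name>
```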
The access and secret-access keys can be found in 2 ways:
Currently we are utilizing 2 GCP products: Google Cloud Storage (GCS) and BigQuery (BQ).
In the current workflow, all of the dbt tables are created locally, with only the final products being pushed to GCP. This reduces computational expense, especially as we test out new data sources and need to run dbt more frequently.
To accomplish this yourself, you need to follow these steps:
If you get issues with folder permissions, run `sudo chmod -R 777 postgres data extracts logs plugins`.
If you get trouble from the postgres container, with errors such as `password authentication failed for user "airflow"` or `role "airflow" does not exist`, these all stem from the same issue: postgres is not setting itself up correctly. This is caused by a file permission problem and is solved by running `chmod +x ./postgres/0_pg_stat_statement.sh`. You might also need to remove any existing database configuration with `rm -rf airflow/data` and `docker-compose down --volumes`.
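Putting those commands together, a full reset of the local database (run from the repo root, and note that any existing local data will be lost) looks roughly like this:

```bash
# make the postgres init script executable so the container can run it on startup
chmod +x ./postgres/0_pg_stat_statement.sh

# tear down containers and volumes, remove the old database files, then re-initialize
docker-compose down --volumes
rm -rf airflow/data
docker-compose up airflow-init
docker-compose up
```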