coderxio / sagerx

Open drug data pipelines curated by pharmacists.
https://coderx.io/sagerx

Concurrency and DAG cleanup #257

Closed by lprzychodzien 6 months ago

lprzychodzien commented 7 months ago

Resolves #256, #250, #218, #265

Explanation

Cleaned up the DAG to follow the new Airflow standard. Changed the RXCUI lookup to work at the ingredient level, which reduces the number of API calls from 70k to 18k. Added concurrency logic to run multiple API calls at once. Fixed table column naming.
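For readers unfamiliar with the pattern, here is a minimal sketch of running many API calls at once with a thread pool. It is not the code from this PR: `fetch_all`, `fake_fetch`, and the `max_workers` default are hypothetical names and values chosen for illustration, and `fake_fetch` stands in for a real HTTP request so the example runs offline.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, Iterable


def fetch_all(
    ids: Iterable[str],
    fetch_one: Callable[[str], dict],
    max_workers: int = 8,  # assumed value; tune to the API's rate limits
) -> Dict[str, dict]:
    """Call fetch_one for every id on a thread pool and key results by id."""
    results: Dict[str, dict] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each submitted future back to the id it was created for.
        futures = {pool.submit(fetch_one, i): i for i in ids}
        for future in as_completed(futures):
            # result() re-raises any exception raised inside fetch_one.
            results[futures[future]] = future.result()
    return results


def fake_fetch(rxcui: str) -> dict:
    """Hypothetical stand-in for one API request (no network access)."""
    return {"rxcui": rxcui, "classes": []}


if __name__ == "__main__":
    print(fetch_all(["1191", "161"], fake_fetch))
```

Threads suit this kind of workload because each call spends most of its time waiting on the network. Duplicate ids are fetched more than once but stored only once, so deduplicating the input first avoids wasted calls.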

Reduces runtime for RxClass from 5 hours to 6 minutes.


Reduces runtime for RxNorm Historical from 5 hours to 20 minutes.

Tests

  1. What testing did you do?
     Ran the DAG to completion, then checked the PG database for the expected row count. SELECT COUNT(*) FROM datasource.rxclass_atc_to_product returned 24227.