coderxio / sagerx

Open drug data pipelines curated by pharmacists.
https://coderx.io/sagerx

optimize rxclass code #333

Open saywurdson opened 4 days ago

saywurdson commented 4 days ago

"Resolves" #331

Explanation

Optimized the current dag_tasks file as follows:

  1. Replaced the repeated rxclass_df.append() calls, each of which built a new dataframe, with collecting all created rows in a list and building the final dataframe from that list once.
  2. Used the pandas deduplication method to remove the duplicates that @jrlegrand mentioned in issue 331.
  3. Tried to speed up the API requests by making them asynchronous while keeping the rate at 20 calls per second.
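
For reviewers, the three changes above can be sketched roughly as follows. This is a minimal illustration and not the actual dag_tasks code: `fetch_classes`, `fetch_all`, `build_rxclass_df`, and the row fields are hypothetical names, the API call is simulated with `asyncio.sleep` instead of a real HTTP request, and pacing request starts 1/20 s apart is only one possible way to hold the 20 calls per second rate.

```python
import asyncio

import pandas as pd

MAX_CALLS_PER_SECOND = 20  # rate stated in the PR description


async def fetch_classes(rxcui):
    """Hypothetical stand-in for one RxClass API request (no network I/O)."""
    await asyncio.sleep(0.01)  # simulate request latency
    row = {"rxcui": rxcui, "class_id": f"C{rxcui}"}
    # Return the same row twice on purpose so deduplication has work to do.
    return [row, dict(row)]


async def fetch_all(rxcuis, rate=MAX_CALLS_PER_SECOND):
    """Start requests about 1/rate seconds apart and collect all rows in one list."""
    tasks = []
    for rxcui in rxcuis:
        tasks.append(asyncio.create_task(fetch_classes(rxcui)))
        await asyncio.sleep(1 / rate)  # pace request starts to respect the rate limit

    rows = []
    # gather() returns results in the same order the tasks were passed in.
    for result in await asyncio.gather(*tasks):
        rows.extend(result)
    return rows


def build_rxclass_df(rxcuis):
    """Build the dataframe once from the collected rows, then drop duplicates."""
    rows = asyncio.run(fetch_all(rxcuis))
    return pd.DataFrame(rows).drop_duplicates().reset_index(drop=True)


rxclass_df = build_rxclass_df(["1", "2", "3"])
print(rxclass_df)
```

Building the dataframe once from a list avoids copying the whole dataframe on every append, which is where most of the per-row overhead comes from. Note also that `DataFrame.append` was removed in pandas 2.0, so the list-based approach keeps working on newer pandas versions.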

Rationale

@jrlegrand was able to fix the code so that it runs, but mentioned that it ran slowly. I was able to confirm this, and the adjustments made here reduced the time to create sagerx_lake.rxclass by more than half.

[Screenshot: 2024-11-24 at 8:39:46 PM]

Tests

  1. What testing did you do?