Optimized the current dag_tasks file by doing the following:

- changed `rxclass_df.append()` from appending a new dataframe on each call to collecting all the created rows in a list and then creating the final dataframe from that list
- used the pandas deduplication method to remove the duplicates that @jrlegrand mentioned in issue #331
- sped up API requests by making them asynchronous while keeping the rate at 20 calls per second
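The first two changes can be sketched roughly as follows. The row dicts and column names here are illustrative stand-ins for the rxclass API results, not the actual schema; the point is building the DataFrame once from a list (repeated `DataFrame.append` copies the whole frame on every call) and then deduplicating with pandas.

```python
import pandas as pd

# Hypothetical rows mimicking collected rxclass API results;
# column names are illustrative, not the real sagerx_lake.rxclass schema
rows = [
    {"rxcui": "1", "class_id": "A", "rela_source": "ATC"},
    {"rxcui": "2", "class_id": "B", "rela_source": "MEDRT"},
    {"rxcui": "1", "class_id": "A", "rela_source": "ATC"},  # duplicate row
]

# Build the DataFrame once from the collected rows instead of
# appending a new dataframe inside the loop
rxclass_df = pd.DataFrame(rows)

# pandas deduplication: keep the first occurrence of each identical row
rxclass_df = rxclass_df.drop_duplicates(ignore_index=True)
```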
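The asynchronous rate-limited requests could look something like the sketch below. The `fetch` function is a hypothetical stand-in for the real HTTP call (e.g. an aiohttp session request) so the example runs without network access; the throttling pattern (staggering task starts so at most 20 begin per second) is one simple way to hold the stated rate, not necessarily the exact mechanism used in the PR.

```python
import asyncio

RATE = 20  # target ceiling from the PR: 20 calls per second

async def fetch(url: str) -> str:
    # Stand-in for the real HTTP request; hypothetical so the
    # sketch is runnable without a network
    await asyncio.sleep(0)
    return f"fetched {url}"

async def throttled_fetch(i: int, url: str) -> str:
    # Stagger task starts so at most RATE requests begin per second
    await asyncio.sleep(i / RATE)
    return await fetch(url)

async def gather_all(urls: list[str]) -> list[str]:
    tasks = [asyncio.create_task(throttled_fetch(i, u))
             for i, u in enumerate(urls)]
    return await asyncio.gather(*tasks)

# Illustrative URLs only
urls = [f"https://example.org/rxclass/{n}" for n in range(5)]
results = asyncio.run(gather_all(urls))
```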
## Rationale
@jrlegrand was able to fix the code so that it runs, but mentioned that it ran slowly. I was able to confirm this, and the adjustments made reduced the time to create sagerx_lake.rxclass by more than half.
## Tests
What testing did you do?
Did a quick QA: counted the number of rows in the table before and after the change, and also counted the rows grouped by rela_source, verifying that the counts matched @jrlegrand's counts.
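The QA check described above can be expressed as a small pandas comparison. The frames below are tiny hypothetical snapshots standing in for the table before and after the change; only the `rela_source` column matters for the grouped count.

```python
import pandas as pd

# Hypothetical before/after snapshots of the table; values are illustrative
before = pd.DataFrame({"rela_source": ["ATC", "ATC", "MEDRT"]})
after = pd.DataFrame({"rela_source": ["ATC", "MEDRT", "ATC"]})

# Total row counts must match
assert len(before) == len(after)

# Row counts grouped by rela_source must match as well
before_counts = before.groupby("rela_source").size().sort_index()
after_counts = after.groupby("rela_source").size().sort_index()
assert before_counts.equals(after_counts)
```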
"Resolves" #331