Open jrlegrand opened 1 week ago
@jrlegrand it looks like the code is failing because it does not handle cases where the API response lacks the 'rxclassDrugInfoList' key (meaning there is no class data associated with the concept), which leads to a KeyError. The problem is in the process_concept function. We just need to figure out how to handle this situation more gracefully.
Potential solutions off the top of my head:
Let me know which solution you think is best moving forward, and I'll see how I can fix the code so that it works.
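One way the missing-key case could be handled is sketched below. It assumes the byRxcui response nests its entries as `rxclassDrugInfoList` -> `rxclassDrugInfo`; the helper name `extract_class_rows` is hypothetical and is not the actual `process_concept` code on the branch.

```python
def extract_class_rows(response_json):
    """Return the class entries for one concept, or [] when there are none.

    Assumption: the payload looks like
    {"rxclassDrugInfoList": {"rxclassDrugInfo": [...]}} when class data
    exists, and omits the outer key when it does not.
    """
    # .get() avoids the KeyError; "or {}" also covers an explicit null value.
    info_list = response_json.get("rxclassDrugInfoList") or {}
    return info_list.get("rxclassDrugInfo", [])


# A concept with no class data yields no rows instead of raising.
print(extract_class_rows({}))
```

Returning an empty list lets the caller skip the concept without a special case, at the cost of not distinguishing "no class data" from a malformed response.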
I pushed up some code to the branch. It works - see my most recent commit message. It runs in 2.5 hours, which could be optimized I'm sure. I noticed that when the key doesn't exist, it returns an empty object {}.

Also, I spot checked against RxClass for may_treat "Multiple Myeloma" and SageRx had 3 fewer IN drugs than the RxClass UI online. These ones were missing in SageRx:

IN 3639 doxorubicin
IN 612937 interferon alfa-n3
IN 72257 interferon beta-1b

https://mor.nlm.nih.gov/RxClass/search?query=Multiple%20Myeloma%7CDISEASE&searchBy=class&sourceIds=&drugSources=atc1-4%7Catcprod%2Cepc%7Cdailymed%2Cdisease%7Cmedrt%2Cchem%7Cdailymed%2Cmoa%7Cdailymed%2Cpe%7Cdailymed%2Cpk%7Cmedrt%2Ctc%7Cfmtsme%2Cva%7Cva%2Cdispos%7Csnomedct%2Cstruct%7Csnomedct%2Ctherap%7Csnomedct%2Cschedule%7Crxnorm
Hmm... I don't see an IN listed for the may_treat Multiple Myeloma relationship in the API (I'm only seeing the PIN) so maybe it's not an issue with our code. Maybe it's some weird thing with RxClass UI?
API https://rxnav.nlm.nih.gov/REST/rxclass/class/byRxcui.json?rxcui=612937
NOTE: the only may_treat relation is a PIN with RXCUI 72258.
I suspect what the RxClass UI is doing is mapping PIN to IN if an IN doesn't already exist in the list. In other words, I see a lot of PINs that kind of have "sister" INs... except for these 3. They only show up as PINs. But you can map from PIN to IN to get the IN if that's preferred.
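If that guess about the UI is right, the fallback could look roughly like the sketch below. Everything here is hypothetical: the function name, the `(tty, rxcui)` tuple shape, and the `pin_to_in` lookup (which would have to be built separately, for example from RxNorm relationship data). The identifiers in the usage lines are placeholders, not real RXCUIs.

```python
def add_missing_ingredients(class_members, pin_to_in):
    """Add an IN row for each PIN whose IN is not already in the list.

    class_members: list of (tty, rxcui) tuples for one class relation.
    pin_to_in: dict mapping a PIN rxcui to its IN rxcui (assumed input).
    """
    present_ins = {rxcui for tty, rxcui in class_members if tty == "IN"}
    result = list(class_members)
    for tty, rxcui in class_members:
        if tty != "PIN":
            continue
        ingredient = pin_to_in.get(rxcui)
        # Only add the IN when a mapping exists and it is not listed yet.
        if ingredient is not None and ingredient not in present_ins:
            result.append(("IN", ingredient))
            present_ins.add(ingredient)
    return result


members = [("PIN", "pin-a"), ("PIN", "pin-b"), ("IN", "in-b")]
print(add_missing_ingredients(members, {"pin-a": "in-a", "pin-b": "in-b"}))
```

Here only `in-a` gets added, because `in-b` is already present as a "sister" IN.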
RxNav https://mor.nlm.nih.gov/RxNav/search?searchBy=RXCUI&searchTerm=72258
Number of rows by rela_source
https://github.com/coderxio/sagerx/pull/333 - potential optimization
Problem Statement
See related branch jrlegrand/rxclass-rework.
RxClass API has a rate limit of 20 calls / second.
There's about 123,246 API calls.
[2024-11-22, 01:14:28 CST] {logging_mixin.py:137} INFO - URL List created of length: 123246
I'm no mathematician, but 20 calls / second x 60 seconds / minute = 1200 calls / minute. 123,246 calls / 1200 calls / minute = about 103 minutes, or roughly 1 hour and 43 minutes.
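The estimate can be double-checked with a few lines of Python. This only uses the two numbers quoted above (123,246 calls and 20 calls / second) and assumes the calls are perfectly paced with no other overhead; the function name is just for illustration.

```python
def minimum_runtime_seconds(total_calls, calls_per_second):
    """Lower bound on runtime if every call is spaced exactly at the limit."""
    return total_calls / calls_per_second


total_seconds = minimum_runtime_seconds(123_246, 20)
hours, remainder = divmod(total_seconds, 3600)
minutes, seconds = divmod(remainder, 60)

print(f"{total_seconds / 60:.1f} minutes")  # 102.7 minutes
print(f"{int(hours)} h {int(minutes)} min {seconds:.0f} s")  # 1 h 42 min 42 s
```

So the floor is a little under 1 hour 43 minutes; anything the DAG does besides waiting on the rate limit adds to that.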
When I run my branch locally, it runs for 1 hour and 43 minutes and errors out with the error below.
As I'm writing this, I think the issue is more about what happens after the API calls have completed - seeing as the time it ran is appropriate based on my #math above and the error seems to be about a KeyError.
Either way, this is not working. Maybe the problem isn't with my rate limiting code, but it would be great to have other eyes on this.
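For anyone comparing against the branch, here is a minimal sketch of one common way to stay under 20 calls / second: space calls at least 1/20 s apart. This is not the rate limiting code on the branch; the `Throttle` class and its parameters are made up for illustration, and the clock and sleep functions are injectable only so the behavior can be checked without real waiting.

```python
import time


class Throttle:
    """Space successive calls at least 1 / calls_per_second apart."""

    def __init__(self, calls_per_second, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 1.0 / calls_per_second
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = None  # earliest time the next call may start

    def wait(self):
        """Block until the next call is allowed; return the seconds slept."""
        now = self._clock()
        delay = 0.0
        if self._next_allowed is not None and now < self._next_allowed:
            delay = self._next_allowed - now
            self._sleep(delay)
            now = self._next_allowed
        self._next_allowed = now + self.min_interval
        return delay


throttle = Throttle(calls_per_second=20)
for _ in range(3):
    throttle.wait()  # call this right before each API request
```

A single-threaded spacing throttle like this is simple but caps throughput at one request in flight per interval; it does nothing about what happens after the calls finish, which is where the KeyError appears to come from.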
Criteria for Success
RxClass DAG runs in about 1 hour 45 minutes and does not error out.
Additional Information
https://lhncbc.nlm.nih.gov/RxNav/TermsofService.html