Closed sshugsc closed 9 months ago
@sshugsc, please post the new log you're getting on the ticket and add a link in the description. Also, check in the log that everything looks OK. I'll duplicate here a question I posted on KBDEV-1158 so we don't lose track: is it loading all the trials, only the newer ones, or can the user choose at run time?
~Log file: clinicaltrialsgov_test1.logs.txt. The loaded data can be checked in db test_shirley1. It only loads the newer ones for now.~
There are ~500k CTs at clinicaltrials.gov and ~70k in GKB prod, so we should be loading many more than 1000, unless we are only uploading the last 2 weeks. But then, why exactly 1000 records? @sshugsc, can you please find out how many records we should usually get?
Most of the trials are not cancer-relevant; we only load cancer-relevant trials. Additionally, we only load interventional trials (something where they try to change a condition). A lot of trials are observational only, which isn't really useful to us.
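For illustration, the two filters described above could look roughly like the sketch below. Everything here is an assumption made for the example: the trial dictionary shape (`study_type`, `conditions`), the keyword list, and the function names are hypothetical, and the keyword match is only a stand-in for however the loader actually decides that a trial is cancer-relevant.

```python
# Hypothetical keyword list; a stand-in for the loader's real relevance check.
CANCER_KEYWORDS = (
    "cancer", "carcinoma", "tumor", "tumour",
    "leukemia", "lymphoma", "sarcoma", "melanoma",
)


def is_interventional(trial: dict) -> bool:
    """True when the (assumed) study_type field marks an interventional trial."""
    return str(trial.get("study_type", "")).upper() == "INTERVENTIONAL"


def is_cancer_relevant(trial: dict) -> bool:
    """True when any (assumed) condition name contains a cancer keyword."""
    conditions = [str(c).lower() for c in trial.get("conditions", [])]
    return any(keyword in c for c in conditions for keyword in CANCER_KEYWORDS)


def should_load(trial: dict) -> bool:
    """Keep only trials that pass both filters."""
    return is_interventional(trial) and is_cancer_relevant(trial)
```

With filters like these, the number of loaded records is expected to be far smaller than the total number of trials on the site.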
In the log file, we can see {"error":0,"success":1000} and "ClinicalTrial":{"created":213}, with 622 warn messages. Are we sure these 1000 records are all actual successes? What happens to the ClinicalTrial record when a Disease term is not found?
Again... it's been a while, but afaik the clinical trial record will be created irrespective of the disease. However, if the disease term or drug term is not found, then the record isn't linked to it in the DB. There are a couple of reasons for not just creating everything we see as a drug/disease term, but the biggest one is that the naming is super inconsistent in clinicaltrials.gov. I remember we have numbers for this in the PORI paper.
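A minimal sketch of the behaviour described above, assuming hypothetical names throughout (`build_record`, the `known_diseases` / `known_drugs` lookup tables, and the trial fields are not taken from the loader): the record is always created, terms that can be matched are linked, and unmatched terms only produce a warning.

```python
def build_record(trial: dict, known_diseases: dict, known_drugs: dict):
    """Create a record for every trial; link only the terms that are found.

    known_diseases / known_drugs map lower-cased names to term ids
    (hypothetical lookup tables standing in for the DB queries).
    """
    record = {"id": trial["id"], "diseases": [], "drugs": []}
    warnings = []

    for name in trial.get("conditions", []):
        term = known_diseases.get(name.lower())
        if term is None:
            # No new term is created: the name is skipped with a warning.
            warnings.append(f"disease term not found: {name}")
        else:
            record["diseases"].append(term)

    for name in trial.get("interventions", []):
        term = known_drugs.get(name.lower())
        if term is None:
            warnings.append(f"drug term not found: {name}")
        else:
            record["drugs"].append(term)

    return record, warnings
```

Under this reading, a run can report every record as a success while still emitting many warn messages, since a missing term does not fail the record.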
Documentation about these fields is needed. @sshugsc, maybe by looking on the website, by asking via email, or by pinging @creisle?
Ty @creisle for the link and explanations; I knew we were filtering the records but it's nice to have more context. The HTTP request has a "count=10000" but something is limiting the results to 1000. @sshugsc, maybe there is pagination now?
0 files ±0 0 suites ±0 0s :stopwatch: ±0s 0 tests ±0 0 :heavy_check_mark: ±0 0 :zzz: ±0 0 :x: ±0
Results for commit f1a9c26d. ± Comparison against base commit b68e6a05.
Thank you @creisle for the explanations! @mathieulemieux yes, there is a pageSize limit of 1000 mentioned on their website: https://clinicaltrials.gov/data-api/api
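Given the 1000-record page cap, loading everything means following pages until none are left. The sketch below shows one way to do that. The `fetch_page` callable is a hypothetical stand-in for the HTTP request, and the `studies`, `nextPageToken`, and `pageToken` names are assumptions about the response and query format that should be checked against the API documentation linked above.

```python
from typing import Callable, Iterator

MAX_PAGE_SIZE = 1000  # page-size cap mentioned in this thread


def iter_studies(
    fetch_page: Callable[[dict], dict],
    page_size: int = MAX_PAGE_SIZE,
) -> Iterator[dict]:
    """Yield every study by following page tokens until none is returned.

    fetch_page takes the query parameters and returns one parsed page,
    assumed to look like {"studies": [...], "nextPageToken": "..."}.
    """
    # Asking for more than the cap would not help, so clamp the request.
    params = {"pageSize": min(page_size, MAX_PAGE_SIZE)}
    while True:
        page = fetch_page(params)
        yield from page.get("studies", [])
        token = page.get("nextPageToken")
        if not token:
            break  # last page reached
        params = {**params, "pageToken": token}
```

Injecting `fetch_page` keeps the paging logic separate from the HTTP client, so it can be exercised with canned pages without touching the network.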
Discussed with @elewis2; we plan to keep this PR limited to loading all the trials (after filters) with the clinicaltrials.gov API. @mathieulemieux, a new commit is pushed and ready for review.
Log file: clinicaltrialsgov.logs.txt. The loaded data can be checked in db test_shirley5.
Next steps:
0 files ±0 0 suites ±0 0s :stopwatch: ±0s 0 tests ±0 0 :heavy_check_mark: ±0 0 :zzz: ±0 0 :x: ±0
Results for commit 351c45b1. ± Comparison against base commit b68e6a05.
Update the clinical trial loader to use the new clinicaltrials.gov API.