ebmdatalab / clinicaltrials-act-tracker

https://fdaaa.trialstracker.net/

Bug in interpreting new Results Submitted logic #219

Open NickCEBM opened 4 years ago

NickCEBM commented 4 years ago

Each day, after our data download, we scrape the ClinicalTrials.gov Results Submitted tab to check the current reporting status of trials with pending results (e.g. whether a submission has been cancelled).

During the 31 Jan 2020 update, I noticed in my normal review that the trial counts I had didn't match what they should be. After some investigation I found four trials (NCT01234532, NCT01644656, NCT01942486 and NCT03135535) that are now unreported on staging but were previously counted as reported.

Investigating further, it appears that ClinicalTrials.gov added dates to the "Results Submitted" record for these trials. All four trials were either cancelled at one point or are currently cancelled. The catch is that the submission, cancellation, and resubmission all happened on the same day, as seen here:

image

I had never seen this before, so either a) ClinicalTrials.gov had a bug, or b) the system previously couldn't handle/display a record correctly when all of these events happened on the same day, and they have now corrected it. I can confirm the actual data changed in the XML record as well. See below for trial NCT01234532.

January 30th data: image

January 31st data:

image

As of 31 Jan 2020, all four trials are shown as "Overdue Cancelled". While one of them does have that status, I think it's actually broken along with all the others, because these additions screwed with our logic for displaying trial status.

The current reporting status of the 4 trials should be:

NCT01234532 - Reported Late
NCT01644656 - Reported Late
NCT01942486 - Reported Late
NCT03135535 - Reported Late

It seems clear they have been broken by the logic of having the original submission, cancellation, and re-submission on the same day.

I believe the logic for this in our code is here: https://github.com/ebmdatalab/clinicaltrials-act-tracker/blob/df56fbd88f978721f240f80288cd64596379c63b/clinicaltrials/frontend/management/commands/process_data.py

And based on the following, I believe you dealt with nonsense like this (but not exactly like this) while setting it up:

image

Unfortunately, the "Assumes you can't submit twice on the same day" logic doesn't hold anymore, so we need to rethink this logic and how it translates to trial status on the website.
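To make the rethink concrete, here is a minimal sketch of one way to decide whether a trial is currently cancelled without relying on one submission per day. This is not the code in process_data.py: the SubmissionRow shape, the function names, and the date are all made up for illustration, and it assumes the scraped Results Submitted rows keep the order in which they appear on ClinicalTrials.gov.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class SubmissionRow:
    """One row of the Results Submitted table (hypothetical shape)."""
    position: int                     # order of the row as scraped, 0 = first
    submitted: date
    cancelled: Optional[date] = None  # None if this submission was never cancelled


def latest_submission(rows: List[SubmissionRow]) -> Optional[SubmissionRow]:
    """Pick the most recent submission, breaking same-day ties by row order."""
    if not rows:
        return None
    return max(rows, key=lambda r: (r.submitted, r.position))


def is_currently_cancelled(rows: List[SubmissionRow]) -> bool:
    """A trial counts as cancelled only if its latest submission was cancelled."""
    latest = latest_submission(rows)
    return latest is not None and latest.cancelled is not None


# Made-up example: submission, cancellation and resubmission on one day.
same_day = date(2020, 1, 15)
rows = [
    SubmissionRow(position=0, submitted=same_day, cancelled=same_day),
    SubmissionRow(position=1, submitted=same_day),  # resubmission, not cancelled
]
print(is_currently_cancelled(rows))
```

Because the tie is broken on row position rather than on the date alone, the resubmission wins even when every date in the table is identical. Whether row order on the site is reliable enough to use this way would need checking.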

sebbacon commented 4 years ago

Can I get an urgency rating on this bug please? OMG fix it now?

NickCEBM commented 4 years ago

I'm feeling pretty iffy about sending an update live with wrong trial data on it, especially with some news pieces on our paper due to come out over the next few days.

If the paper hadn't just come out, I'd say this was about a 5 in urgency, but with the likely attention over the next week or so, it's probably a 7 or 8 out of 10.

I'm having a look now to see if I can fix the issue myself. I'll give myself a bit more time on it before I throw my hands up and we try and put our heads together on it.

NickCEBM commented 4 years ago

Ok I think I've reached the limits of my ability in trying to figure out exactly how to fix this.

Conceptually, I think I can see what's happened. Somehow the fact that submitted_date is the same as submitted has violated some logic somewhere, which is causing the website to interpret the trials as still cancelled even though the cancellation may have actually been rectified and the round may even have been returned from QC. On line 69 of the process_data.py file you have the comment "# Assumes you can't submit twice on same day", which makes this clear.

I just can't put my finger on exactly what is breaking because of that. I understand that the TrialQA object is that spreadsheet that holds all the QA data we scrape and we're setting row values for cancellation data based on looping over the rows in the table and then the cancellations within a row. I just can't find where/why it all falls apart if there are two submissions on the same day.
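For what it's worth, here is a guess at the kind of check that would behave this way. It is purely a hypothetical illustration, not something I've confirmed in process_data.py: the function names, the tuple layout, and the dates are invented. The idea is that a date comparison works fine when a resubmission happens on a later day, but flags the trial as cancelled as soon as the cancellation and the resubmission share a date.

```python
from datetime import date


def cancelled_by_date_scan(rows):
    """Hypothetical buggy check: cancelled if any cancellation date is on or
    after the latest submission date. Only safe if two submissions never
    share a day."""
    latest_submitted = max(submitted for submitted, _ in rows)
    return any(
        cancelled is not None and cancelled >= latest_submitted
        for _, cancelled in rows
    )


def cancelled_by_row_order(rows):
    """Alternative check: only the last row in table order decides."""
    _, cancelled = rows[-1]
    return cancelled is not None


# Rows are (submitted, cancelled) pairs in table order; dates are made up.
day1, day2 = date(2020, 1, 15), date(2020, 1, 16)
different_days = [(day1, day1), (day2, None)]  # resubmitted the next day
same_day = [(day1, day1), (day1, None)]        # resubmitted the same day

print(cancelled_by_date_scan(different_days), cancelled_by_row_order(different_days))
print(cancelled_by_date_scan(same_day), cancelled_by_row_order(same_day))
```

With the next-day resubmission both checks agree the trial is not cancelled. With the same-day pattern the date-based check still says cancelled, which matches the symptom we're seeing, while the row-order check does not.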

NickCEBM commented 4 years ago

On checking whether the fixed up code worked today, those four trials referenced above are all still showing as "overdue-cancelled" but they should be "Reported (late)".

NickCEBM commented 4 years ago

So everything works great now in terms of things having the right labels, but just noting a non-emergency leftover bug in which these trials that were fixed are lingering in the API as having changed status (even days later).

ex: image

No need for time spent on this now.

NickCEBM commented 4 years ago

A note on this: the API call for "Trials that are no longer due" is getting a little crowded as we see more trials that had the sub/cancel/resub on the same day. There are currently 8 trials lingering there because of this same-day cancellation pattern.

NickCEBM commented 4 years ago

Also, I'm not sure if this currently breaks anything but lol

image