ctti-clinicaltrials / aact

Improving Public Access to Aggregate Content of ClinicalTrials.gov
http://aact.ctti-clinicaltrials.org
MIT License

Recurrent mass ID change #1149

Open j6e opened 4 months ago

j6e commented 4 months ago

Hi,

I have been working with this DB for a while, and to my surprise the IDs of some elements (such as facilities or interventions) keep changing every so often, even though there are few or no changes in any of the non-ID fields. I find it quite annoying that the same element keeps changing IDs; it makes it really difficult to track changes.

Thanks,

micronix commented 4 months ago

Hello @j6e, that is correct. The original developers structured it that way, and we have a ticket in our backlog to revisit this and find a way to keep the IDs from changing so often. It happens because, when updating the data for a study, it is much easier and faster to delete all of its rows and reinsert the updated values, even if they did not change. Every day we update the oldest 80k studies, meaning the studies that have gone the longest without being updated. Over a period of about a week the entire database is updated, so the IDs will change.
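A minimal sketch of why delete-and-reinsert changes IDs, using an in-memory SQLite table as a stand-in for the real database. The table name, columns and the NCT number are made up for illustration, and `AUTOINCREMENT` is used to mimic a sequence-backed ID column, which is an assumption about how the real tables assign IDs.

```python
import sqlite3


def ids_after_refresh() -> tuple[list[int], list[int]]:
    """Return facility IDs before and after a delete-and-reinsert refresh."""
    conn = sqlite3.connect(":memory:")
    # AUTOINCREMENT never reuses an ID, similar to a database sequence (assumption).
    conn.execute(
        "CREATE TABLE facilities ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " nct_id TEXT NOT NULL,"
        " name TEXT NOT NULL)"
    )
    # Hypothetical study with two unchanged facility rows.
    rows = [
        ("NCT00000001", "Investigation site 1"),
        ("NCT00000001", "Investigation site 2"),
    ]
    conn.executemany("INSERT INTO facilities (nct_id, name) VALUES (?, ?)", rows)
    before = [r[0] for r in conn.execute("SELECT id FROM facilities ORDER BY id")]

    # "Refresh" the study: drop all its child rows and reinsert identical values.
    conn.execute("DELETE FROM facilities WHERE nct_id = ?", ("NCT00000001",))
    conn.executemany("INSERT INTO facilities (nct_id, name) VALUES (?, ?)", rows)
    after = [r[0] for r in conn.execute("SELECT id FROM facilities ORDER BY id")]

    conn.close()
    return before, after


before, after = ids_after_refresh()
print(before, after)
```

The content is identical after the refresh; only the surrogate IDs moved on, which is the behaviour described above.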

What kind of query are you writing where this is an issue?

j6e commented 4 months ago

Thanks for the response. I'm parsing some information from the AACT DB and combining it with external sources, for example for the locations. I run this process continuously, so I have to account for updated study information. I can still identify every location by its nct_id, but if some field changes (say, the location name changes from "Investigation site 1" to "New Amsterdam Hospital"), I need the ID to know which record to update.
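One possible workaround, sketched here as an assumption rather than anything the AACT maintainers suggest, is to stop tracking individual location IDs and instead treat nct_id as the unit of update: whenever a study's set of locations differs from your local copy, replace that study's locations wholesale, mirroring the upstream strategy. The `(name, city, country)` fields and the `sync_locations` helper are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical location shape: (name, city, country).
Location = Tuple[str, str, str]


def sync_locations(
    local: Dict[str, List[Location]],
    upstream_rows: List[Tuple[str, str, str, str]],
) -> List[str]:
    """Replace a study's locations when they differ; return the changed nct_ids.

    upstream_rows are (nct_id, name, city, country) tuples; upstream IDs are ignored.
    Studies absent from upstream_rows are left untouched in this sketch.
    """
    grouped: Dict[str, List[Location]] = defaultdict(list)
    for nct_id, name, city, country in upstream_rows:
        grouped[nct_id].append((name, city, country))

    changed: List[str] = []
    for nct_id, locations in grouped.items():
        # Compare sorted lists so row order and unstable IDs do not matter.
        if sorted(local.get(nct_id, [])) != sorted(locations):
            local[nct_id] = locations
            changed.append(nct_id)
    return changed


local = {"NCT1": [("Investigation site 1", "Amsterdam", "Netherlands")]}
upstream = [("NCT1", "New Amsterdam Hospital", "Amsterdam", "Netherlands")]
print(sync_locations(local, upstream))  # the study whose locations were replaced
```

The trade-off is that per-location identity is lost: anything you have attached to a single location from an external source would need to be re-matched after a replacement.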

> Every day we update the oldest 80k studies, meaning the studies that haven't been updated recently

Does this mean that if a study was updated yesterday, its related IDs should not change today? This is important to me so that I can be sure I can update fields by ID.
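Whether the answer is yes depends on how the daily job picks its batch, which the thread does not spell out. Assuming it strictly takes the studies with the oldest refresh timestamp, and that nothing else (for example, a change on ClinicalTrials.gov) pulls a study forward in the queue, a toy simulation can estimate when a study refreshed on day 0 is next refreshed, and therefore when its IDs would next change. The study names, timestamps and `next_refresh_day` helper are all hypothetical.

```python
from typing import Dict


def next_refresh_day(
    last_updated: Dict[str, int], study: str, batch_size: int, max_days: int = 365
) -> int:
    """Simulate an oldest-first daily refresh; return the day `study` is next refreshed."""
    state = dict(last_updated)  # copy so the caller's data is not mutated
    for day in range(1, max_days + 1):
        # Pick the batch_size least recently refreshed studies (ties broken by name).
        batch = sorted(state, key=lambda s: (state[s], s))[:batch_size]
        if study in batch:
            return day
        for s in batch:
            state[s] = day  # these studies are now the most recently refreshed
    raise ValueError("study was not refreshed within max_days")


# Seven made-up studies; S7 was refreshed "yesterday" (day 0), the rest earlier.
state = {"S1": -6, "S2": -5, "S3": -4, "S4": -3, "S5": -2, "S6": -1, "S7": 0}
print(next_refresh_day(state, "S7", batch_size=2))
```

With seven studies and a batch of two, S7 is not picked again until day 4, roughly one full cycle (total studies divided by batch size) later. Under the same assumptions, a study refreshed yesterday would keep its IDs for about one cycle, which the maintainer estimates at about a week.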