FDA / openfda

openFDA is a research project to provide open APIs, raw data downloads, documentation and examples, and a developer community for an important collection of FDA public datasets.
https://open.fda.gov
Creative Commons Zero v1.0 Universal

Download and Regularly Update All Device Data #176

Closed: funkrusher closed this issue 2 years ago

funkrusher commented 2 years ago

I'm not sure whether I'm in the right place with my question...

The openFDA website says:

To keep your downloaded data up to date, you need to re-download the data every time it is updated. 

The data I need to download is about 10 gigabytes, so to keep my copy up to date I would need to download 10 gigabytes every day.

Hopefully I'm still in the right place for my questions :)

dkrylovsb commented 2 years ago

There is definitely no need to download 10 GB of data daily, especially since MAUDE is updated only weekly. You can query https://api.fda.gov/download.json and check the export_date field to determine whether the downloadable files have changed since you last pulled them.
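A minimal Python sketch of that check is below. It assumes the manifest nests each endpoint as `results[category][endpoint]` (for example `results["device"]["event"]`), with an `export_date` string and a `partitions` list whose entries carry a `file` URL. Apart from `export_date`, those field names are assumptions to verify against the live manifest, and the helper names are hypothetical.

```python
import json
import urllib.request
from typing import List, Optional

# Manifest URL mentioned in the comment above.
MANIFEST_URL = "https://api.fda.gov/download.json"


def fetch_manifest(url: str = MANIFEST_URL) -> dict:
    """Download and parse the openFDA download manifest (makes a network call)."""
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def needs_refresh(manifest: dict, category: str, endpoint: str,
                  last_export_date: Optional[str]) -> bool:
    """True if the endpoint's export_date differs from the one seen last time.

    Assumes the layout results -> category -> endpoint -> "export_date".
    """
    current = manifest["results"][category][endpoint]["export_date"]
    return current != last_export_date


def partition_urls(manifest: dict, category: str, endpoint: str) -> List[str]:
    """List downloadable files, assuming each partition has a "file" URL."""
    partitions = manifest["results"][category][endpoint]["partitions"]
    return [p["file"] for p in partitions]
```

Store the export_date you saw after each successful download and re-download only when it changes. The check uses plain inequality rather than date ordering, so it does not depend on the exact date format.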

Device Events uses mdr_report_key as the "primary key" for the dataset (see here). Based on this description, a combination of medical specialty, product code, and regulation number could serve as the key for Device Classification.

funkrusher commented 2 years ago

Thank you very much.

Thank you for providing the primary key for device events and a possible one for device classification.

I hope it's not too much to ask, but I'm a beginner, and it would be great if you could also help me find the primary keys for the remaining device datasets:

device.enforcement
device.event [OK]
device.classification [OK]
device.510k
device.pma
device.recall
device.registrationlisting
device.udi
device.covid19serology

I'm not a native speaker, so it's a bit hard for me to find the primary keys of the datasets in the documentation text. I know they should be listed in the "Searchable Fields" documentation on your website, for example: https://open.fda.gov/apis/device/event/searchable-fields/

funkrusher commented 2 years ago

My most important requirement is that mdr_report_key always identifies the same device-event record across subsequent runs, including after openFDA has replaced the download files with new ones.

That way I can use it as the primary key in my local database: if I already stored a device-event record with mdr_report_key=123 in a previous run, I issue an SQL UPDATE; if I have never seen mdr_report_key=123, I issue an SQL INSERT. I hope it can work this way.
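The insert-or-update logic described above can be collapsed into a single SQL upsert. Below is a minimal sketch using Python's built-in sqlite3 module, assuming a hypothetical `device_event` table that stores each record as JSON text keyed by mdr_report_key. `INSERT ... ON CONFLICT ... DO UPDATE` requires SQLite 3.24 or newer.

```python
import json
import sqlite3


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open the database and create the (hypothetical) device_event table."""
    conn = sqlite3.connect(path)
    # mdr_report_key is the primary key; the full record is kept as JSON text.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS device_event ("
        " mdr_report_key TEXT PRIMARY KEY,"
        " record_json TEXT NOT NULL)"
    )
    return conn


def upsert_event(conn: sqlite3.Connection, record: dict) -> None:
    """INSERT a new event, or UPDATE the stored copy if the key already exists."""
    conn.execute(
        "INSERT INTO device_event (mdr_report_key, record_json) VALUES (?, ?) "
        "ON CONFLICT(mdr_report_key) DO UPDATE SET record_json = excluded.record_json",
        (record["mdr_report_key"], json.dumps(record, sort_keys=True)),
    )
```

For a file-backed database, commit after each batch (for example by wrapping the loop in `with conn:`) rather than after every row.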

dkrylovsb commented 2 years ago

Sure thing:

device.510k: k_number
device.pma: pma_number, supplement_number
device.recall: product_res_number
device.registrationlisting: registration_number
device.udi: look within the identifiers array for the identifier of type "Primary" 
device.covid19serology: evaluation_id, date_performed, sample_no

And yes, mdr_report_key works exactly as you described above.
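The keys listed above can be kept in one lookup table. The sketch below is a hypothetical helper, not part of openFDA. It also includes the Device Classification composite key suggested earlier and recall_number for device.enforcement (confirmed later in this thread), and it assumes each UDI `identifiers` entry has `type` and `id` fields. Missing fields come back as None, so check for them before trusting a key.

```python
from typing import Dict, Optional, Tuple

# Key fields per endpoint, taken from this thread.
KEY_FIELDS: Dict[str, Tuple[str, ...]] = {
    "device.event": ("mdr_report_key",),
    "device.classification": ("medical_specialty", "product_code", "regulation_number"),
    "device.510k": ("k_number",),
    "device.pma": ("pma_number", "supplement_number"),
    "device.recall": ("product_res_number",),
    "device.registrationlisting": ("registration_number",),
    "device.covid19serology": ("evaluation_id", "date_performed", "sample_no"),
    "device.enforcement": ("recall_number",),
}


def udi_primary_key(record: dict) -> Tuple[str, ...]:
    """device.udi: use the id of the identifier whose type is "Primary"."""
    # Assumes identifiers entries look like {"type": ..., "id": ...}.
    for ident in record.get("identifiers", []):
        if ident.get("type") == "Primary":
            return (ident["id"],)
    raise KeyError("no Primary identifier in record")


def record_key(endpoint: str, record: dict) -> Tuple[Optional[str], ...]:
    """Return the (possibly composite) key tuple for a record."""
    if endpoint == "device.udi":
        return udi_primary_key(record)
    # Missing fields become None rather than raising.
    return tuple(record.get(field) for field in KEY_FIELDS[endpoint])
```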

funkrusher commented 2 years ago

awesome, thx

grimuz commented 2 years ago

Writing from my second account here... One key is still missing :) but I think I found it: for device.enforcements, I think it could be "recallNumber".

Mariano215 commented 2 years ago

What links the "events" to the "510k", "pma", or "registration" devices?


grimuz commented 2 years ago

One additional question:

device.event.mdr_text is a sub-array within device.event. It contains the narrative texts, and each entry has an mdr_text_key field.

I wanted to use this field as a unique key, but there seem to be duplicates.

For example, this query shows that the mdr_text_key "16363682" appears multiple times for the same device event (mdr_report_key).

It seems to me that it should be unique. Is that not the case?
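One way to spot such duplicates in a downloaded device.event record is to count mdr_text_key values within its mdr_text array. A minimal sketch with hypothetical helper names: `duplicate_text_keys` reports repeated keys, and `dedupe_texts` keeps the first entry for each key while leaving entries that lack a key untouched.

```python
from collections import Counter
from typing import List


def duplicate_text_keys(event: dict) -> List[str]:
    """Return mdr_text_key values that occur more than once in event["mdr_text"]."""
    counts = Counter(t.get("mdr_text_key") for t in event.get("mdr_text", []))
    return sorted(k for k, n in counts.items() if k is not None and n > 1)


def dedupe_texts(event: dict) -> List[dict]:
    """Keep the first entry per mdr_text_key, preserving the original order."""
    seen = set()
    unique = []
    for text in event.get("mdr_text", []):
        key = text.get("mdr_text_key")
        if key is not None:
            if key in seen:
                continue  # drop the repeated entry
            seen.add(key)
        unique.append(text)
    return unique
```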

dkrylovsb commented 2 years ago

Yes, that field should be unique, but apparently there are duplicates within the source data files. For example, the key you referenced above is duplicated in foitext2003.txt and foitext2004.txt:

foitext2003.txt:503218|16363682|D|1||DURING A BILATERAL HERNIA PROCEDURE, THE ANCHOR DID NOT FIX NORMALLY AND THE MESH LOOSENED AFTER FIXATION. FURTHER, THE ANCHOR CAUSED A BIGGER BLEEDING WHICH WAS CONTROLLED BY AN RF DEVICE. A BIGGER TROCAR WAS TAKEN AND THE PROCEDURE WAS FINISHED WITH AN "EMS". NO CONSEQUENCES FOR THE PT.
foitext2004.txt:503218|16363682|D|1||DURING A BILATERAL HERNIA PROCEDURE, THE ANCHOR DID NOT FIX NORMALLY AND THE MESH LOOSENED AFTER FIXATION. FURTHER, THE ANCHOR CAUSED A BIGGER BLEEDING WHICH WAS CONTROLLED BY AN RF DEVICE. A BIGGER TROCAR WAS TAKEN AND THE PROCEDURE WAS FINISHED WITH AN "EMS". NO CONSEQUENCES FOR THE PT.
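The same check can be run on the raw source files. This sketch assumes pipe-delimited rows whose second field is the MDR text key, as in the two lines above, and it skips any row whose key is not all digits (such as a header row). The function name is hypothetical. It takes (file name, lines) pairs so that it works with open file objects or with in-memory test data.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def find_cross_file_duplicates(
        files: Iterable[Tuple[str, Iterable[str]]]) -> Dict[str, List[str]]:
    """Map each text key that appears in more than one file to those file names."""
    seen: Dict[str, List[str]] = defaultdict(list)
    for name, lines in files:
        for line in lines:
            # Split off only the first two fields; the narrative text may contain "|".
            fields = line.rstrip("\n").split("|", 2)
            if len(fields) < 2 or not fields[1].isdigit():
                continue  # header, blank, or malformed row
            text_key = fields[1]
            if name not in seen[text_key]:
                seen[text_key].append(name)
    return {key: names for key, names in seen.items() if len(names) > 1}
```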

We will work on enhancing the pipeline to catch and remove duplicates and report back once done. Thank you for bringing this to our attention.

grimuz commented 2 years ago

@dkrylovsb

OK, thanks, that's good to know.

I have also noticed duplicates in the "recallNumber" field of "device.enforcements". Is it OK to use this field as the primary key for "device.enforcements"?

Sorry for asking so many questions, but I guess this is my final one (for now) :D

dkrylovsb commented 2 years ago

device.event.mdr_text duplicates have been removed.

Yes, recall_number should be used as the primary key for the Device Enforcement dataset. There is indeed a small number of duplicate records (6 in total), which we will also look at and fix shortly. Thank you for bringing this to our attention.

dkrylovsb commented 2 years ago

The duplicates in the Device Recall Enforcement dataset have been removed as well.