FDA / openfda

openFDA is an FDA project to provide open APIs, raw data downloads, documentation and examples, and a developer community for an important collection of FDA public datasets.
https://open.fda.gov
Creative Commons Zero v1.0 Universal

Harmonization: SPL pipeline #203

Closed: conor-mcmanamy closed this issue 3 months ago

conor-mcmanamy commented 3 months ago

Hello,

I'm having some trouble running the SPL portion of the data workflows.

  1. Based on #72, I changed the S3 bucket path to s3://download.open.fda.gov.
  2. I wasn't able to find any documentation on how the AWS profile should be configured to pull from that bucket. In 2016, @HansNelsen mentioned that documentation updates might be forthcoming. Are the steps laid out in that (old) post still accurate?
  3. I notice that the code no longer uses DailyMed for SPL data. Why is it no longer used? Is it still a drop-in replacement for the S3-hosted data?

Have others had issues downloading this portion of the data? What worked?

I'd appreciate all the input/advice you can offer. Thanks!

dkrylovsb commented 3 months ago

Hi @conor-mcmanamy, at this point the SPL pipeline retrieves drug labels from an internal S3 bucket; the labels on download.open.fda.gov are no longer available. However, you can change the pipeline code to retrieve the labels from an alternative source, such as DailyMed, if you wish.
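For anyone taking that route, here is a minimal sketch (not part of the openFDA codebase) of enumerating labels from DailyMed's public web services using only the Python standard library. It assumes the DailyMed v2 endpoints `spls.json` (paginated with `page` and `pagesize`) and `spls/{setid}.xml`, and it assumes that the listing JSON carries `data[].setid` and `metadata.total_pages`. Check these against the current DailyMed web services documentation. The function names are hypothetical, and wiring the results into the pipeline's SPL tasks is left out.

```python
import json
import urllib.parse
import urllib.request
from typing import Iterator, List, Tuple

# Assumed DailyMed web services v2 base URL; verify against DailyMed's docs.
BASE = "https://dailymed.nlm.nih.gov/dailymed/services/v2"


def spl_list_url(page: int, pagesize: int = 100) -> str:
    """Build the URL for one page of the SPL listing (hypothetical helper)."""
    query = urllib.parse.urlencode({"page": page, "pagesize": pagesize})
    return f"{BASE}/spls.json?{query}"


def spl_xml_url(setid: str) -> str:
    """Build the URL for the current SPL XML of one label, keyed by set ID."""
    return f"{BASE}/spls/{urllib.parse.quote(setid)}.xml"


def parse_listing(payload: dict) -> Tuple[List[str], int]:
    """Return (set IDs on this page, total page count).

    The field names 'data', 'setid', 'metadata' and 'total_pages' are
    assumptions about the DailyMed JSON shape.
    """
    setids = [item["setid"] for item in payload.get("data", [])]
    total_pages = int(payload.get("metadata", {}).get("total_pages", 1))
    return setids, total_pages


def iter_setids(pagesize: int = 100) -> Iterator[str]:
    """Walk every listing page and yield SPL set IDs (needs network access)."""
    page, total_pages = 1, 1
    while page <= total_pages:
        with urllib.request.urlopen(spl_list_url(page, pagesize)) as resp:
            setids, total_pages = parse_listing(json.load(resp))
        yield from setids
        page += 1
```

Calling `iter_setids()` makes one request per listing page, and fetching each label via `spl_xml_url()` adds one request per label. For a full rebuild, DailyMed's bulk ZIP downloads of all labels may be more practical.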

conor-mcmanamy commented 3 months ago

Thanks for the quick reply. Does the DailyMed data differ from what would be in the internal S3 bucket? I'd like the data to resemble that of the openFDA API as closely as possible.

dkrylovsb commented 3 months ago

DailyMed data would ultimately be the same.

conor-mcmanamy commented 3 months ago

Sounds good. Thanks for the confirmation and the help, @dkrylovsb!