cancerDHC / ccdh-terminology-service

CCDH Terminology and Mapping Service

Importer: Automate GDC data dictionary download while running importer #133

Open joeflack4 opened 2 years ago

joeflack4 commented 2 years ago

Description

From the data/data_dictionary/gdc/README.md:

GDC Data Dictionary in JSON

The JSON files are downloaded from the backend of the GDC data dictionary viewer. Each file is timestamped with the date on which it was downloaded.

The URL for the file is https://api.gdc.cancer.gov/v0/submission/_dictionary/_all

The current.json file is a symlink to the most current version.

The command to download a current version and update the symlinked current.json is:

# run it in the project root path
python -m ccdh.importers.gdc

We might as well run this download as part of the normal import. If download time or API rate limits turn out to be a problem, we can handle that in code and fall back to a local cache when needed. We can also wrap the download in a try/except for good measure.
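
A rough sketch of what that could look like, to make the proposal concrete. This is not the existing `ccdh.importers.gdc` code: the function names `fetch_json` and `refresh_gdc_dictionary`, the `fetch` parameter, and the `YYYY-MM-DD.json` file naming are all assumptions made for illustration. The idea is to try the download, write a dated file, repoint `current.json`, and fall back to the cached `current.json` if anything goes wrong.

```python
import json
import os
import urllib.request
from datetime import date
from pathlib import Path

GDC_DICTIONARY_URL = "https://api.gdc.cancer.gov/v0/submission/_dictionary/_all"


def fetch_json(url, timeout=60):
    """Download and parse a JSON document (hypothetical helper)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)


def refresh_gdc_dictionary(cache_dir, fetch=fetch_json, url=GDC_DICTIONARY_URL):
    """Download a fresh dictionary, or fall back to the cached current.json.

    Hypothetical sketch: the dated file name format is an assumption.
    """
    cache_dir = Path(cache_dir)
    current = cache_dir / "current.json"
    try:
        data = fetch(url)
    except Exception as err:  # network error, timeout, rate limit, bad JSON
        if current.exists():
            print(f"GDC download failed ({err}); using cached {current}")
            with current.open() as f:
                return json.load(f)
        raise  # no cached copy to fall back to

    cache_dir.mkdir(parents=True, exist_ok=True)
    dated = cache_dir / f"{date.today().isoformat()}.json"
    dated.write_text(json.dumps(data))

    # Repoint current.json: create a temporary symlink, then rename it over
    # the old one so current.json never points at nothing.
    tmp = cache_dir / "current.json.tmp"
    if tmp.is_symlink() or tmp.exists():
        tmp.unlink()
    tmp.symlink_to(dated.name)
    os.replace(tmp, current)
    return data
```

The importer could then call `refresh_gdc_dictionary("data/data_dictionary/gdc")` at the start of a run. Passing `fetch` as a parameter is only there so the fallback path can be exercised without hitting the API.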