broadinstitute / gdctools

Python and UNIX CLI utilities to simplify interaction with the NIH/NCI Genomics Data Commons

consider partitioning gdc_mirror into 2 tools: one to retrieve manifest, one to retrieve files #14

Open noblem opened 7 years ago

noblem commented 7 years ago

This is a useful by-product of talking with Chet about his effort to download files directly into FireCloud by feeding the FC task a manifest of files to retrieve from GDC.

Presently, gdc_mirror first scans the GDC to formulate an internal manifest of files to download. That can be generalized to either accept a manifest file at the CLI, or build one if none is given. Either way, the manifest is then handed off to the downloader to iterate upon, as is currently done.

This suggests that the current guts of gdc_mirror can be broken into 2 tools:

gdc_get_manifest
gdc_get_files

and gdc_mirror then rewritten as a short script which:

calls the manifest retriever if no manifest is provided
calls the file retriever, using the (provided or fabricated) manifest

Each retriever would of course obey the config file definitions (e.g. where to output files, which programs/projects to use).
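
To make the proposed split concrete, here is a rough sketch of how the rewritten gdc_mirror could look. Everything in it is an assumption for illustration only: the function names `get_manifest`, `get_files` and `mirror`, the config keys `projects` and `output_dir`, and the manifest shape (a list of dicts with a `file_name` key) are hypothetical and not taken from the current gdctools code. The two retrievers are stubs that neither query the GDC nor download anything.

```python
import os


def get_manifest(config):
    """Hypothetical stand-in for gdc_get_manifest.

    A real version would scan the GDC for the configured programs/projects.
    This stub fabricates one placeholder entry per configured project.
    """
    return [
        {"file_name": f"{project}.placeholder.txt"}
        for project in config.get("projects", [])
    ]


def get_files(manifest, config):
    """Hypothetical stand-in for gdc_get_files.

    A real version would download each manifest entry. This stub only
    computes where each file would be written, per the config.
    """
    output_dir = config.get("output_dir", ".")
    return [os.path.join(output_dir, entry["file_name"]) for entry in manifest]


def mirror(config, manifest=None):
    """Sketch of gdc_mirror as a short wrapper around the two retrievers."""
    if manifest is None:
        # No manifest was provided, so call the manifest retriever.
        manifest = get_manifest(config)
    # Either way, hand the manifest to the file retriever.
    return get_files(manifest, config)
```

With this shape, a caller such as a FireCloud task could pass its own manifest via `mirror(config, manifest=...)`, while a plain `mirror(config)` would keep today's behavior of building the manifest first.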