LSSTDESC / rail_base

Base classes for RAIL
MIT License

Design Qs for making get_data (data curl script) customizable #22

Open OliviaLynn opened 1 year ago

OliviaLynn commented 1 year ago

The cli script get_data was added in PR #20.

It currently gets one file from NERSC, but we would like it to be able to (1) get more files and (2) let the user specify a subset of files.

Design questions:

  1. Will we be grabbing data from places other than NERSC? If the same file is available in two places, do we allow the user to specify which one to target?
  2. Will we allow users to specify subsets of data on a file-by-file basis, or will we group the files into predefined logical subsets (i.e., "download all the files needed to run \<something>")?
  3. Will we run through the available data with a y/n prompt (using the prompting feature in click, we could walk users through each available file for convenience), or will we ask the user to type or copy-paste each file name?
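
To make options 2 and 3 concrete, here is a minimal sketch of both selection styles, not a proposal for the actual implementation. The group names, file names, and the helpers `files_for_groups` and `select_interactively` are all hypothetical. The `confirm` argument is any callable that takes a prompt string and returns a bool, so `click.confirm` could be passed in directly.

```python
from typing import Callable, Dict, Iterable, List

# Hypothetical catalog: these group and file names are placeholders,
# not real RAIL data files.
FILE_GROUPS: Dict[str, List[str]] = {
    "demo_a": ["file_1.hdf5", "file_2.hdf5"],
    "demo_b": ["file_2.hdf5", "file_3.hdf5"],
}


def files_for_groups(groups: Iterable[str]) -> List[str]:
    """Expand group names into a file list, dropping duplicates but keeping order."""
    seen: Dict[str, None] = {}
    for group in groups:
        for name in FILE_GROUPS[group]:
            seen.setdefault(name, None)
    return list(seen)


def select_interactively(
    available: Iterable[str], confirm: Callable[[str], bool]
) -> List[str]:
    """Ask a yes/no question per file and keep the files the user accepts."""
    return [name for name in available if confirm(f"Download {name}?")]
```

With click installed, `select_interactively(files_for_groups(["demo_a"]), click.confirm)` would walk the user through each file in that group.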

aimalz commented 1 year ago

  1. I think the tests and demos will not be getting data from other places.
  2. From the user perspective, I'd ideally want files to be downloaded at the beginning of a pipeline (in a script or notebook), like up at the imports, and to only download what that script/notebook needs (preferably skipping any files I already have).
  3. In the use case of running it at the beginning of a pipeline notebook/script, the user won't know which files they need, so I think the pipeline scripts/notebooks we provide will need to include a list of the necessary file names in the call to get_data, no?
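
As a rough illustration of points 2 and 3, a notebook could pass its own list of required files to a helper that skips anything already on disk. This is only a sketch under assumptions: `fetch_missing_files` and `BASE_URL` are hypothetical (the real NERSC location is not given in this thread), and this is not the existing get_data interface.

```python
import os
from typing import Callable, Iterable, List
from urllib.request import urlretrieve

# Placeholder only: stands in for wherever the files are actually hosted.
BASE_URL = "https://example.org/rail_data"


def fetch_missing_files(
    file_names: Iterable[str],
    dest_dir: str = ".",
    fetch: Callable[[str, str], object] = urlretrieve,
) -> List[str]:
    """Download each named file into dest_dir unless it already exists there.

    Returns the names that were actually downloaded.
    """
    os.makedirs(dest_dir, exist_ok=True)
    downloaded = []
    for name in file_names:
        target = os.path.join(dest_dir, name)
        if os.path.exists(target):
            continue  # already present locally, so skip the download
        fetch(f"{BASE_URL}/{name}", target)
        downloaded.append(name)
    return downloaded
```

A provided notebook would then start with something like `fetch_missing_files(["file_1.hdf5", "file_2.hdf5"], dest_dir="data")` next to its imports, so the user never has to know the file names themselves.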