The remake effort aims to serve a few general use cases, and to yield tools that can serve all of them while aligning as closely as possible with other existing solutions and ongoing developments. These general use cases are:
- provenance capture of programmatic dataset modifications (i.e., the domain of `datalad run`)
- re-execution of provenance records, for the purpose of
  - verifying reproducibility (i.e., `datalad rerun`)
  - re-applying computational steps on different data (i.e., `datalad rerun --onto`)
- output extraction after execution of (parametric) compute instructions (i.e., "compute for get" special remote)
- depositing compute instructions for "prospective outputs" (never computed/recorded)
A list of more concrete use cases will help to inform both the design and the presentation (documentation, paper) of the implementation. Here is a (growing) collection of candidates to consider as documentation examples, or as use cases featured prominently in the paper:
- fmriprep: compute large outputs, hash them, and rely on them being bit-identically reproducible to avoid storing them
- provide data in alternative (file) formats (store CSV, provide XLSX on demand)
- render partial data for specific purposes (produce video clips from a source video via a cutlist)
- apply all edits to a RAW photo to render a JPEG on demand
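The idea shared by these use cases (store a compute instruction plus a checksum of its output, then regenerate the output on demand instead of storing it) can be illustrated with a minimal sketch. This is not the remake implementation and does not use any DataLad API: the names `ComputeRecord`, `INSTRUCTIONS`, `deposit`, and `obtain` are hypothetical, and a CSV-to-TSV conversion stands in for a real format conversion such as CSV to XLSX.

```python
import csv
import hashlib
import io
from dataclasses import dataclass


def csv_to_tsv(source: bytes) -> bytes:
    """Deterministic stand-in for a real conversion step (assumption)."""
    rows = csv.reader(io.StringIO(source.decode("utf-8")))
    out = io.StringIO()
    csv.writer(out, delimiter="\t", lineterminator="\n").writerows(rows)
    return out.getvalue().encode("utf-8")


# Hypothetical registry mapping instruction names to callables.
INSTRUCTIONS = {"csv_to_tsv": csv_to_tsv}


@dataclass(frozen=True)
class ComputeRecord:
    """Hypothetical record kept in place of the output itself."""
    instruction: str      # key into INSTRUCTIONS
    expected_sha256: str  # checksum of the output at deposit time


def deposit(instruction: str, source: bytes) -> ComputeRecord:
    """Run the instruction once and keep only the output checksum."""
    output = INSTRUCTIONS[instruction](source)
    return ComputeRecord(instruction, hashlib.sha256(output).hexdigest())


def obtain(record: ComputeRecord, source: bytes) -> bytes:
    """Recompute the output on demand and verify it is bit-identical."""
    output = INSTRUCTIONS[record.instruction](source)
    if hashlib.sha256(output).hexdigest() != record.expected_sha256:
        raise ValueError("recomputed output does not match recorded checksum")
    return output
```

In this sketch, `deposit` would be called when the record is created and `obtain` whenever the output is requested. The checksum comparison is what makes it safe to skip storing the output, and it only holds if the computation is bit-identically reproducible.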