Below I list possible ideas for improving the user documentation. They are inspired by various sources, such as: RDF, meeting notes, old issues and the (even older) issues referenced by them, the git-annex branchable thread, and so on.
README TODOs:
[ ] elaborate on the three aspects (provision, compute, extraction/retrieval)?
[ ] describe use cases in which a user may want to modify one of these aspects but keep the remaining ones unchanged
[ ] elaborate on "immutable provenance record" vs "updatable compute instructions"
[ ] clarify that this needs to be done in the dataset in which the results of the (re)computation are collected
[ ] add info that one annex key can be associated with multiple compute instruction sets (multiple ways to obtain the same file content)
[ ] add info that compute instructions are versioned / get committed
[ ] describe the structure of the datalad-remake URL
[ ] is it the same when used with lowercase or uppercase options?
[ ] possible ideas to improve examples?
[ ] explicitly mention the examples directory in the main README
[ ] add example on how compute instructions can be updated (e.g. Michał's resampling.py)
[ ] update fMRIPrep examples, so that they match Felix's use of fMRIPrep?
[x] fix typo in git-annex name (no space)
Future README TODOs:
[ ] security & trust model / GPG
[ ] cost function (which compute instruction to run?)
[ ] CWL / Boutiques?
[ ] implement other use cases (distribits videos, remodnav paper)
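To make the cost-function idea concrete, together with the README item about one annex key being associated with several compute instruction sets, here is a minimal Python sketch. Everything in it (InstructionSet, cheapest, the placeholder key, method names and costs) is hypothetical and does not reflect datalad-remake's actual data model. It only shows one possible selection policy: prefer instruction sets whose inputs are already present, then take the one with the lowest estimated cost.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InstructionSet:
    """Hypothetical record of one way to obtain a file's content."""
    method: str              # illustrative name of a compute template
    inputs: tuple[str, ...]  # files the computation reads
    cost: float              # hypothetical cost estimate, e.g. CPU-seconds


def cheapest(candidates: list[InstructionSet], present: set[str]) -> InstructionSet:
    """Pick the lowest-cost instruction set, preferring ones whose inputs are present.

    Instruction sets with missing inputs are only considered if none of the
    candidates can run on the inputs at hand.
    """
    if not candidates:
        raise ValueError("no compute instructions recorded for this key")
    runnable = [c for c in candidates if set(c.inputs) <= present]
    pool = runnable or candidates
    return min(pool, key=lambda c: c.cost)


# One placeholder annex key associated with two instruction sets that are
# assumed to produce the same content.
instructions: dict[str, list[InstructionSet]] = {
    "KEY-resampled": [
        InstructionSet("resample-fast", ("bold.nii.gz",), cost=5.0),
        InstructionSet("full-pipeline", ("raw.dcm",), cost=600.0),
    ]
}

choice = cheapest(instructions["KEY-resampled"], present={"raw.dcm", "bold.nii.gz"})
print(choice.method)  # resample-fast
```

A real cost function would probably also weigh the cost of first retrieving or computing missing inputs, which this sketch deliberately ignores.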
Also, while working on the documentation, I identified several questions / concerns.
Questions / concerns:
simplify the specification of the command? (currently strings need to be separated by ' ',)
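One possible simplification, sketched in Python under the assumption that a template could accept a single command string: split that string with the standard library's shlex module instead of requiring every argument as a separately quoted string. Whether datalad-remake's template format could adopt this is an open question; the command below is only an illustrative example.

```python
import shlex

# Hypothetical one-line command a user could write instead of listing each
# argument as a separately quoted string ('...', '...', ...).
command = "bash -c 'echo Hello {name} > {file}'"

# shlex.split follows POSIX shell quoting rules: the quoted part stays one
# argument, and the {placeholders} are left untouched for later substitution.
arguments = shlex.split(command)
print(arguments)  # ['bash', '-c', 'echo Hello {name} > {file}']

# shlex.join (Python 3.8+) turns the list back into a shell-safe string,
# which could be used to display stored instructions in the simpler form.
print(shlex.join(arguments))
```

The trade-off is that users then have to reason about shell-style quoting, whereas an explicit list of strings avoids any ambiguity about argument boundaries.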
compute instructions are versioned, but can they be updated without creating a new commit?
reproducibility check: how does the user know that the computation was reproducible? (no change in the dataset?)
what happens if the computation is not reproducible? (same instruction set, but different keys generated)
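A recomputation reproduces a file exactly when it yields the same annex key as the recorded one; in the non-reproducible case from the question above, the keys differ. The Python sketch below approximates a git-annex SHA256E key (SHA256E-s<size>--<hash><extension>) and compares it against a recorded key. It is a simplification, not git-annex's implementation: in particular, only the last file suffix is appended, whereas git-annex has its own rules for which extensions end up in a key.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256e_key(path: Path) -> str:
    """Approximate the git-annex SHA256E key of a file.

    Format: SHA256E-s<size in bytes>--<SHA-256 hex digest><extension>.
    Simplification: only the last suffix (e.g. ".txt") is appended.
    """
    digest = hashlib.sha256()
    with path.open("rb") as f:
        # Hash in 1 MiB chunks so large files are not read into memory at once.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    size = path.stat().st_size
    return f"SHA256E-s{size}--{digest.hexdigest()}{path.suffix}"


def is_reproduced(recomputed: Path, recorded_key: str) -> bool:
    """The recomputation reproduced the content iff it yields the recorded key."""
    return sha256e_key(recomputed) == recorded_key


with tempfile.TemporaryDirectory() as tmp:
    output = Path(tmp, "test.txt")
    output.write_bytes(b"hello\n")          # content of the original computation
    recorded = sha256e_key(output)
    output.write_bytes(b"hello\n")          # identical recomputation
    print(is_reproduced(output, recorded))  # True
    output.write_bytes(b"hello again\n")    # non-reproducible recomputation
    print(is_reproduced(output, recorded))  # False
```

With a real dataset, the recorded key could be looked up with `git annex lookupkey <file>` rather than recomputed by hand.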
what happens when declared inputs overlap with declared outputs?
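Whatever the intended semantics turn out to be (an error, a warning, or an explicit in-place update), the tool first has to detect the overlap. A minimal, hypothetical Python sketch of such a check: declared paths are normalized lexically and intersected. The function names are illustrative, and the check only compares literal paths (no symlink resolution, no pattern matching).

```python
import posixpath


def normalize(path: str) -> str:
    # Lexical normalization only: collapses "./", "//" and "dir/../";
    # symlinks are not resolved.
    return posixpath.normpath(path)


def overlapping_paths(inputs: list[str], outputs: list[str]) -> set[str]:
    """Return paths that are declared both as an input and as an output."""
    return {normalize(p) for p in inputs} & {normalize(p) for p in outputs}


overlap = overlapping_paths(
    inputs=["data/raw.csv", "./config.json"],
    outputs=["config.json", "results/out.csv"],
)
if overlap:
    print("declared both as input and output:", sorted(overlap))
```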
is "recursive" compute possible, i.e. computing a file that depends on another file which is not present but can itself be computed? (see: https://pypi.org/project/datalad-getexec/) For example:
datalad drop test.txt
datalad drop depends-on-test.txt
datalad get depends-on-test.txt
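A minimal Python sketch of how such recursive resolution could work: a depth-first traversal that schedules missing inputs before the files that need them (a topological order) and rejects dependency cycles. It does not describe how datalad-remake or datalad-getexec actually behave; the depends_on mapping and function name are hypothetical, and for the commands above it assumes that test.txt is generated without inputs and that depends-on-test.txt reads test.txt.

```python
def plan_recomputation(
    target: str,
    depends_on: dict[str, list[str]],
    present: set[str],
) -> list[str]:
    """Return the files to compute, dependencies first, so `target` becomes available.

    depends_on maps each computable file to the inputs its (hypothetical)
    compute instructions read. Files in `present` are used as-is.
    """
    order: list[str] = []
    done: set[str] = set()
    visiting: set[str] = set()

    def visit(path: str) -> None:
        if path in present or path in done:
            return
        if path not in depends_on:
            raise LookupError(f"{path} is neither present nor computable")
        if path in visiting:
            raise ValueError(f"dependency cycle involving {path}")
        visiting.add(path)
        for dependency in depends_on[path]:
            visit(dependency)  # make inputs available first
        visiting.discard(path)
        done.add(path)
        order.append(path)

    visit(target)
    return order


# Assumed dependencies for the example commands; both files have been dropped.
depends_on = {
    "test.txt": [],
    "depends-on-test.txt": ["test.txt"],
}
print(plan_recomputation("depends-on-test.txt", depends_on, present=set()))
# ['test.txt', 'depends-on-test.txt']
```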
currently, we make no use of the datalad-container extension