datalad / datalad-remake


Design `datalad remake-capture` or `remake-sink` #13

Open mih opened 4 months ago

mih commented 4 months ago

This is about the second half of https://github.com/datalad/datalad-remake/issues/10 -- a datalad-based data sink or data capture helper. See https://github.com/datalad/datalad-remake/issues/12 for the other half.

Purpose

Accept data (files) from some source/location and inject them into a datalad dataset (as a new commit, to some branch, under some file name(s)). Optionally, push the dataset modification to a remote or service that accepts a (serialized) dataset (update).
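To make the intended flow concrete, here is a minimal sketch of such a sink, not a proposed implementation. The `capture` function, its parameters, and the example file names are all hypothetical. It copies files into an existing local dataset, then builds (and optionally runs) `datalad save` and `datalad push` calls. Branch selection and provisioning of a not-yet-existing dataset are deliberately left out.

```python
import shutil
import subprocess
from pathlib import Path


def capture(sources, dataset, names, message, push_to=None, execute=False):
    """Copy `sources` into `dataset` under `names`, then save and optionally push.

    Hypothetical helper for illustration only; it is not part of
    datalad-remake. Returns the commands that were (or would be) run.
    """
    if len(sources) != len(names):
        raise ValueError("need exactly one target name per source file")

    dataset = Path(dataset)
    targets = []
    for src, name in zip(sources, names):
        target = dataset / name
        # Create intermediate directories inside the dataset as needed.
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, target)
        targets.append(str(target))

    # One commit that records exactly the injected files.
    commands = [["datalad", "save", "-d", str(dataset), "-m", message, *targets]]
    if push_to is not None:
        # Optional: publish the modification to a sibling/remote.
        commands.append(["datalad", "push", "-d", str(dataset), "--to", push_to])

    if execute:
        # Requires datalad on PATH and `dataset` to be a datalad dataset.
        for cmd in commands:
            subprocess.run(cmd, check=True)
    return commands
```

With `execute=False` (the default here) the function only copies the files and returns the planned commands, which makes it easy to inspect what would happen before touching the dataset history.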

Target use cases

Provenance capture

It would be useful to be able to ingest provenance information about the dataset modifications.

API

remake-capture will likely use remake-provision whenever it is not operating on an already existing local repository, so (1) and (2) would need to be aligned between the commands.

(4) is included so that remake-provision and remake-capture are the only two datalad "nodes" needed to make an arbitrary workflow system datalad-compatible. Of course, (4) could also be a dedicated execution of datalad push. That is a different trade-off and subject to further discussion.