datalad / datalad-deprecated

DataLad extension for functionality that has been phased out of the core package

RFC: annex-metadata-outputs procedure wishlist #77

Open yarikoptic opened 4 years ago

yarikoptic commented 4 years ago

I am thinking about creating a procedure, such as annex-metadata-outputs, which would take as its parameters the options for the git annex metadata -s call(s) to be run on the (annexed) files in the last commit.

The idea came from the fact that we are unlikely to extend datalad save with options for specifying git annex metadata to be set on the saved files. So maybe I could use --proc-post to do that instead. Such a procedure should get the list of files modified in the last commit and run git annex metadata -s on them. Then we should be able to use it with a regular datalad save or datalad run (maybe, since it might happen that run doesn't save any new results; not yet sure what to do about that). It is also partially due to the inability to specify such metadata via .gitattributes: https://git-annex.branchable.com/git-annex-metadata/#comment-fde59930f108af0fff842f5e25351e93
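
To make the idea more concrete, here is a minimal sketch of what such a procedure script could look like. It is not an existing DataLad procedure. It assumes the script receives the dataset path as its first argument followed by one or more field=value settings, that it only needs to handle files added or modified by HEAD, and that git annex metadata leaves non-annexed files alone. The function names (split_nul, changed_files_in_last_commit, build_metadata_command) are made up for illustration.

```python
import subprocess
import sys


def split_nul(output):
    """Split NUL-separated git output into a list of non-empty paths."""
    return [p for p in output.split("\0") if p]


def changed_files_in_last_commit(repo_path):
    """Return paths added or modified by HEAD, relative to the repository root."""
    out = subprocess.run(
        ["git", "diff-tree", "--no-commit-id", "--name-only", "-r",
         "--root", "--diff-filter=AM", "-z", "HEAD"],
        cwd=repo_path, check=True, capture_output=True, text=True,
    ).stdout
    return split_nul(out)


def build_metadata_command(settings, files):
    """Build a single `git annex metadata` call setting every field on every file."""
    cmd = ["git", "annex", "metadata"]
    for setting in settings:
        # Hypothetical sanity check: each setting must look like field=value.
        if "=" not in setting:
            raise ValueError(f"expected field=value, got {setting!r}")
        cmd += ["-s", setting]
    return cmd + list(files)


def main(argv):
    # Assumed calling convention: <script> <dataset path> field=value [...]
    if len(argv) < 3:
        print("usage: annex-metadata-outputs DATASET FIELD=VALUE...", file=sys.stderr)
        return 1
    repo_path, settings = argv[1], argv[2:]
    files = changed_files_in_last_commit(repo_path)
    if files:
        subprocess.run(build_metadata_command(settings, files),
                       cwd=repo_path, check=True)
    return 0


if __name__ == "__main__":
    main(sys.argv)
```

This sketch does not address the open question above: if run saves nothing new, HEAD is an older commit and its files would get tagged instead. A real implementation would need some way to detect that case.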

Sample use cases

$ datalad --proc-post annex-metadata-outputs distribution-restrictions=sensitive save -m "Added various license files" licenses/*
$ datalad --proc-post annex-metadata-outputs distribution-restrictions=sensitive containers-run -m "Running subject X" --output logs/* -n containers/repronim-ptb-3 scripts/myexperiment.m

I still feel that simply specifying in .gitattributes some action to perform on the matching files would be the most consistent and reliable way. That is why I am still wondering whether such a procedure is worth pursuing. Maybe in the scope of metalad it needs to generalize even further (attaching not only git annex metadata) anyway.

So -- the issue is open for discussion

mih commented 4 years ago

The question to me is whether git-annex metadata is the right receptacle for this information, but that would depend on the desired use cases. In metalad there is already a custom extractor that has the ability to pull metadata for individual files from a configurable location. This approach could be generalized.

However, if the desire is to make annex aware of the metadata, e.g. to be able to use it in wanted expressions, that wouldn't help much.

yarikoptic commented 4 years ago

Yes, annex wanted has been my primary use case for this field for a while now. Quite a number of datasets on /// are set up that way, and it generally works great! I hope that eventually we will come back to our discussion on create-sibling (#925); I have also now found an earlier sibling of this issue (#921).
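
For readers unfamiliar with that setup, the following sketch shows roughly how such a wanted expression could be built. The helper name wanted_command and the remote name "public-mirror" are hypothetical. The expression assumes that any file carrying a distribution-restrictions value should be excluded from the remote; the exact expressions used on existing datasets may differ.

```python
def wanted_command(remote, field):
    """Build a `git annex wanted` call that excludes files having any value for `field`."""
    # Preferred content expression: the remote wants only files lacking the field.
    expression = f"not metadata={field}=*"
    return ["git", "annex", "wanted", remote, expression]


# Example: keep restricted files off a hypothetical remote named "public-mirror".
cmd = wanted_command("public-mirror", "distribution-restrictions")
print(" ".join(cmd[:4]), repr(cmd[4]))
```

With an expression like this in place, files tagged via git annex metadata -s distribution-restrictions=sensitive would not be wanted by that remote.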