Open yarikoptic opened 4 years ago
Will need to check. But compression should be the default. Moreover, I'd like to make the format difference between dataset and file metadata as small as possible. Both should be JSON lines. At the moment there can only be a single record at the dataset level, but that is an artificial limitation without a real gain AFAICS now.
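To make the "both should be JSON lines, compressed by default" idea concrete, here is a minimal sketch using only the Python standard library. The helper names, file names, and record fields are hypothetical and chosen for illustration; this is not how datalad or metalad actually stores metadata.

```python
import gzip
import json
import os
import tempfile


def write_jsonl_gz(path, records):
    """Write dicts as gzip-compressed JSON lines, one record per line."""
    with gzip.open(path, "wt", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record, sort_keys=True) + "\n")


def read_jsonl_gz(path):
    """Read all records back from a gzip-compressed JSON lines file."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


# Hypothetical records; the same reader and writer serve both levels.
dataset_records = [{"type": "dataset", "name": "demo"}]
file_records = [
    {"type": "file", "path": "sub-01/anat.nii.gz"},
    {"type": "file", "path": "sub-02/anat.nii.gz"},
]

with tempfile.TemporaryDirectory() as tmp:
    ds_path = os.path.join(tmp, "dataset.jsonl.gz")
    files_path = os.path.join(tmp, "files.jsonl.gz")
    write_jsonl_gz(ds_path, dataset_records)
    write_jsonl_gz(files_path, file_records)
    loaded_dataset = read_jsonl_gz(ds_path)
    loaded_files = read_jsonl_gz(files_path)
```

With one record per line, nothing in the format itself restricts the dataset level to a single record; it is simply a file that happens to have one line.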
But compression should be the default.
The idea for ds- not being compressed was: they should typically be smallish, available with a straight clone of the dataset, and compressed by git. Changes would also be monitored directly by git, and that is where
Both should be JSON lines
might get in the way, since I am not sure how well diff would handle a difference of a few words in long lines. But all metadata does go under annex by default now (see the datalad.metadata.create-aggregate-annex-limit config variable) as of 78e32a4517befc72d7a2743e016ca944c264c95e (0.11.2~10^2), though that is configurable. Just something to keep in mind: a user could tune it, leading to a blown-up .git/objects for those diffs.
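A rough illustration of the diff concern, as a sketch with a made-up record: with one record per line, a one-word change makes a line-based diff report the entire line as removed and re-added, whereas a multi-line layout only touches the line holding the changed field. The record and the helper name are hypothetical, and Python's difflib stands in for a line-based diff here.

```python
import difflib
import json

# Hypothetical metadata record with one long field.
record = {"name": "demo", "description": "a long description " * 5, "license": "CC0"}
changed = dict(record, license="CC-BY")


def changed_chars(old_text, new_text):
    """Count characters on added/removed lines of a line-based unified diff."""
    diff = difflib.unified_diff(
        old_text.splitlines(), new_text.splitlines(), lineterm="", n=0
    )
    return sum(
        len(line) - 1  # drop the leading "+" or "-" marker
        for line in diff
        if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
    )


# One record per line (JSON lines style) vs. one field per line.
compact_chars = changed_chars(
    json.dumps(record, sort_keys=True), json.dumps(changed, sort_keys=True)
)
pretty_chars = changed_chars(
    json.dumps(record, sort_keys=True, indent=2),
    json.dumps(changed, sort_keys=True, indent=2),
)
print(compact_chars, pretty_chars)
```

The compact form reports far more changed characters for the same edit, which is the worry for metadata kept directly in git rather than under annex.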
the idea for ds- not been compressed was: they should typically be smallish, available with a straight clone of the dataset, compressed by git. Also changes monitored directly by git, that is where
Yes, that was the idea. However, it rarely works out, because extra care then has to be exercised that no sensitive information ever leaks into this file -- something that is impossible to guarantee.
Use case - HCP: https://github.com/datalad-datasets/human-connectome-project-openaccess/issues/7, where the majority of metadata comes from ds- files, which the current datalad metadata handling does not compress. Maybe we could also use .gitattributes to assign configuration per file pattern on what to compress or not. I am not sure whether metalad's approach to them is different, so maybe this issue is not pertinent, but I wanted to ask.