Open yarikoptic opened 5 years ago
https://github.com/datalad/datalad-revolution/pull/84 now yields this for a complete dataset, and each file result also has size info.
```json
...
"datalad_core": {
    "@id": "d97455f592d5ad610efc3701d1479aff3513452b",
    "authors": [
        ...
        "Michael Hanke <michael.hanke@gmail.com>",
        ...
    ],
    "contentbytesize": 14511881778,
    "dateCreated": "2015-10-01T13:37:24+02:00",
    "dateModified": "2018-05-11T09:23:33+02:00",
    "distribution": [
        {
            "name": "mddatasrc",
            "url": "http://psydata.ovgu.de/studyforrest/phase2/.git"
        },
        {
            "name": "origin",
            "url": "https://github.com/psychoinformatics-de/studyforrest-data-phase2"
        }
    ],
    "hasPart": [
        {
            "name": "src/lab-eyetracking",
            "type": "Dataset"
        }
    ],
    "identifier": "5eaff716-54eb-11e8-803d-a0369f7c647e",
    "version": "0-75-ge9f5a08"
},
...
```
It could be quite useful to know how big a dataset is without installing it. We could easily include the output of the `git annex info` call, such as (removing remotes here):

Most frequently of interest is the size of the dataset and all of its subdatasets, so we should aggregate that information, either during metadata aggregation, or "dynamically" somehow, since all the information about data sizes in the subdatasets would be available.
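The "dynamic" aggregation could look roughly like the following sketch. It is only an illustration, not DataLad code: `total_content_size` is a hypothetical helper, and the only assumptions taken from the record above are the `contentbytesize` and `hasPart` fields.

```python
def total_content_size(meta, subdataset_meta):
    """Sum 'contentbytesize' over a superdataset and its subdatasets.

    `meta` is a metadata record like the JSON shown above;
    `subdataset_meta` maps each `hasPart` entry (by name) to that
    subdataset's own metadata record.  Hypothetical helper sketching
    the aggregation discussed in this issue.
    """
    total = meta.get("contentbytesize", 0)
    for part in meta.get("hasPart", []):
        # Only subdatasets contribute their own aggregated size record
        if part.get("type") != "Dataset":
            continue
        sub = subdataset_meta.get(part["name"])
        if sub is not None:
            total += sub.get("contentbytesize", 0)
    return total
```

With records for missing subdatasets simply skipped, a superdataset size estimate stays computable from whatever metadata has been aggregated so far.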
This could be of relevance to https://github.com/datalad/datalad/issues/2403. ATM the web UI does a similar size extraction, but also per each file/directory. If we maintain size information per file as well (for annexed files it could typically be extracted from the key, which we already carry; for files in git we would need it anyway), it could be used to estimate directory sizes "on the fly" (unless we eventually start providing metadata at the directory level, which is feasible in many cases, such as per-subject info in BIDS `sub-` directories).
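Extracting the size from an annex key is cheap, since the common git-annex key backends embed it in an `-s<bytes>` field (e.g. `SHA256E-s1048576--<hash>.dat`). A minimal sketch, with `size_from_annex_key` as a hypothetical helper name:

```python
import re


def size_from_annex_key(key):
    """Return the content size in bytes encoded in a git-annex key.

    Keys produced by the usual backends carry an ``-s<bytes>`` field;
    returns None for keys without one (e.g. some URL keys).
    """
    m = re.search(r"-s(\d+)", key)
    return int(m.group(1)) if m else None
```

Summing this over the keys of a directory tree would give the per-directory estimates mentioned above without any extra stored metadata.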