Closed jsheunis closed 2 years ago
- only extracts metadata from content that does not require
get
ting annexed data
you are killing my dream of using glorious datalad-fuse! ;-) and how do you know that those files would not be annexed?
Merging #104 (9cd3a3b) into master (caec500) will increase coverage by
0.69%
. The diff coverage is90.44%
.:exclamation: Current head 9cd3a3b differs from pull request most recent head 73cc22f. Consider uploading reports for the commit 73cc22f to get more accurate results
@@ Coverage Diff @@
## master #104 +/- ##
==========================================
+ Coverage 85.12% 85.82% +0.69%
==========================================
Files 21 23 +2
Lines 1049 1206 +157
==========================================
+ Hits 893 1035 +142
- Misses 156 171 +15
Impacted Files | Coverage Δ | |
---|---|---|
datalad_neuroimaging/extractors/bids_dataset.py | 88.00% <88.00%> (ø) |
|
...neuroimaging/extractors/tests/test_bids_dataset.py | 100.00% <100.00%> (ø) |
Continue to review full report at Codecov.
Legend - Click here to learn more
Δ = absolute <relative> (impact)
,ø = not affected
,? = missing data
Powered by Codecov. Last update caec500...73cc22f. Read the comment docs.
you are killing my dream of using glorious datalad-fuse! ;-) and how do you know that those files would not be annexed?
good point. I don't know that they won't be annexed.
Something that I think is important with the use of metalad is to make a distinction between dataset and file level extractors. With this update focusing on the dataset level, does it perhaps make sense to require a specific set of BIDS-compatible files that would be necessary for extraction of dataset-level metadata (e.g. participants.tsv
, any READMEs, dataset_description.json
) and ignore the rest?
I realise that there could be any number of files in the BIDS dataset from which to extract file-level metadata but which, if combined/aggregated/derived, could turn into metadata that is interpretable on the dataset-level. But the amount or combination of such files is arbitrary and not easily definable on the dataset-level. And I think even without that information, the currently proposed dataset-level extractor could still extract useful information.
TODO for @jsheunis:
dataset_description.json
file is available locally, if not get
it.you are killing my dream of using glorious datalad-fuse! ;-) and how do you know that those files would not be annexed?
good point. I don't know that they won't be annexed.
Something that I think is important with the use of metalad is to make a distinction between dataset and file level extractors. With this update focusing on the dataset level, does it perhaps make sense to require a specific set of BIDS-compatible files that would be necessary for extraction of dataset-level metadata (e.g.
participants.tsv
, any READMEs,dataset_description.json
) and ignore the rest?I realise that there could be any number of files in the BIDS dataset from which to extract file-level metadata but which, if combined/aggregated/derived, could turn into metadata that is interpretable on the dataset-level. But the amount or combination of such files is arbitrary and not easily definable on the dataset-level. And I think even without that information, the currently proposed dataset-level extractor could still extract useful information.
The challenge of whether required files are annexed or not is addressed by correctly specifying the get_required_content
functionality, as per https://github.com/datalad/datalad-neuroimaging/pull/104/commits/f5655da754c86dee299e3324f68810019b541d7f
Ok, this PR has been open for a long time and I want to merge it. I'm going to finish whatever is remaining and easy to complete, and will then merge unless there is strong disagreement @datalad/developers
- only extracts metadata from content that does not require
get
ting annexed datayou are killing my dream of using glorious datalad-fuse! ;-) and how do you know that those files would not be annexed?
This initial statement is not correct anymore. Any annexed content that is necessary for the extraction process will be fetched before extraction starts via metalad's get_required_content
functionality, or where applicable on a case by case basis (e.g. readme content).
Remaining issue is the failing test, will sort that now.
Remaining test failure seems to be related to some dataset not containing gen4 metadata (probably outdated testdata?, or could be related to the metalad version bump?), while test_bids_dataset.py
now succeeds. Will create a separate issue for remaining failure.
This PR is in response to #94. It adds a dataset-level extractor for BIDS datasets, called
bids_dataset
that:datalad-metalad
(and introduces this dependency,datalad-metalad>=0.3.1
)dataset_description.json
andparticipants.tsv
are available locally before proceeding with the extraction processget
where applicable)pybids>=0.15.1
andBIDS v1.6.0
bids
extractor in any waybids
extractor), including information about subjects, sessions, runs, tasks, entities, and variables.Old PR comment:
This WIP PR intended to address #94 introduces a new extractor
bids_dataset
that:datalad-metalad
functionalityget
ting annexed databids.py
extractorMain changes:
bids_dataset.py
extractortest_bids_dataset.py
testsetup.cfg
setup.cfg