Open satra opened 3 years ago
I think this functionality should as much as possible align/reuse with https://github.com/datalad/datalad-neuroimaging/blob/master/datalad_neuroimaging/extractors/bids.py which ATM is just a dump of metadata as provided by pybids.
But IIRC @mih mentioned that in the scope of ebrains openminds he is considering (or just advising on?) providing "tighter" harmonization. @mih could you briefly chime in on the plans on that end here? (or just add references)
alignment is good, but we will also want to fill in the participant and biosample fields of our asset metadata structure.
What I was talking about in that meeting was that a bids2openminds conversion is taking place outside the scope of a metadata extractor. An extractor should report "as-is". If the metadata source (like BIDS) is not "semantically clean", a subsequent (and updatable) transformation can be used to yield a "better" (or just different) record.
I realized at some point that doing the standardization at the level of an extractor implies that applying any update to that standardization requires actual data access, and also makes metadata extraction an inherently open-ended process. Adding the possibility for customizable transformations of metadata seems much more practical when data access is complicated (which it seems to be for most datasets).
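To make the split concrete, here is a minimal sketch of the two steps as separate functions. Every name in it (`extract_as_is`, `harmonize`, `SEX_LABELS`, the `age_years` key) is hypothetical and not part of the DataLad or DANDI API; the accepted spellings for `sex` are an assumption about what shows up in `participants.tsv` files.

```python
# Hypothetical names throughout; this is not the DataLad or DANDI API.
SEX_LABELS = {
    "m": "male", "male": "male",
    "f": "female", "female": "female",
    "o": "other", "other": "other",
}


def extract_as_is(row):
    """Extractor step: copy the source record without interpreting it."""
    return dict(row)


def _to_float(value):
    """Return value as float, or None if it is missing or not numeric."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def harmonize(raw):
    """Transformation step: reads only the stored raw record.

    Because it never touches the data files, it can be revised and
    re-run later without data access.
    """
    sex = raw.get("sex")
    return {
        "participant_id": raw.get("participant_id"),
        "sex": SEX_LABELS.get(sex.strip().lower()) if sex else None,
        "age_years": _to_float(raw.get("age")),
    }
```

The raw record stays untouched, so a later revision of `harmonize` can be applied to records that were extracted long ago.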
@yarikoptic - perhaps we can add some bids support in the short term with respect to participant id and a few other things.
@mih - in our case metadata extraction is performed at the point of validation/upload so access is there. in the future we may want to extend the schema, for which we would indeed need to pull in the directory structure (especially for bids, where the inheritance principle does apply for some metadata).
yeah, I guess we shouldn't postpone for too long. I do not think we should at this point amalgamate data + sidecar files into a single asset, so we will keep it KISS and have an asset per file, be it data, sidecar, or metadata. dataset_description.json will also be a first-class citizen and have an asset.
Do you know of prospective datasets which would be uploaded and should be BIDS?
as we deal with non-nwb dandisets, it would be good to add bids metadata extraction, which may require parsing the tree (to get age, sex from participants.tsv, etc.).
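A minimal sketch of what the participants.tsv lookup could look like, using only the standard library. The function names are hypothetical and this is not the dandi-cli or pybids API. It assumes the file has a `participant_id` column and that missing values are written as `n/a`, as BIDS specifies.

```python
import csv
from pathlib import Path


def read_participants(bids_root):
    """Return {participant_id: {column: value}} from participants.tsv."""
    path = Path(bids_root) / "participants.tsv"
    participants = {}
    with path.open(newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f, delimiter="\t"):
            # BIDS writes missing values as "n/a"; map them to None.
            cleaned = {k: (None if v == "n/a" else v) for k, v in row.items()}
            participants[cleaned.pop("participant_id")] = cleaned
    return participants


def subject_of(relpath):
    """Return the 'sub-XX' label found in a path relative to the root, or None."""
    for part in Path(relpath).parts:
        if part.startswith("sub-"):
            return part.split("_")[0]
    return None
```

For an asset at `sub-01/anat/sub-01_T1w.nii.gz`, `subject_of` gives `sub-01`, which can then be used as the key into the table returned by `read_participants` to fill in age and sex.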