Open yarikoptic opened 1 year ago
In light of the discussion on the Atlases BEP, I think we should provide some overall formalization behind the BIDS files/structure: an
overall organizational principle that describes "how the BIDS file hierarchy is built". We might be able at some point to state something like:
- `entities/` (a plural version of the `entity`) to make that entity the leading entity to distinguish groups of files
- `ent-<label>/` (`ent` as an abbreviated version of `entity`) could be provisioned to further group data for the same `ent-<label>/`.
- `entities.tsv` with a corresponding `entities.json` to describe columns in the `.tsv` is recommended to be provided
  - `participants.tsv` -> `subjects.tsv` for unification and despite suboptimal connotation.
- `entity2` could be chosen for the next level of grouping under the first level `entity`. In such a case the leaf filenames would acquire `ent-<label>_ent2-<label>_` prefixes.

Such a principle already fits well with our `sub-/ses-` hierarchy and having `participants.tsv` for `sub-` and `sessions.tsv` for `_ses-`; for `_desc` we have `descriptions.tsv`, so overall it is "backward compatible" (but see #55 which breaks it on two aspects: no ent- prefix and no _ent- portion in the filename prefix).
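
To make the naming rule above concrete, here is a minimal sketch of how a path could be derived from an ordered list of entities. The function name `build_path` and its arguments are hypothetical (nothing like this is defined in the proposal), and the datatype folder (e.g. `anat/`) is deliberately left out to keep the example small.

```python
from pathlib import PurePosixPath


def build_path(entities, suffix, extension):
    """Build `ent-<label>/ent2-<label>/ent-<label>_ent2-<label>_<suffix><ext>`.

    `entities` is an ordered list of (abbreviation, label) pairs, with the
    leading entity first. Hypothetical helper, for illustration only.
    """
    # One `ent-<label>` chunk per entity, in hierarchy order.
    parts = [f"{ent}-{label}" for ent, label in entities]
    # Each entity contributes one directory level ...
    directory = PurePosixPath(*parts) if parts else PurePosixPath(".")
    # ... and the same chunks, joined by "_", prefix the leaf filename.
    filename = "_".join(parts + [suffix]) + extension
    return str(directory / filename)


print(build_path([("sub", "01"), ("ses", "pre")], "T1w", ".nii.gz"))
print(build_path([("sample", "A")], "SPIM", ".ome.tif"))
```

The same function covers the existing `sub-/ses-` case and a dataset led by another entity, which is the "backward compatible" point made above.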
Is there an example of a use case where this would be relevant? @yarikoptic BIDS 2.0 is meant to be more user-friendly. Complexifying an already complex scheme will not make BIDS 2.0 more user-friendly.
Also, about the logic with prefixes, entities, symmetries between all entities, etc. While it makes a lot of sense to computer scientists, it is completely lost on the common mortal.
Examples are linked in the original description. Added one more for #59.
Any lack of "symmetry" actually hurts mortals (e.g. the classical "why is it `sub-`, but then `participants.tsv`?", although it is a separate issue, #14, but the most representative of consistency/symmetry here).
I second @arnodelorme here. I've been dealing with BIDS datasets for several years now while building a repository. I can't understand the motivations and implementations of this issue. Do you expect people and tools to understand a flexible layout like the one explained here? In my opinion, this adds an unnecessary level of indirection. Suddenly, all the tools we're using will have to interpret some directory layout specification. And once flexibility is allowed, you can expect everyone to use it, possibly leading to a different layout for each dataset.
It's complicated enough to deal with optional sessions while implementing a tool. We decided internally, and users didn't object, to make sessions mandatory just not to deal with handling that.
The strength of BIDS is its specificity, fixed directory structure, and file naming convention. I wouldn't go away from that.
@arnodelorme and @mateuszpawlik thank you very much for chiming in! I would be happy to explain more on my motivation beyond use cases I keep populating in the original description. But maybe we could discuss them "interactively"? Are you planning to attend the upcoming INCF in Austin, TX or SfN in Chicago, IL? If not -- we could zoom.
Quick summary answer to @mateuszpawlik: one of the original motivations is that BIDS already covers more than just 'neuroimaging' data (e.g. microscopy), and even more modalities will become supported as time goes on. Not all of them have `subject` as the level of differentiation most appropriate at the highest level. It could be as large as a "study" or as little as a "slice" (see OP for references). Talking about people: when you come to a new BIDS dataset and see that on the first level you have `sample-1/`, `sample-2/` and so on, you would immediately understand (without even looking anywhere) that it is about different `samples` (it is a standard BIDS entity). And I do acknowledge that for tools it would indeed require some development to support the specification, instead of hardcoding a fixed assumption of the hierarchy. But I also hope that common libraries like `bidsschematools` and `pybids` could assist in making such transitions easier, while empowering those tools to support a much wider range of cases.
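
As a rough illustration of what "not hardcoding the hierarchy" could mean for a tool, here is a minimal sketch that guesses the leading entity from top-level names. The function `leading_entity` and the pattern it uses are assumptions made for this example; they are not part of `pybids` or any existing library.

```python
import re
from collections import Counter

# Matches names of the form `<ent>-<label>`, e.g. `sub-01` or `sample-1`.
ENTITY_DIR = re.compile(r"^([a-zA-Z0-9]+)-([a-zA-Z0-9]+)$")


def leading_entity(top_level_names):
    """Return the most common `<ent>` prefix among `<ent>-<label>` names.

    Returns None if no name follows the `<ent>-<label>` pattern.
    Hypothetical helper, for illustration only.
    """
    prefixes = Counter()
    for name in top_level_names:
        match = ENTITY_DIR.match(name)
        if match:
            prefixes[match.group(1)] += 1
    if not prefixes:
        return None
    return prefixes.most_common(1)[0][0]


names = ["sample-1", "sample-2", "samples.tsv", "dataset_description.json"]
print(leading_entity(names))
```

In a real tool the names would come from something like `os.listdir(dataset_root)`, and an explicit layout declaration in the dataset would be more robust than guessing from names.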
One example use case is that if bids 2 were to include the ability to specify layout and metadata location/binding rules at a meta level, then it would be possible to express those rules for other standards (e.g. SDS). This would allow formats that diverged from bids 1 due to its limitations to reconverge on bids 2.
Thank you @tgbugs for the feedback/support. Please :+1: this issue ;) Do you think you could compile a list of possible steps to converge SDS to BIDS, e.g. like I did for DANDI ?
I will take a shot at a list of possible steps, though I will likely only get to it after Neuroinformatics and SfN.
Origin: Originally summarized/presented in https://github.com/bids-standard/bids-specification/issues/751#issuecomment-820800800 (not duplicating here for now) while discussing a possible "stimuli BEP" and where it boiled down to having some `stim-{label}/` folders structure either at top level or under `stimuli/`, which is currently not defining any structure to use there.

Current state: many use cases collected (see e.g. below), design being formalized in

Other relevant issues in this bids-2-devel or elsewhere I found which would be partially or fully addressed with such enhancement:
- https://github.com/bids-standard/bids-2-devel/issues/11 : add `/site-<site_label>` level in favor of encoding it within `/ses-{label}`
- https://github.com/dandi/dandi-cli/issues/1302 - in DANDI we support a lightweight "BIDS-inspired" layout (while BEP032 is still being worked on) which has no `/ses-{label}` subfolder, since it makes little sense when there are lots of sessions and 1 file per session, with a possibly already long file name due to long `sub` and `ses` labels.
- https://bids.neuroimaging.io/bep038 - Atlases BEP... IMHO could have `atlas-<label>/` top level structure for the entity `atlas` [...] "`atlases/`" description as well. So we might have smth like `{'.': ["subject", "[session]", "datatype"], 'atlases': ["atlas"]}` to describe that on top level we separate at `subject` level and under `atlases/` -- at "atlas", but for a dataset which is purely an "atlas" dataset, it could be `{'.': ["atlas"]}`
- https://bids.neuroimaging.io/bep035 - MEGA (Modular extensions for individual participant data mega-analyses) BEP. Proposes `study-` entity at the top level and `studies.tsv` to summarize.
- would provide a solution for #59
  - example (prototype since we have not boiled down syntax): the top level `dataset_description.json` could have a "default" one

    ```json
    "DatasetLayout": {
      ".": [
        { "entity": "subject", "folder": true },
        { "entity": "session", "folder": true }
      ]
    }
    ```

    whereas a nested BIDS dataset at `sub-XXX/ses-YYY/` level would have

    ```json
    "DatasetLayout": {
      ".": [
        { "entity": "subject", "folder": false },
        { "entity": "session", "folder": false }
      ]
    }
    ```

    thus signaling that `sub-XXX_ses-YYY_` should still be within the target filename as a prefix but no leading directories should be there.
- in the scope of stimuli BEP (XXX, google doc), to accommodate large stimuli databases, such as https://cocodataset.org/ with 330K images, it would require some grouping. But we would need to figure out how to group in general -- would require more entities than just `stim-`
- some heavy datasets might want even more entities to be used. E.g. in https://dandiarchive.org/dandiset/000026, there are thousands of files for about 50 different `_sample-`s under sub-I38/ses-SPIM/micr so it would have been logical to add `sample-<label>/` level and `samples.tsv` to describe them, make it smth like