Closed (yarikoptic closed 2 months ago)
I see: You want sourcedata/ to be actual pre-BIDS and rawdata/ to be a BIDS dataset inside a BIDS-derivative dataset.
This feels like an unnecessary complication. sourcedata/ and derivatives/ are always relative to the current dataset and mark a boundary. That feels like enough, and specifying some directories that serve the same purpose for different dataset types does not sound fun to try to express in schema and implement in a validator.
Also, in practice, many derivatives may have multiple "sources", such as the raw dataset and some preprocessing, e.g.,
    ds000001-fmriprep/    # BIDS Derivative
      sourcedata/
        ds000001/         # BIDS
        freesurfer/       # Neither BIDS nor BIDS derivative, but still an input to my derivative
If we change sourcedata/ds000001 to rawdata/, what should we do with sourcedata/freesurfer?
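To make the mixed-sources case concrete, here is a rough sketch of how a tool could tell the entries under sourcedata/ apart. It assumes only that a BIDS dataset carries a dataset_description.json whose optional DatasetType field defaults to "raw"; the function names classify_source and classify_sources are made up for illustration and are not part of any validator.

```python
import json
from pathlib import Path


def classify_source(path: Path) -> str:
    """Return the DatasetType of one sourcedata/ entry, or 'non-BIDS'."""
    description = path / "dataset_description.json"
    if not description.is_file():
        # e.g. a freesurfer/ directory: an input, but not a BIDS dataset
        return "non-BIDS"
    try:
        metadata = json.loads(description.read_text(encoding="utf-8"))
    except json.JSONDecodeError:
        return "non-BIDS"
    if not isinstance(metadata, dict):
        return "non-BIDS"
    # A missing DatasetType is treated as "raw" for backwards compatibility.
    return metadata.get("DatasetType", "raw")


def classify_sources(dataset_root: Path) -> dict[str, str]:
    """Map each directory under <dataset_root>/sourcedata to its kind."""
    sourcedata = dataset_root / "sourcedata"
    if not sourcedata.is_dir():
        return {}
    return {
        entry.name: classify_source(entry)
        for entry in sorted(sourcedata.iterdir())
        if entry.is_dir()
    }
```

With the layout above, classify_sources(Path("ds000001-fmriprep")) would report ds000001 as a BIDS dataset and freesurfer as non-BIDS, without either of them needing a dedicated directory name.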
> I see: You want sourcedata/ to be actual pre-BIDS and rawdata/ to be a BIDS dataset inside a BIDS-derivative dataset
Not really -- I am just trying to ensure that the schema matches the text we have. We should fix one or the other. If not in BIDS 1.0, we should likely remove rawdata/ in BIDS 2.0.
I am with you generally that sourcedata/ should be enough and preferable for the reasons you stated. FWIW, rawdata/ in the spec text was added as long ago as ebe69efd30a83f17970a3638fc23387d28402e5a. rawdata/ ;) https://github.com/OpenNeuroDatasets/ds003563
So you want to remove text saying that you can place a raw BIDS dataset next to its source and derivatives, instead of nested? I don't understand why this example is a problem.
I want the standard not to use, in its examples, arbitrary top-level folder names that the standard itself does not describe. So I think we either
rawdata/ (in BIDS 2.0? 1.0?)
Otherwise that example is a "bad example" to be in a standard. We should not give examples with arbitrary folders being populated with anything, unless we allow for that (e.g. as subfolders of code/, sourcedata/, etc.).
I haven't looked much into subdirectories, since I think it's really bad to have them nested, and I always try to keep them separate. Why not just have independent datasets which reference their parents in the dataset_description.json file? It seems so much simpler, and would also solve the many-to-many correspondence between raw and derivatives. That would also be coherent with not constraining names for top-level directories, which I also think is a good idea.
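A minimal sketch of what "reference their parents" could look like, using the DatasetType, GeneratedBy and SourceDatasets fields of dataset_description.json. The helper name derivative_description, the pipeline name, and the example.org URLs are placeholders for illustration only, not real locations.

```python
import json


def derivative_description(name, bids_version, pipeline, source_urls):
    """Build dataset_description.json content for a standalone derivative.

    Each parent dataset is listed under SourceDatasets instead of being
    nested inside the derivative dataset.
    """
    return {
        "Name": name,
        "BIDSVersion": bids_version,
        "DatasetType": "derivative",
        "GeneratedBy": [{"Name": pipeline}],
        "SourceDatasets": [{"URL": url} for url in source_urls],
    }


description = derivative_description(
    name="ds000001-fmriprep",
    bids_version="1.8.0",
    pipeline="fmriprep",
    # Placeholder URLs: one raw parent and one preprocessing parent.
    source_urls=[
        "https://example.org/datasets/ds000001",
        "https://example.org/datasets/ds000001-freesurfer",
    ],
)
print(json.dumps(description, indent=2))
```

Because SourceDatasets is a list, one derivative can point at several parents, and several derivatives can point at the same raw dataset, which is the many-to-many case mentioned above.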
@TheChymera thanks, but could you clarify whether it is specific to rawdata/ there, or could it be another path in the test?
@yarikoptic not sure I understand the question, but are you asking whether that's a random string for which “rawdata” just happens to be the value? If so, yes:
edit: @yarikoptic took the liberty of wrapping the long example/diff into <details></details>
> If so, yes:
"yes" as in "it was a random choice and could be anything else", correct?
Then we should also fix the test if we decide that rawdata is part of the specification/legit.
Inspired by
with some relevant prior discussion found in
Apparently only the text talks about rawdata/, while src/schema does not have references beyond a mention in the tests (@TheChymera could you clarify whether it is specific to rawdata/ there or could be another path):
IMHO we should fix src/schema to "specify/allow" rawdata/ in BIDS derivative datasets (not in raw).
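As a purely illustrative sketch of that proposal (this is not the actual src/schema format, and the directory sets are deliberately incomplete), the rule "rawdata/ is allowed at the top level of derivative datasets but not raw ones" could be expressed as a lookup keyed by dataset type. ALLOWED_TOP_LEVEL_DIRS and unexpected_top_level_dirs are hypothetical names.

```python
# Hypothetical rule table: which named top-level directories each
# dataset type may contain. Only a few directories are listed here.
ALLOWED_TOP_LEVEL_DIRS = {
    "raw": {"sourcedata", "derivatives", "code"},
    "derivative": {"sourcedata", "derivatives", "code", "rawdata"},
}


def unexpected_top_level_dirs(dataset_type, top_level_dirs):
    """Return top-level directories not permitted for this dataset type.

    Subject directories (sub-*) are always accepted in this sketch.
    """
    allowed = ALLOWED_TOP_LEVEL_DIRS[dataset_type]
    return sorted(
        name
        for name in top_level_dirs
        if name not in allowed and not name.startswith("sub-")
    )


print(unexpected_top_level_dirs("raw", ["sub-01", "code", "rawdata"]))
print(unexpected_top_level_dirs("derivative", ["sub-01", "code", "rawdata"]))
```

Under these assumed rules, the first call flags rawdata for a raw dataset, while the second accepts it for a derivative dataset.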