Closed yarikoptic closed 2 months ago
All modified and coverable lines are covered by tests :white_check_mark:
Project coverage is 87.79%. Comparing base (4c642bd) to head (1c029c6). Report is 53 commits behind head on master.
What the example looks like
The pedantic me notices now that dataset_description.json in that example is "inconsistent" in its ordering with directories...
I will at least unify that now in a separate commit (easy to drop if we decide it is not worth it).
note: failed to upload coverage, I restarted that pipeline to get it hopefully green.
yes, it is all green. Waiting for more blessings. @effigies @tsalo -- WDYT?
This feels more confusing to me.
└─ my_dataset-1/
   ├─ sourcedata/
   │  ├─ dicoms/
   │  ├─ raw/
   │  │  ├─ sub-01/
   │  │  ├─ sub-02/
   │  │  ├─ ...
   │  │  └─ dataset_description.json
   │  └─ ...
   ├─ derivatives/
   │  ├─ pipeline_1/
   │  ├─ pipeline_2/
   │  └─ ...
   ├─ dataset_description.json
   └─ ...
Why is there now a dataset_description.json? What are the contents of this outer dataset apart from sourcedata/ and derivatives/?
I still don't understand why what is present is confusing, but maybe it would be better just to remove it. I would like to move to a Diataxis approach, where we clearly separate out the helpful hints and how-tos from the reference material. Maybe it could be revived in a how-to guide in a way that causes fewer problems, if there's any value to it.
└─ my_dataset-1/ ... Why is there now a dataset_description.json? What are the contents of this outer dataset apart from sourcedata/ and derivatives/?
IMHO in BIDS specification we SHOULD only talk about BIDS datasets. We SHOULD NOT talk about arbitrary conventions which could exist outside of BIDS specification. Hence IMHO this example SHOULD be a BIDS dataset as well -- AFAIK nothing in BIDS specification would state that it is an illegitimate BIDS dataset, or am I wrong?
I still don't understand why what is present is confusing, but maybe it would be better just to remove it. I would like to move to a Diataxis approach, where we clearly separate out the helpful hints and how-tos from the reference material. Maybe it could be revived in a how-to guide in a way that causes fewer problems, if there's any value to it.
IMHO it is useful to give users guidance. My problem with current formulation is that it is misleading. This PR IMHO clarifies it, unless it is indeed "not BIDS compliant".
Hence IMHO this example SHOULD be a BIDS dataset as well -- AFAIK nothing in BIDS specification would state that it is an illegitimate BIDS dataset, or am I wrong?
I believe the validator would currently complain that there are no validatable files. You can argue that's wrong, but from an OpenNeuro perspective, I would want to reject such a dataset as it's effectively a blanket .bidsignore. But we would also reject what you're replacing, since it wasn't a BIDS dataset, but a layout demonstrating one way to include BIDS datasets alongside their sources and derivatives.
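To make the "blanket .bidsignore" concern concrete, here is a minimal Python sketch. It rebuilds the example tree from this thread in a temporary directory and reports what is left at the top level once sourcedata/ and derivatives/ are set aside. It is only an illustration of the layout, not how bids-validator decides anything; the helper names (`build_example_layout`, `remaining_top_level_entries`, `dataset_roots`) are hypothetical, and treating both directories as opaque is an assumption of the sketch.

```python
import tempfile
from pathlib import Path

# Assumption of this sketch: both directories are treated as opaque at the top level.
OPAQUE_DIRS = {"sourcedata", "derivatives"}


def build_example_layout(root: Path) -> None:
    """Create the example tree from this thread (empty directories, stub JSON files)."""
    for rel in (
        "sourcedata/dicoms",
        "sourcedata/raw/sub-01",
        "sourcedata/raw/sub-02",
        "derivatives/pipeline_1",
        "derivatives/pipeline_2",
    ):
        (root / rel).mkdir(parents=True)
    (root / "sourcedata/raw/dataset_description.json").write_text("{}")
    (root / "dataset_description.json").write_text("{}")


def remaining_top_level_entries(root: Path) -> list[str]:
    """Names directly under root once the opaque directories are set aside."""
    return sorted(p.name for p in root.iterdir() if p.name not in OPAQUE_DIRS)


def dataset_roots(root: Path) -> list[str]:
    """Relative paths of every directory that holds a dataset_description.json."""
    return sorted(
        p.parent.relative_to(root).as_posix()
        for p in root.rglob("dataset_description.json")
    )


with tempfile.TemporaryDirectory() as tmp:
    top = Path(tmp) / "my_dataset-1"
    build_example_layout(top)
    remaining = remaining_top_level_entries(top)
    roots = dataset_roots(top)
    print(remaining)  # entries of the outer dataset besides the two opaque directories
    print(roots)      # "." is the outer dataset, "sourcedata/raw" the nested one
```

Under these assumptions the outer level holds nothing but its dataset_description.json, which is the situation described above.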
IMHO it is useful to give users guidance. My problem with current formulation is that it is misleading. This PR IMHO clarifies it, unless it is indeed "not BIDS compliant".
I find it unhelpful at best, as it drops everything but a dataset_description.json and two opaque directories that could potentially hold nested datasets. This is why I think that a how-to guide would be better, where it discusses these options in detail, and not normatively.
there are no validatable files
wrong. There is dataset_description.json. Might be a bug in bids-validator then ;-)
You can argue that's wrong, but from an OpenNeuro perspective, I would want to reject such a dataset as it's effectively a blanket .bidsignore.
as we discussed -- this sounds like an archive specific desire/behavior.
But we would also reject what you're replacing, since it wasn't a BIDS dataset, but a layout demonstrating one way to include BIDS datasets alongside their sources and derivatives.
ok, let's meet half-way: what if I remove dataset_description.json for now, and mostly revert that added statement that this is a valid BIDS dataset? I would then propose a separate follow-up PR on that particular aspect, since I think we would also want some explicit DatasetType = "study" or alike (e.g. "project" to possibly avoid conflict with BEP035) for such cases.
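As a rough illustration of that follow-up idea, the Python sketch below writes out what an outer dataset_description.json might contain. The "study" value is only the proposal floated in this comment; the sketch assumes that "raw" and "derivative" are the DatasetType values defined at the time of this thread. The dataset name is borrowed from the example tree, the BIDSVersion is a placeholder, and the helper `is_currently_defined` is hypothetical.

```python
import json

# Assumption: the DatasetType values defined at the time of this discussion.
DEFINED_DATASET_TYPES = {"raw", "derivative"}

# Hypothetical description for the outer dataset in the example tree.
description = {
    "Name": "my_dataset-1",    # name borrowed from the example tree
    "BIDSVersion": "1.9.0",    # placeholder version, not taken from the thread
    "DatasetType": "study",    # PROPOSED value from this thread, not an established one
}


def is_currently_defined(dataset_type: str) -> bool:
    """Return True if the value is in the assumed set of defined DatasetType values."""
    return dataset_type in DEFINED_DATASET_TYPES


rendered = json.dumps(description, indent=2)
print(rendered)
print(is_currently_defined(description["DatasetType"]))
```

The check makes the point of the proposal explicit: such a value would need to be added to the specification before an outer dataset could declare it.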
Then we stay with "convention"
Sure, if we drop dataset_description.json and ... and do not say or imply that the outer dataset is a BIDS dataset, then this is basically the same as before. I suppose that's fine.
To expedite and ease re-reviewing -- I committed that final form of suggestions.
This is what the example now looks like:
I personally find it confusing that the BIDS raw data is now listed under sourcedata, because the way "people" usually think about the directories is:
nesting "2" under "1" is technically possible, but I find it unhelpful/weird.
Having said that, @Remi-Gau and @effigies seem to have discussed this at length / thought about it, so I will defer to them.
There is no 'ideal' setup per se, but if DataLad is used, then 2. (sourcedata/raw) could also have parts of sourcedata/ (possibly only a subset of the relevant ones, like sourcedata/dicoms) installed into its own sourcedata/raw/sourcedata. The idea here is just to make it all "flat" and separated only into sourcedata/ (BIDS or not) and derivatives/ (what is computed from those).
@effigies @sappelhoff so are we a go on this one, or does it need more tune-ups?
Individual commits have more rationale. I can split into multiple PRs but let's see - maybe this could provide us closure. Reflecting my thoughts on