Open Remi-Gau opened 1 year ago
If this type of data duplication is to be disallowed, it may be a good thing to:
For example, the following rendering may suggest that all 3 files can co-exist in the same dataset
Maybe better to have something like:
sub-<label>[_ses-<label>][_acq-<label>]_photo.[tif|png|jpg]
Okay, here's a proposal:
photo:
  suffixes:
    - photo
  extensions:
    - [.jpg, .png, .tif]
  datatypes:
    - eeg
    - ieeg
    - meg
    - nirs
  entities:
    subject: required
    session: optional
    acquisition: optional
photo__micr:
  $ref: rules.files.raw.photo.photo
  extensions:
    - [.jpg, .png, .tif]
    - .json
  datatypes:
    - micr
  entities:
    $ref: rules.files.raw.photo.photo.entities
    sample: required
Here, the extensions that are in a list together are "the same kind" and so mutually exclusive and distinguishable from supplementary entries, such as .json.
For NIfTI, we would do - [.nii, .nii.gz].
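To make the grouping rule concrete, here is a minimal Python sketch of how a checker might interpret it: a nested list is treated as a mutually exclusive group, a bare string as a supplementary entry. The names `find_exclusive_conflicts` and `split_extension`, and the way the rule is held in a Python list, are assumptions for illustration only; this is not the validator's actual code.

```python
from collections import defaultdict

# Extensions as in the proposal: a nested list is a mutually exclusive
# group, a bare string is a supplementary entry such as a sidecar.
PHOTO_MICR_EXTENSIONS = [[".jpg", ".png", ".tif"], ".json"]


def split_extension(filename, known_extensions):
    """Split a filename into (stem, extension), trying longest extensions first."""
    for ext in sorted(known_extensions, key=len, reverse=True):
        if filename.endswith(ext):
            return filename[: -len(ext)], ext
    return filename, ""


def find_exclusive_conflicts(filenames, extensions):
    """Return {stem: [extensions]} for stems using 2+ extensions of one group."""
    groups = [set(e) for e in extensions if isinstance(e, list)]
    known = {ext for group in groups for ext in group}
    known |= {e for e in extensions if isinstance(e, str)}

    # Collect every extension seen for each filename stem.
    seen = defaultdict(set)
    for name in filenames:
        stem, ext = split_extension(name, known)
        seen[stem].add(ext)

    conflicts = {}
    for stem, exts in seen.items():
        for group in groups:
            clash = sorted(exts & group)
            if len(clash) > 1:
                conflicts[stem] = clash
    return conflicts
```

With this reading, a .jpg and a .png sharing the same stem are flagged, while a .json next to either of them is not. Trying the longest extension first keeps .nii.gz from being mistaken for .gz or .nii.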
BUT...
For EEG:
eeg:
  suffixes:
    - eeg
  extensions:
    - .json
    - .edf
    - .vhdr
    - .vmrk
    - .eeg
    - .set
    - .fdt
    - .bdf
  datatypes:
    - eeg
  entities:
    subject: required
    session: optional
    task: required
    acquisition: optional
    run: optional
I think we could do something like:
extensions:
  - .json
  - [ .edf, .eeg, .set, .bdf ]
  - .vhdr
  - .vmrk
  - .fdt
And then just use a couple of checks to say that if any of .eeg, .vhdr or .vmrk exist, then they all exist. And if .fdt exists, then .set exists.
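The two checks described here could look roughly like the following Python sketch. The constants `ALL_OR_NONE` and `REQUIRES` and the function `check_companions` are hypothetical names chosen for illustration; how such rules would actually be expressed in the schema is not settled in this thread.

```python
# Hypothetical encoding of the two rules described above.
ALL_OR_NONE = [{".eeg", ".vhdr", ".vmrk"}]  # BrainVision: all three or none
REQUIRES = {".fdt": ".set"}                  # EEGLAB: .fdt needs a .set


def check_companions(extensions_present):
    """Return a list of problems for the extensions found for one recording."""
    present = set(extensions_present)
    problems = []

    # If any member of an all-or-none group exists, every member must exist.
    for group in ALL_OR_NONE:
        found = present & group
        if found and found != group:
            missing = sorted(group - present)
            problems.append(f"missing {missing} alongside {sorted(found)}")

    # One-directional dependency: the key requires the value, not vice versa.
    for ext, needed in REQUIRES.items():
        if ext in present and needed not in present:
            problems.append(f"{ext} present without {needed}")

    return problems
```

Note the asymmetry: a .set without a .fdt passes, because the rule as stated only says that .fdt implies .set.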
👍
and for:
For file formats that are based on several files of different extensions, or a directory of files with different extensions (multi-file file formats), only that file SHOULD be listed that would also be passed to analysis software for reading the data. For example for BrainVision data (.vhdr, .vmrk, .eeg), only the .vhdr SHOULD be listed; for EEGLAB data (.set, .fdt), only the .set file SHOULD be listed; and for CTF data (.ds), the whole .ds directory SHOULD be listed, and not the individual files in that directory.
(see: https://bids-specification.readthedocs.io/en/latest/modality-agnostic-files.html#scans-file)
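As a rough illustration of the quoted rule, the Python sketch below filters a list of data files down to the entries that would be listed. The name `scans_entries` and the `COMPANION_EXTENSIONS` tuple are assumptions made for this example and cover only the BrainVision and EEGLAB cases named in the passage; CTF .ds directories and sidecar files are deliberately left out.

```python
# Extensions of companion files that the quoted passage says should NOT be
# listed, because another file of the same recording is the entry point:
# .vhdr for BrainVision (.vmrk, .eeg) and .set for EEGLAB (.fdt).
COMPANION_EXTENSIONS = (".vmrk", ".eeg", ".fdt")


def scans_entries(data_files):
    """Keep only the files that would be passed to analysis software."""
    return sorted(
        name for name in data_files
        if not name.endswith(COMPANION_EXTENSIONS)
    )
```

For a BrainVision recording this keeps only the .vhdr file, and for an EEGLAB recording only the .set file.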
Describe your problem in detail.
Note that this issue may apply to more datatypes in BIDS, but I have not checked it systematically.
As far as I can tell, the specification does not mention that files cannot differ only by their extension.
For example, modifying the micr_SEM BIDS example so that the same data appears twice, differing only by extension:
From my current reading of the spec, this could be valid.
The BIDS validator does not complain about this either, except to say that not all subjects have the same number of files.
I have mostly checked with picture files *_photo.* (eeg, meg, micr), but it also seems to be the case for eeg files:
Am I missing something, or should this type of potential data duplication be disallowed?
Describe what you expected.
I would expect an error, as for example in the case of .nii and .nii.gz, where the validator throws this error:
BIDS specification section
No response