bids-standard / bids-validator

Validator for the Brain Imaging Data Structure
https://bids-standard.github.io/bids-validator/
MIT License
185 stars 111 forks source link

Why are "physio.tsv.gz" and its JSON currently not valid for eeg/meg/ieeg ? #990

Closed sappelhoff closed 4 years ago

sappelhoff commented 4 years ago

while going though _physio and _stim files (see https://github.com/bids-standard/bids-specification/pull/513), I realized that the validator currently does not accept files like sub-05_task-matchingpennies_recording-eyetracking_physio.tsv.gz:

├── sub-05
│   └── eeg
│       ├── sub-05_task-matchingpennies_channels.tsv
│       ├── sub-05_task-matchingpennies_eeg.eeg
│       ├── sub-05_task-matchingpennies_eeg.vhdr
│       ├── sub-05_task-matchingpennies_eeg.vmrk
│       ├── sub-05_task-matchingpennies_events.tsv
│       └── sub-05_task-matchingpennies_recording-eyetracking_physio.tsv.gz

This is because the regular expressions for:

are specified for func, but not for eeg, meg, and ieeg.

https://github.com/bids-standard/bids-validator/blob/6f123f091b0951fe3cfcc670fbbb27f7fd1fb449/bids-validator/bids_validator/rules/file_level_rules.json#L119-L145

I think this should be changed, and it is valid BIDS to have a sub-05_task-matchingpennies_recording-eyetracking_physio.tsv.gz in an EEG dataset at the nesting specified in my example above.

@robertoostenveld, what is your opinion? Have you stumbled over this issue before?

effigies commented 4 years ago

Agreed. I would move these out of the func regex and into a physiostim (or similar) regex that can accept multiple data types. I would add beh, as well, for that matter.

robertoostenveld commented 4 years ago

hmm, interesting consideration. In preparing the BEPs for MEG, EEG and iEEG we have not discussed adding the physio and stim files. Although for iEEG there was a discussion on electrical stimulation, but that was non-continuous and ended up as https://bids-specification.readthedocs.io/en/stable/04-modality-specific-files/04-intracranial-electroencephalography.html#electrical-stimulation

Starting from the MEG side: these systems usually have a lot of AUX channels (that are sampled synchronously and stored in the main MEG dataset) and we decided to keep them in there when present. Hence the type in the channels.tsv. We also said that if EEG is recorded with another system, it is to be stored as separate EEG modality (which we knew would follow soon as a BEP). But at that time I don't recall that we considered anything besides EEG.

The recording-eyetracking_physio.tsv that you outline above would make sense. However, I would rather have that represented on its own (with proper metadata) as per https://bids.neuroimaging.io/bep020 (eye tracking BEP). But then, that is not ready yet and not clear when it will.

Storing auxiliary data in a TSV file next to the main data (given the modality in the directory name) is a strategy that makes sense to me. However, eye tracking is not physiology; I would rather see it as behaviour. In this example there was physio (EMG, PPG and breathing) from a single brainamp and eyetracking from a SMI system. In that case I used the modality labels _emg.tsv and _eyetracking.tsv.

In the PET BEP (almost final) there is a _blood.tsv file that complements the nii PET files. Actually, there are two, since two separate methods are common to draw and analyze record blood (an autosampler and manually).

There might be a general pattern appearing here, which would be that the main data can be complemented with aux data in a relatively minimal _stim.tsv, _physio.tsv, '_blood.tsv` or another tsv file which is stored beside the main data. That is fine if that data can be used without too much metadata. If that data is to be described with more detail (e.g. for ECG I am sure that a cardiologist can come up with a lot of metadata, whereas I personally would just say "two electrodes somewhere on the chest", idem for eye tracking, or motion capture, etc), then that data should be stored according to its own extension and BIDS modality. And as long as the extension/modality does not yet exist, or if the data producer does not know much about the acquisition (e.g. ECG or EMG electrodes are often rather randomly placed by fMRI or MEG researchers due to lack of knowledge in those domains), it could go in the minimal tsv file.

It would be good if you guys could have a look at the PET BEP and specifically on the synchronization between the _blood.tsv and the other (in this case PET) data. Also for a _physio.tsv or _eyetracker.tsv file next to the EEG data there should be some explicit specification how the two are synchronized.

sappelhoff commented 4 years ago

Thanks Robert! I generally agree with you and the things you say in your last paragraph.

For some things you say however, I have a slightly different take - let me address them point by point:


however, eye tracking is not physiology; I would rather see it as behaviour

I don't have a strong opinion about what type of data eyetracking is, as long as I can clearly see that it's eyetracking, which I think is quite clear from:

│       ├── sub-05_task-matchingpennies_eeg.eeg
│       └── sub-05_task-matchingpennies_recording-eyetracking_physio.tsv.gz

And currently, BIDS does support eyetracking in _physio files, see: physio section in BIDS

Physiological recordings (including eye tracking) should use the _physio suffix


In preparing the BEPs for MEG, EEG and iEEG we have not discussed adding the physio and stim files

I think that physio+stim do not need to be explicitly added --> in my understanding they are already as much part of BIDS as beh or func ... just that they do not have their own dedicated "modality directory", and instead are intended to "ride" closely with their associated file

This is also why I see my reported issue as a bug --> I see no reason why physio+stim files should be fine in a func/ or a beh/ dir, but not in an eeg/ dir.


The recording-eyetracking_physio.tsv that you outline above would make sense. However, I would rather have that represented on its own (with proper metadata) as per https://bids.neuroimaging.io/bep020 (eye tracking BEP). But then, that is not ready yet and not clear when it will.

I would also prefer to go with BEP020, but using _physio.tsv.gz + _physio.json allows me to at least share something in a way that's already supported. --> that is ... I could do this once the presently reported bug is resolved :-)


There might be a general pattern appearing here, which would be that the main data can be complemented with aux data in a relatively minimal _stim.tsv, _physio.tsv, '_blood.tsv` or another tsv file which is stored beside the main data. That is fine if that data can be used without too much metadata [emphasis added]

I agree --> and my intention is not to blow up physio to replace e.g., the BEP020 ... rather my sole intention is to practically work with what we have and make sure to fix bugs (in validator and spec)

rwblair commented 4 years ago

1001 expands physio and stim files to eeg ieg meeg. The specification in 06-physiological-and-other-continuous-recordings.html is very explicit about func, so that should either be updated to include *egs or made ambiguous enough to cover them.

There are some key/value regex that only appear in functional file names (rec, dir, ce...) or in the *eg in general (split) or in ieeg (space). Would these keys ever appear in physio or stim file names?

sappelhoff commented 4 years ago

The specification in 06-physiological-and-other-continuous-recordings.html is very explicit about func

are you referring to the fact that the examples have func in them? Or that there is a reference to .nii.gz? I couldn't find any other reference that physio+stim is only intended for MRI/func.

These two examples are because when merging MEG,EEG,iEEG into BIDS, we never made the effort to adjust the Physio section in BIDS, so it is still pretending that BIDS is only about MRI :-)

I am trying to address this here: https://github.com/bids-standard/bids-specification/pull/513

Would these keys (rec, dir, ce) ever appear in physio or stim file names?

as far as I can see from the spec, physio or stim files would contain every entity that the file contains the refer to ... PLUS an optional recording entity.

see this part in the physio section:

Where corresponds to task file name without the _bold.nii[.gz] suffix. For example: sub-control01_task-nback_run-1

robertoostenveld commented 4 years ago

could _physio.tsv and _stim.tsv also exist on their own, like for _events.tsv?

sappelhoff commented 4 years ago

could _physio.tsv and _stim.tsv also exist on their own, like for _events.tsv?

reading the behavioral sction makes me think so :thinking: what do you think?

robertoostenveld commented 4 years ago

I don't think it is neither explicitly allowed nor explicitly forbidden. For https://bids-specification.readthedocs.io/en/stable/04-modality-specific-files/07-behavioral-experiments.html it at least says "Behavioral experiments (with no MRI)" and there is a dedicated beh folder. But for physio and stim the example only mentions func (i.e. also no eeg as you already figured out).

Imagine me recording PPG and breathing during a behavioural experiment. All elements by themselves could be stored in BIDS, but in this specific situation it is not clear from the specification how. I would do it like this:

bids
├── README
├── dataset_description.json
└── sub-01
    └── ses-01
        ├── beh
        │   ├── sub-01_ses-01_events.tsv
        │   ├── sub-01_ses-01_physio.json
        │   └── sub-01_ses-01_physio.tsv
        └── sub-01_ses-01_scans.tsv
robertoostenveld commented 4 years ago

Another question: is it required that these tsv files are gzipped? For MRI data the .gz extension is optional, see here. Why would some tsv files be gzipped and others not (like events)?

I think it makes sense to make that consistent, i.e. make it optional for all tsv files to be gzipped (or not).

robertoostenveld commented 4 years ago

1001 expands physio and stim files to eeg ieg meeg.

I think we should make it more future proof and already anticipate upcoming modalities like PET. So rather than extending the includelist from [func] to [func, eeg, meg, ieeg], I propose to extend it to all modalities. That would also cover the beh example I just gave. At the Donders we have researchers that record physio during anat and dwi scans, to use as confound in the analysis (i.e. for patients that tend to move).

sappelhoff commented 4 years ago

I would do it like this:

yes, I would do it like that as well. We could improve coverage of the example in the physio section. But probably after these two are merged:

Another question: is it required that these tsv files are gzipped?

good point and it has recently been raised by @yarikoptic in https://github.com/bids-standard/bids-specification/issues/472

That issue could use some more opinions to converge on a solution.

I propose to extend it to all modalities

I agree cc @rwblair ... if we don't find an elegant fix for this, it'd just mean adding 4 lines to each of the "regexp blocks" (stim.tsv, stim.json, physio.tsv, physio.json)

That would also cover the beh example I just gave

beh is already covered: https://github.com/bids-standard/bids-validator/blob/6f123f091b0951fe3cfcc670fbbb27f7fd1fb449/bids-validator/bids_validator/rules/file_level_rules.json#L52-L64