bids-standard / bids-validator

Validator for the Brain Imaging Data Structure
https://bids-standard.github.io/bids-validator/
MIT License
180 stars 109 forks source link

_electrodes.tsv required for iEEG #1771

Open dorahermes opened 1 year ago

dorahermes commented 1 year ago

It seems from the ieeg-BIDS documentation that the _electrodes.tsv file is required for ieeg data.

However, there are now several pretty impressive iEEG datasets on OpenNeuro dataset without electrode positions but no errors, and another dataset with electrode positions in a .mat rather than a .tsv file without errors. Is this perhaps a validator issue?

sappelhoff commented 12 months ago

here, @effigies states that electrodes.tsv is actually optional for iEEG:

though I don't know where that information is coming from.

Where do you take your information from @dorahermes? Is it the fact that the file names in the "file name templates" are not in square brackets?

https://bids-specification.readthedocs.io/en/latest/modality-specific-files/intracranial-electroencephalography.html#electrode-description-_electrodestsv

For EEG it definitely makes sense to have this file as OPTIONAL, because sometimes there is just no information on electrode positions. Should this file really be REQUIRED for iEEG?

And do we perhaps need to change something in the schema, if electrodes.tsv shows up as required even for EEG?

dorahermes commented 12 months ago

Yes, I was assuming it was REQUIRED given the lack of square brackets in the specification.

Since the specification more clearly states that even the _channels.tsv file is RECOMMENDED it may make more sense that the _electrodes.tsv file is RECOMMENDED as well for iEEG data. This would also be more backwards compatible with these OpenNeuro data, but would at least create a warning.

The problem is that, in contrast with EEG data where channel positions have standardized names, iEEG data often do not make any sense if electrode positions are not shared (it is like sharing voxel timeseries from different brain regions that are randomly shuffled).

effigies commented 12 months ago

here, @effigies states that electrodes.tsv is actually optional for iEEG:

* [electrodes.tsv for simultaneous MEG/EEG data bids-specification#1550 (comment)](https://github.com/bids-standard/bids-specification/issues/1550#issuecomment-1666051107)

though I don't know where that information is coming from.

It had already been a valid regex in the validator before the MEG discussion, and nothing in the spec says it is required. Likewise,

If an *_electrodes.tsv file is specified, a *_coordsystem.json file MUST be specified as well.

seems to clearly indicate that it is optional for EEG. I'm not sure why I thought it was required for EEG at the time of the linked comment.

effigies commented 12 months ago

The problem is that, in contrast with EEG data where channel positions have standardized names, iEEG data often do not make any sense if electrode positions are not shared

From https://github.com/bids-standard/bids-specification/issues/1550#issuecomment-1686071348, because some systems record electrode location directly into the data files, some researchers would prefer to avoid duplicating the electrode information and potentially having a metadata synchronization problem.

dorahermes commented 12 months ago

That is never the case for iEEG, only MEG.

dorahermes commented 12 months ago

Please also note that _electrodes.tsv was required in the original iEEG-BIDS document

effigies commented 12 months ago

I can't access that. Sent a request, if you have admin. In any event, if it should have been required, we should open a PR to clearly state that it is a required file in the spec and implement a validator check.

effigies commented 11 months ago

FWIW, here are the datasets with iEEG directories with no electrodes.tsv:

I agree that people using non-BIDS files to provide electrode information is bad. Are there any that are doing things properly but using data files that include electrode position information? If not, that would be a strong argument for moving to REQUIRED.

In any event, I'm +1 for making this at least RECOMMENDED.

sappelhoff commented 11 months ago

+1 for making it at least RECOMMENDED as well (for all, EEG, iEEG, MEG). I would be okay with making it REQUIRED for iEEG (depends on what @dorahermes thinks is a good way forward).


FWIW, here are the datasets with iEEG directories with no electrodes.tsv:

One of these dataset is by @adam2392 --> https://openneuro.org/datasets/ds003029/versions/1.0.6

Why don't you share electrodes.tsv files, Adam?

adam2392 commented 11 months ago

FWIW, here are the datasets with iEEG directories with no electrodes.tsv:

One of these dataset is by @adam2392 --> https://openneuro.org/datasets/ds003029/versions/1.0.6

Why don't you share electrodes.tsv files, Adam?

No coordinates :/. So it would just be an empty file? I guess that is fine too? It would just be n/a tho.

sappelhoff commented 11 months ago

I see, is that an argument to not make electrodes.tsv required?

How do you respond to this comment by @dorahermes in https://github.com/bids-standard/bids-validator/issues/1771#issuecomment-1714213732?

iEEG data often do not make any sense if electrode positions are not shared (it is like sharing voxel timeseries from different brain regions that are randomly shuffled).

How are people using your data when coordinates are not shared @adam2392 ?

adam2392 commented 11 months ago

The general brain regions are known and stated in the metadata I think. The interpretation of the time series is more clinical such that certain electrodes correspond to an suspected epileptic region.

None of this required the exact positions so we never got them processed. However I did push for it :(.

The only con is that these data are not spatially comparable to other datasets and someone cannot validate that said electrode is in a brain position the metadata says it is in. However, beyond that I think it's still usable albeit not for brain plotting.

adam2392 commented 11 months ago

iEEG data often do not make any sense if electrode positions are not shared (it is like sharing voxel timeseries from different brain regions that are randomly shuffled).

I would disagree for our specific example. It is like sharing voxel time series where the brain lobe is known or some macro structure but you don't know the exact xyz coordinate (in our case). In general yes this is true unless further described in the metadata.

I am okay saying the file is required tbh. And then users can write n/a. The more REQUIRED the better.

dorahermes commented 11 months ago

iEEG data often do not make any sense if electrode positions are not shared (it is like sharing voxel timeseries from different brain regions that are randomly shuffled).

I would disagree for our specific example. It is like sharing voxel time series where the brain lobe is known or some macro structure but you don't know the exact xyz coordinate (in our case). In general yes this is true unless further described in the metadata. I am okay saying the file is required tbh. And then users can write n/a. The more REQUIRED the better.

+1 There are several iEEG cases that were discussed during the iEEG extension where it may be extremely challenging to obtain exact electrode positions, but data are still of value (such as intraoperative mapping or this large dataset by @adam2392 !)

The community feel for these cases, as also noted by @adam2392 , was that the point of BIDS is not to be restrictive, but to provide a guideline of where to put electrode contact information such that others can find it. If the validator generates an error or warning about a missing _electrodes.tsv file, this would help curators see where such information should be stored.

(With respect this dataset I could not find the electrode_layout that is referred to above, as the readme states that it was not allowed to share these sourcedata files. Within iEEG _electrodes.stv files, x,y,z positions are not just used for brain plotting, but for interpretation, linking to MRI data, atlases etc. If there is a need for a more general 'anatomical label' that should be discussed in a separate issue and probably related to the Atlas BEP)

adam2392 commented 11 months ago

To provide additional context.

Back then, it was more difficult to pipeline the electrode locations (at least for me). That and getting a hold of the MRI/CT data in usable format was a great pain that eventually was not worth it for our purposes. Then BIDs came on the scene and I was relatively new user there, but greatly vibed with the mission. Data sharing in general was just making its initial waves in neurology (it's still in its infancy rn), so I got a lot of pushback and friction to even share the data from collaborators that were not funded by the NIH/NSF. At that point, I did the "best I could" w/ what was allowed.

However, it is my impression now that there are more and more pushes to share data publicly when publishing. Moreover, the NIH is releasing a mandate(?) that requires this. Thus, I think making the electrodes.tsv REQUIRED forces the user to think about how to get this data. Moreover, having the necessary tools now to more easily get the electrodes.tsv locations (e.g. MNE-Python now supports this for ieeg) makes it so there is less of an excuse for not getting the coordinates.

tldr: it is unfortunate that there are data w/o the electrodes.tsv, but it should be REQUIRED as discussed above imo. A backwards-compatible fix is to allow the user to upload a file w/ all coordinates to n/a.

sappelhoff commented 11 months ago

IMHO the examples described above rather call for a RECOMMENDED status, where the validator would send a warning if no electrodes.tsv file is found. The warning may serve the purpose of making everyone think deeply about whether adding electrode locations may be possible.

I think adding an electrodes.tsv with NaN values would be a bit cluttery.

What's your opinion @effigies?

adam2392 commented 11 months ago

IMHO the examples described above rather call for a RECOMMENDED status, where the validator would send a warning if no electrodes.tsv file is found. The warning may serve the purpose of making everyone think deeply about whether adding electrode locations may be possible.

I will defer to everyone else, since I don't work on this as often anymore.

Thoughts on the warning: Perhaps the validator could state in the warning something about "MRI scan of the subject brain and CT scans with the electrodes implanted data" are basically what's needed with some "reasonable resolution" (e.g. fast scans of <30 slices are useless, which we got a lot of).

effigies commented 11 months ago

I think making the file required is fine. The values in tabular files have always been allowed to be set to n/a to indicate missing data. We do not need to highlight this as a particular case where you might want to.

effigies commented 11 months ago

I tested, by the way, and there is no rule in the current validator that prevents x, y and z from being n/a, even though only z is explicitly listed as being allowed to be n/a. So the proposal here would not require any additional code to accommodate the edge case of not having coordinates.

As a side thought, it probably would be useful in a case like this to put structure names as a column in electrodes.tsv, so you still get some context without referring back to the README.

sappelhoff commented 11 months ago

okay, fine by me then 👍

sappelhoff commented 2 months ago

@effigies @rwblair should we close this issue as it's targeted to the legacy-validator, and just keep the one I just opened on the bids-specification repo as a reference/todo?

effigies commented 2 months ago

My current approach is to tag issues with legacy or schema, and once the legacy validator is killed off for good, all legacy issues can be closed.

I don't want to stop anybody motivated from fixing a bug in the legacy validator, though, so I haven't been closing them.

But I'm okay either way. Feel free to close or leave open.

sappelhoff commented 2 months ago

Thanks, that's a good approach.

effigies commented 2 weeks ago

I implemented this in bids-standard/bids-specification#1896, but this would invalidate all of the ieeg examples with *_space-<label>_electrodes.tsv.

The inheritance principle does not associate, e.g., from ieeg_epilepsy:

sub-01_ses-postimp_space-IXI549Space_coordsystem.json
sub-01_ses-postimp_space-IXI549Space_electrodes.tsv
sub-01_ses-postimp_space-ScanRAS_coordsystem.json
sub-01_ses-postimp_space-ScanRAS_electrodes.tsv
sub-01_ses-postimp_task-seizure_run-01_ieeg.eeg

There are two problems with this dataset:

1) The inheritance principle associates metadata files with data files when the metadata file contains a subset of entities of the data file. Here, neither file has a subset of the other's entities. 2) The inheritance principle forbids multiple metadata files at a single level from associating to a data file. Even masking space for the sake of hacking it would lead to an ambiguity.

To resolve this, the two most straightforward options are:

1) Merge bids-standard/bids-specification#1896 as-is. Require a bare (no space-<label>) electrodes.tsv that can be found from the inheritance principle. Clean up the examples, and let dataset curators discover that their datasets have been invalid. This may involve renaming one existing space-<label> file or adding a new one. In this latter case, n/as can be used heavily to redact information that is either unavailable or too private. 2) Close bids-standard/bids-specification#1896. Give up on requiring an association and continue just to validate the contents of electrodes.tsv files, when found.

Less straightforward options:

3) Define and implement a multi-inheritance principle specifically for iEEG electrodes. This would require some discussion about how this plays out:

    space-X_electrodes.tsv
    sub-01/
      sub-01_space-X_electrodes.tsv
      ses-01/
        sub-01_ses-01_space-X_electrodes.tsv
        ieeg/
          sub-01_ses-01_space-X_electrodes.tsv
Which of these files are valid? Must `electrodes.tsv` be found in the same directory as the iEEG data files, or can they be found at a higher level? If there are multiple with the same `space` label, do you take the one closest to the data file (as with most of the inheritance principle), or should `space`s be unique, and this should be an error?

4) Define some way to indicate files that must exist within particular datatype directories, e.g., in every ieeg/ directory, there must be at least one *_electrodes.tsv file.

Just as a quick survey, looking at the examples and at OpenNeuro, all current electrodes.tsv files do sit inside the ieeg/ directory, so inheritance may not be needed.

In terms of pain to users, I would probably rank these 3 (least), 4, 2, 1 (most). In terms of pain to a developer, I would rank it 2 (least), 1, 3, 4 (most).

(I'm ranking 3 less painful than 4 for users because it should be more efficient to do an adapted inheritance-principle-style search than a repeated check for non-existent files. For development, I think (4) would be deceptively nasty or we'd end up just hacking it.)

dorahermes commented 2 weeks ago

I would like to hear which option @ftadel prefers as I am not sure what the consequences of 1. would be. I think the ieeg_epilepsy example is actually a really nicely curated one that applies to many sites.

sappelhoff commented 1 week ago

Thanks for the summary, Chris.

Do I understand 1. correctly, that in order to validate that an iEEG data file has a corresponding electrodes file , we need to apply fixes in the form of ...

... for this example 👇

sub-01_ses-postimp_space-IXI549Space_coordsystem.json
sub-01_ses-postimp_space-IXI549Space_electrodes.tsv
sub-01_ses-postimp_space-ScanRAS_coordsystem.json
sub-01_ses-postimp_space-ScanRAS_electrodes.tsv
sub-01_ses-postimp_task-seizure_run-01_ieeg.eeg

turn into "valid" through these changes 👇

sub-01_ses-postimp_coordsystem.json
sub-01_ses-postimp_electrodes.tsv
sub-01_ses-postimp_space-ScanRAS_coordsystem.json
sub-01_ses-postimp_space-ScanRAS_electrodes.tsv
sub-01_ses-postimp_task-seizure_run-01_ieeg.eeg

OR through these changes 👇

sub-01_ses-postimp_space-IXI549Space_coordsystem.json
sub-01_ses-postimp_space-IXI549Space_electrodes.tsv
sub-01_ses-postimp_coordsystem.json
sub-01_ses-postimp_electrodes.tsv
sub-01_ses-postimp_task-seizure_run-01_ieeg.eeg

?

That is: Require at least one electrodes.tsv file in "some space", where that space entity is NOT defined in the filename. And if more spaces are to be supplied, they may (must, actually) supply the space entity?

At least that is what I understand from your:

Require a bare (no space-


The inheritance principle forbids multiple metadata files at a single level from associating to a data file. Even masking space for the sake of hacking it would lead to an ambiguity.

So for our present case, the BIDS inheritance principle is ill defined, because, as Dora points out, it is a normal (and easily understandable) situation to have an iEEG data file, and supplied electrodes information in 2 (or n) different spaces. These spaces can then be useful for different analyses or users ... but in essence, it is an association of iEEG data with electrode location data (and different "views" on the "electrode locations" are what is supplied)

The inheritance principle associates metadata files with data files when the metadata file contains a subset of entities of the data file. Here, neither file has a subset of the other's entities.

same as above: in the present (IMHO reasonable) example, the inheritance principle doesn't fit what we need ... so I wonder whether it could be adapted.


Overall, I appreciate that this situation is difficult, and I am tending towards option 2., with the exceptions that I would not stop requiring it, and instead rather relax our credo of "when we require something we must be able to validate it" ... a few exceptions may be ok.


@dorahermes I think @ftadel has mentioned before that he is no longer working in the field ... but I fail to find his corresponding comment on GH.

dorahermes commented 1 week ago

Thanks @sappelhoff! I think this situation is difficult, because the _electrodes.tsv file is by definition session specific rather than run specific.

ftadel commented 1 week ago

@dorahermes I think @ftadel has mentioned before that he is no longer working in the field ... but I fail to find his corresponding comment on GH.

Indeed, I left the neuroimaging field last year. @rcassani @Moo-Marc Would you be interested in following up on this topic? You'd need to report the changes made here (bids-examples) in the datasets available for download on the Brainstorm website.