bids-standard / bids-specification

Brain Imaging Data Structure (BIDS) Specification
https://bids-specification.readthedocs.io/
Creative Commons Attribution 4.0 International

Supporting .fif Format for EEG BIDS #276

Closed alexrockhill closed 5 years ago

alexrockhill commented 5 years ago

@jasmainak, @sappelhoff and I were discussing whether BIDS should support .fif format for EEG in the following issue: https://github.com/mne-tools/mne-bids/issues/229. Are there strong reasons against supporting .fif besides parsimony? The reasons to support .fif seem to me to be 1) a large and growing user base for MNE which uses the .fif format, 2) potential loss in conversion to BrainVision format and especially .edf, 3) a succinct and comprehensive file that is well-formatted in structure (as opposed to BrainVision's three separate files where you need a program even just to rename the file) and 4) good documentation of how to convert to many other file types using MNE which is free and open-source.

sappelhoff commented 5 years ago

Hi @alexrockhill - let me address your points below:


a large and growing user base for MNE which uses the .fif format

Although it is true that MNE has a big and growing user base, we need to take all users into account for data format decisions

potential loss in conversion to BrainVision format

--> In my opinion, lots of the "potential losses" are changed settings and/or derivatives of the raw data that are saved back to the .fif file in MNE-Python ... but these changed settings / derivatives can and should be documented and saved apart from the raw data, using the BIDS specification for derivative data

--> Note the discussion for derivative data is still ongoing, and we are actively discussing file formats. Perhaps you can better make your argument for .fif there.

a succinct and comprehensive file that is well-formatted in structure

Is there open documentation for the .fif format hosted somewhere? I know that I spent a long time searching for it, and only found an outdated manual on some departmental website.

(as opposed to BrainVision's three separate files where you need a program even just to rename the file)

BrainVision being separated into three files is a curse and a blessing at the same time.

--> We have .eeg, .vmrk and .vhdr. The marker (.vmrk) and header (.vhdr) files can be opened in any text editor, and you learn a lot about the data from them. The .eeg file contains just the channel-wise time courses and is thus also very simple to handle.
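
As a rough illustration of how much a plain-text header reveals, here is a minimal sketch that pulls a few fields out of a .vhdr file without any EEG library. The helper name `read_vhdr_fields` and the sample header are made up for this example. The keys shown (`NumberOfChannels`, `SamplingInterval` in microseconds, `BinaryFormat`) follow the BrainVision header layout, but real headers carry more sections than this parser tries to interpret.

```python
def read_vhdr_fields(text):
    """Collect key=value pairs from a BrainVision header, keyed by (section, key)."""
    fields = {}
    section = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and ';' comment lines.
        if not line or line.startswith(";"):
            continue
        # A line like [Common Infos] starts a new section.
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1]
            continue
        key, sep, value = line.partition("=")
        # Lines before the first section (the version banner) are ignored.
        if sep and section is not None:
            fields[(section, key.strip())] = value.strip()
    return fields


# Hypothetical, heavily shortened header for illustration only.
EXAMPLE_VHDR = """\
Brain Vision Data Exchange Header File Version 1.0
; illustrative header, not from a real recording

[Common Infos]
DataFile=rec.eeg
MarkerFile=rec.vmrk
NumberOfChannels=2
SamplingInterval=1000

[Binary Infos]
BinaryFormat=IEEE_FLOAT_32
"""

fields = read_vhdr_fields(EXAMPLE_VHDR)
n_channels = int(fields[("Common Infos", "NumberOfChannels")])
# SamplingInterval is given in microseconds, so the rate in Hz is 1e6 / interval.
sfreq_hz = 1e6 / float(fields[("Common Infos", "SamplingInterval")])
print(n_channels, sfreq_hz, fields[("Binary Infos", "BinaryFormat")])
```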

The curse is the "baggage" (3 files) and the "complexity" of renaming, for which tools can be developed quite easily (e.g., MNE-BIDS command line utility cp)
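
To make the renaming problem concrete: the .vhdr file names its siblings in `DataFile=` and `MarkerFile=` entries, and the .vmrk file names the data file in a `DataFile=` entry, so renaming on disk alone leaves dangling links. The sketch below shows the idea using only the standard library. The function names `relink` and `rename_brainvision` are invented here, it assumes all three files share one stem and are UTF-8 encoded, and it is not how the MNE-BIDS utility is implemented.

```python
from pathlib import Path


def relink(text, new_stem):
    """Point DataFile=/MarkerFile= entries at the new file stem."""
    lines = []
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip() in ("DataFile", "MarkerFile"):
            # Keep the original extension (.eeg or .vmrk), swap only the stem.
            line = f"{key.strip()}={new_stem}{Path(value.strip()).suffix}"
        lines.append(line)
    return "\n".join(lines) + "\n"


def rename_brainvision(vhdr_src, new_stem, dest_dir):
    """Copy a .vhdr/.vmrk/.eeg triplet under a new stem and fix internal links."""
    vhdr_src = Path(vhdr_src)
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)

    # The binary .eeg file holds no file names, so a byte-for-byte copy suffices.
    eeg_bytes = vhdr_src.with_suffix(".eeg").read_bytes()
    (dest_dir / f"{new_stem}.eeg").write_bytes(eeg_bytes)

    # The header and marker files are text and refer to their siblings by name.
    for ext in (".vhdr", ".vmrk"):
        # Assumption: files are UTF-8; older recordings may use another encoding.
        text = vhdr_src.with_suffix(ext).read_text(encoding="utf-8")
        (dest_dir / f"{new_stem}{ext}").write_text(
            relink(text, new_stem), encoding="utf-8"
        )

    return dest_dir / f"{new_stem}.vhdr"
```

For real datasets a maintained tool such as the MNE-BIDS utility mentioned above is the safer choice, since it deals with edge cases this sketch ignores.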

good documentation of how to convert to many other file types using MNE which is free and open-source.

This is going back to your second point on how conversion can be lossy --> is there a reason why this argument should apply when going from fif to BV ... but not when going from BV to fif?

sappelhoff commented 5 years ago

Perhaps @robertoostenveld has some points to add or wants to refute some points I made?

alexrockhill commented 5 years ago

Sorry for not responding sooner; I was running an experiment.

To respond to your points: MNE-Python users are the only ones I know who use .fif, but that seems like a fair number of users, though I don't know the statistics. After all, .fif is popular enough to be used for MEG.

To my knowledge, .mat files are multi-purpose MATLAB files with no specified underlying structure, whereas -raw.fif files have a consistent structure, so I'm not sure that's an accurate comparison.

As for potential losses, I don't have experience with .edf, but according to this paper, "BrainVision is supported by EEG-BIDS, and makes up for some technical shortcomings of EDF (such as being able to store data to a higher level of numerical precision of up to 32 bits)." https://www.nature.com/articles/s41597-019-0105-7.pdf. This suggests that EDF has less than 32-bit precision, whereas .fif is 32-bit by default and can be set to 64-bit: https://martinos.org/mne/dev/manual/io.html.
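
To put a number on the precision point: EDF stores each sample as a 16-bit integer and scales it using the physical and digital minimum/maximum declared per channel in the header. The sketch below imitates that round trip in plain Python so the rounding error becomes visible. The function name `roundtrip_16bit`, the sample values, and the chosen physical range are made up for illustration; this is not an EDF writer.

```python
def roundtrip_16bit(samples, phys_min, phys_max, dig_min=-32768, dig_max=32767):
    """Quantize samples to a 16-bit integer grid and scale them back."""
    # Size of one integer step in physical units (here: microvolts).
    step = (phys_max - phys_min) / (dig_max - dig_min)
    restored = []
    for x in samples:
        digital = round((x - phys_min) / step) + dig_min
        # Values outside the declared physical range are clipped.
        digital = max(dig_min, min(dig_max, digital))
        restored.append((digital - dig_min) * step + phys_min)
    return restored, step


# Hypothetical samples in microvolts.
signal_uv = [0.0, 0.03, 12.3456, -250.777, 3000.0]
restored_uv, step_uv = roundtrip_16bit(signal_uv, phys_min=-3276.8, phys_max=3276.7)
worst_error_uv = max(abs(a - b) for a, b in zip(signal_uv, restored_uv))
print(f"step: {step_uv:.4f} uV, worst round-trip error: {worst_error_uv:.4f} uV")
```

Within the declared range the error stays at or below half a step, so the trade-off is between resolution and dynamic range: widening the physical range makes each step coarser. Whether that matters depends on the amplifier and the signal, which is the kind of loss being discussed here.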

I definitely agree that .fif and BV are roughly equivalent, and that BV satisfies the needs of the community without the need to add .fif. My only qualms are that three files are two too many (or at least one too many) and that it is a format from a company as opposed to open-source. Maybe others can comment on how .fif files are structured; I really don't know. But many of the other data formats supported by BIDS, such as nii.gz, are compressed and not human-readable. My impression was that the sidecars were meant to be human-readable and that that was more important.

I think three supported data formats is not too many, especially if each can read in the others. From my perspective, some people I know are moving away from MATLAB to Python so, in my opinion, it would be good to support the native MNE-Python format.

sappelhoff commented 5 years ago

I'll let other people respond, but I wanted to make some quick and small points:

  1. Isn't FIF also a data format by a company? Do we know anything about the licensing of the FIF data format?
    • just because it's connected: You can view the BrainVision specification on their website.
  2. BIDS EEG permits more than two data formats. EDF and BrainVision are recommended, but the EEGLAB format (.set, .fdt) and the Biosemi format (.bdf) are also accepted.

alexrockhill commented 5 years ago

You are right that FIF is also a format by a company and so is no better in that respect. Sorry, my mistake.

jasmainak commented 5 years ago

@alexrockhill did I understand correctly that you had your files in .fif format because that was the native data format and not because MNE supports it?

@sappelhoff I would ask differently -- why do you ask people to convert from manufacturer-specific formats? Do you have buy-in from the manufacturers? Have they committed to use the BIDS format in the future? I'm afraid it's harder to convince people to convert file formats than to convince them to store the metadata in an organized way ...

alexrockhill commented 5 years ago

@jasmainak Yes, some data is in .fif natively, from Elekta NeuroMag but just personally we also collect data in BV format.

sappelhoff commented 5 years ago

@jasmainak Yes, some data is in .fif natively, from Elekta NeuroMag but just personally we also collect data in BV format.

Do I understand correctly that EEG data can be collected independently of MEG data (i.e., EEG-only data collection, no concurrent MEG) and saved in FIF? Which equipment permits this? Is that commonly done?

why do you ask people to convert from manufacturer-specific formats?

I see two main reasons not to support all data formats:

  1. The heterogeneity of data formats means that researchers hit obstacles when they switch from one format to another, e.g., when re-using data from a publicly shared dataset. Each data format has its own particularities (and peculiarities), and it always takes time to learn them. Sometimes that is not even easy (which leads to 2).

  2. Some data formats do not have stable, openly accessible, clear documentation, which is a red flag in my opinion.

Do you have buy-in from the manufacturers? Have they committed to use the BIDS format in the future?

BrainProducts (company for BrainVision format) has expressed a strong interest in BIDS and they are actively developing converters and features for their users.

EDF format does not have a manufacturer, but I know that many manufacturers support EDF out of the box (e.g., Biosemi, Brain Products)

I'm afraid it's harder to convince people to convert file formats than to convince them to store the metadata in an organized way

That is true, but hopefully the conversion is as lossless as possible, and for the things that "get lost in translation", we have the /sourcedata directory in BIDS.

regarding the convincing: Yes, it's tough, but I think it's worth it for the two points mentioned above

alexrockhill commented 5 years ago

@sappelhoff No, we don't collect EEG in FIF format without MEG. But we have done projects that focus on EEG for comparability with other EEG studies. In that case it makes sense to publish only the EEG data, originally collected in FIF, both for space reasons and so as not to publish the simultaneous MEG before we have had the chance to perform a first analysis.

I find that a fairly convincing case against making EEG BIDS compatible with FIF, as this is a small, uncommon case, and those are reasonable points.

jasmainak commented 5 years ago

Fair enough to close the issue if there is no strong practical use case.

robertoostenveld commented 5 years ago

I just returned from holiday, so let me just chime in. I agree with closing the issue. A lot of the points have been discussed elsewhere before. Let me just add some arguments that were not explicitly mentioned above. Note that I don't want to start the discussion again, just add some stuff for completeness and reference.

The *.fif format was selected for BIDS-MEG because it is the native format of Neuromag/Elekta/Megin and is manufacturer-defined and maintained. For BIDS-MEG the rationale was to keep all data in the hardware manufacturers' format (also CTF, 4D, etc). It also happens to be used by MNE Python, but that was not an argument. Note that BIDS formats don't have to be manufacturer-based; both EDF and NIFTI are examples of well-maintained community-based formats.

Extending the number of file formats supported by BIDS (e.g. also considering mnc and hdr/img for MRI) creates an exponential cost for supporting BIDS. All (future) software packages that are BIDS compliant have to support all formats.

The MATLAB format is HDF5 and a well-defined container format, just like fif (and, for example, avi). Although the content might be considered undefined (since flexible by design), the EEGLAB .set (which is a mat file, and allowed in BIDS-EEG) pins down the definition. Note that the EEGLAB .set is not a perfect definition, but neither is the .fif format itself, and especially not the additional tags introduced by MNE-Python in the .fif format (which cause headaches in other software that supports *.fif, such as FieldTrip).

If there were more raw data formats supported, the cost of BIDS would be lower for data producers, but higher for data re-users. We have to consider the (expected) ratio of consumers/producers.

Data in the original file format (or rewritten in some file format that the data producer deems useful, such as fif for EEG) can be shared in sourcedata; the code directory can contain the conversion scripts to the BIDS format (e.g. BrainVision), for which conversion tools are readily available (e.g. from MNE-Python or FieldTrip).
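
As a rough picture of that arrangement, the sketch below computes where each piece would sit in a dataset: the original file under sourcedata, the conversion script under code, and the converted BrainVision triplet in the subject's eeg folder. The function name `plan_eeg_layout`, the script name, and the subject/task labels are hypothetical, and the layout inside sourcedata is only one possible choice.

```python
from pathlib import PurePosixPath


def plan_eeg_layout(subject, task, original_name):
    """Return dataset-relative paths for the original, the script, and the converted files."""
    stem = f"sub-{subject}_task-{task}_eeg"
    return {
        # Original recording, kept as delivered by the data producer.
        "original": PurePosixPath("sourcedata") / f"sub-{subject}" / original_name,
        # Script documenting how the BIDS files were produced (hypothetical name).
        "script": PurePosixPath("code") / "convert_to_brainvision.py",
        # The three BrainVision files that make up the BIDS-side recording.
        "converted": [
            PurePosixPath(f"sub-{subject}") / "eeg" / f"{stem}{ext}"
            for ext in (".vhdr", ".vmrk", ".eeg")
        ],
    }


layout = plan_eeg_layout("01", "rest", "session1_raw.fif")
for path in [layout["original"], layout["script"], *layout["converted"]]:
    print(path)
```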

A lot of the resistance that I see against the file formats in BIDS seems related to (quite understandable) unfamiliarity with the capabilities of the selected file formats. We should try to better document the capabilities of BIDS as it stands now, and disseminate that knowledge, e.g. on the bids-starter-kit.