bids-standard / bep021

Organizing and coordinating the BIDS Extension Proposal 21: Common Electrophysiology Derivatives
https://bids.neuroimaging.io/bep021

Data format(s) #1

Open guiomar opened 3 years ago

guiomar commented 3 years ago

Hi!

I'll share here some of the discussions we have on the google doc related to BEP021, so we can keep track and they don't get forgotten once we resolve the comments.

One of the first things we need to agree on is which data formats we should use to store the matrices resulting from data preprocessing.

This is what we currently have:

.mat
PROS:
- Open specification
- Well supported I/O in both Matlab and Python
CONS:
- Proprietary format
- Allows for highly complex data structures that might need further documentation
v7.3, which is based on the HDF5 format (not proprietary), is not supported in Python

.npy
PROS:
- Open specification
- Well supported I/O in Python and C++
- Allows only n-dimensional arrays, limited complexity and thus not easily abused
CONS:
- Experimental support for Matlab

.txt
PROS:
- Simple and easy I/O
CONS:
- Large memory footprint, inaccurate numeric representation

.h5
- See blog post for detailed discussion
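
To make the .npy versus .txt trade-off above concrete, here is a minimal sketch in Python. The array shape, the random data, and the `%.6g` format string are assumptions chosen for illustration; they are not part of the BEP021 discussion. It suggests that the inaccuracy of text storage depends on the format string: NumPy's default text format round-trips float64 values exactly but takes roughly three times the space of .npy, while a shorter format loses precision.

```python
import io

import numpy as np

# Hypothetical derivative matrix (channels x samples), float64.
rng = np.random.default_rng(0)
data = rng.standard_normal((8, 500))

# Binary .npy: exact round trip, 8 bytes per value plus a small header.
npy_buf = io.BytesIO()
np.save(npy_buf, data)
npy_size = npy_buf.getbuffer().nbytes
npy_buf.seek(0)
from_npy = np.load(npy_buf)

# Text with NumPy's default format ("%.18e"): exact, but much larger.
txt_full = io.StringIO()
np.savetxt(txt_full, data)
txt_full_size = len(txt_full.getvalue())
txt_full.seek(0)
from_txt_full = np.loadtxt(txt_full)

# Text with a short format string: smaller than the default, but lossy.
txt_short = io.StringIO()
np.savetxt(txt_short, data, fmt="%.6g")
txt_short.seek(0)
from_txt_short = np.loadtxt(txt_short)

print("npy bytes:", npy_size, "| default text bytes:", txt_full_size)
print("npy exact:", np.array_equal(from_npy, data))
print("default text exact:", np.array_equal(from_txt_full, data))
print("short text exact:", np.array_equal(from_txt_short, data))
```

The in-memory buffers stand in for files on disk, so the sketch runs without touching the filesystem.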

@ChristophePhillips commented: Any chance of using the NIfTI format? It was devised for images but can easily store any type of 2D/3D/4D signals... and it's typically well interfaced.

@arnodelorme commented: Consider adding the .set EEGLAB format and the .vhdr Brain Vision Exchange Format, which both support the definition of data epochs and are both already included in the BIDS raw EEG data definition.

guiomar commented 3 years ago

See also discussion here: https://github.com/bids-standard/bids-specification/issues/197

CPernet commented 3 years ago

summary of #197

1) HDF5 is not recommended by most.
2) In general, if data can be in the same format as raw, then stay in the same format (segmenting and averaging don't need a change).
3) Change the file format only if the raw format cannot support the data; if so, NIfTI and CIFTI, which are already BIDS-supported, can probably cover most cases up to 7D, with the advantage that the first 4 dimensions are fixed (x, y, z, t) and the other dimensions are left to specify.
4) If NIfTI and CIFTI don't work, then so far 'we' seem resolved to use .mat and .npy (but then I suggest we also support R if we go down the road of supporting computing platform formats).

I think for ephys we should discuss point 4 only, and see if we can agree for all derivatives on points 1/2/3 within issue 197 @robertoostenveld @arnodelorme @sappelhoff

guiomar commented 3 years ago

Thanks a lot @CPernet for the nice summary!!

ChristophePhillips commented 3 years ago

To chip in on this, what about the BrainVision data format? OK, it comes from a company, but AFAIK it's open, simple and sufficiently flexible: two text files (.vhdr and .vmrk) with the header (i.e. data description) and marker (i.e. any "event") information, plus a simple binary file (multiplexed) with the signals. Easy to read, easy to write.

And it's already accepted in BIDS-EEG.
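
As a rough illustration of the "easy to read" point, here is a minimal Python sketch that parses a stripped-down header and de-multiplexes a binary buffer. The header text, channel names, and the helper names `parse_vhdr` and `demultiplex` are hypothetical. The sketch assumes little-endian INT_16 samples and ignores markers and most header fields. It is not a replacement for an established reader.

```python
import configparser

import numpy as np

# Hypothetical, stripped-down header for illustration only.
VHDR_TEXT = """Brain Vision Data Exchange Header File Version 1.0
; illustrative header, not taken from a real recording

[Common Infos]
DataFile=example.eeg
MarkerFile=example.vmrk
DataFormat=BINARY
DataOrientation=MULTIPLEXED
NumberOfChannels=2
SamplingInterval=1000

[Binary Infos]
BinaryFormat=INT_16

[Channel Infos]
Ch1=Cz,,0.5
Ch2=Pz,,0.5
"""


def parse_vhdr(text):
    """Extract channel count, sampling rate, names and resolutions."""
    # The first line is a version banner rather than INI syntax, so drop it.
    body = text.split("\n", 1)[1]
    cfg = configparser.ConfigParser(interpolation=None)
    cfg.read_string(body)
    common = cfg["Common Infos"]
    n_channels = int(common["NumberOfChannels"])
    names, resolutions = [], []
    for i in range(1, n_channels + 1):
        # Each channel line is "name,reference,resolution[,unit]".
        name, _reference, resolution = cfg["Channel Infos"][f"Ch{i}"].split(",")[:3]
        names.append(name)
        resolutions.append(float(resolution))
    return {
        "n_channels": n_channels,
        "sfreq": 1e6 / float(common["SamplingInterval"]),  # interval in microseconds
        "names": names,
        "resolutions": resolutions,
    }


def demultiplex(raw_bytes, header):
    """Turn multiplexed int16 samples into a (channels, samples) float array."""
    samples = np.frombuffer(raw_bytes, dtype="<i2")  # assumes little-endian INT_16
    data = samples.reshape(-1, header["n_channels"]).T
    return data * np.asarray(header["resolutions"])[:, None]


header = parse_vhdr(VHDR_TEXT)
# Three time points, two channels, stored sample by sample (multiplexed).
raw = np.array([1, 10, 2, 20, 3, 30], dtype="<i2").tobytes()
signals = demultiplex(raw, header)
print(header["names"], header["sfreq"], signals.shape)
```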

CPernet commented 3 years ago

See above: '2) in general if data can be in the same format as raw then stay in the same format'.
So if you used .vhdr and .vmrk, keep doing so. I don't see what the question is, @ChristophePhillips

CPernet commented 3 years ago

Oh oh, HDF5 might still be on the table:

Teeters, J., Benda, J., Davison, A., Eglen, S., Gerkin, R. C., Grethe, J., … Wark, B. (2016). Requirements for storing electrophysiology data. Retrieved from http://arxiv.org/abs/1605.07673

... INCF stuff

dorahermes commented 3 years ago

It seems like just stating 'HDF5' is underspecified. Moreover, NWB is already accepted (but not supported) in BIDS-iEEG and is based on HDF5.

For parallel compute (and clinical use cases) MEF3 is also accepted in BIDS-iEEG (and I am currently very happy working with MEF3, since it allows me to easily and efficiently work with large iEEG datasets in nice small chunks).

robertoostenveld commented 3 years ago

HDF5 is also used by MATLAB, and hence implicitly supported in BIDS-EEG, as that allows for EEGLAB .set datasets (which are .mat files in disguise, and hence HDF5 when saved as v7.3). HDF5 is also used in SNIRF, which is the format considered for BIDS-NIRS https://bids.neuroimaging.io/bep030.

In all cases (nwb, eeglab, snirf) there is a clear specification on top of HDF5 that is defined and maintained outside of the BIDS ecosystem.
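
Whether a given .mat or .set file is actually an HDF5 container can be checked from its first bytes. The sketch below is a minimal Python illustration using only the standard library. The function names `find_hdf5_signature` and `describe_container` are hypothetical, and the byte strings are synthetic stand-ins rather than real files. It relies on two assumptions: the HDF5 signature can sit at offset 0, 512, 1024 and so on, and v7.3 MAT-files place a 512-byte text header in front of it.

```python
# Eight-byte signature that opens every HDF5 superblock.
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def find_hdf5_signature(raw, max_offset=4096):
    """Return the byte offset of the HDF5 signature, or None if absent."""
    offset = 0
    while offset <= max_offset:
        if raw[offset:offset + 8] == HDF5_SIGNATURE:
            return offset
        # Candidate offsets are 0, 512, 1024, 2048, ...
        offset = 512 if offset == 0 else offset * 2
    return None


def describe_container(raw):
    """Classify the leading bytes of a file (illustrative labels only)."""
    if find_hdf5_signature(raw) is not None:
        return "HDF5-based"
    if raw.startswith(b"MATLAB 5.0 MAT-file"):
        return "MAT v5-style binary, not HDF5"
    return "unknown"


# Synthetic stand-ins for the first bytes of real files.
fake_v73 = b"MATLAB 7.3 MAT-file".ljust(512, b" ") + HDF5_SIGNATURE
fake_v5 = b"MATLAB 5.0 MAT-file, Platform: example".ljust(128, b" ")
fake_plain_hdf5 = HDF5_SIGNATURE + b"\x00" * 8

for label, raw in [("v7.3", fake_v73), ("v5", fake_v5), ("hdf5", fake_plain_hdf5)]:
    print(label, "->", describe_container(raw))
```

For a real file, the same functions could be applied to the first few kilobytes read in binary mode.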

yarikoptic commented 1 year ago

FWIW, I think HDF5 needs more pros listed in its item in the OD. Some "pros":

CPernet commented 1 year ago

Now formalized into the derivatives guidelines (link to follow): HDF5 and zarr are supported when keeping the same format as raw, or using TSV, is not possible.