Open guiomar opened 3 years ago
See also discussion here: https://github.com/bids-standard/bids-specification/issues/197
summary of #197
1) HDF5 is not recommended by most.
2) In general, if data can be in the same format as raw, then stay in the same format (segmenting and averaging don't need a change).
3) Change the file format only if the format of raw cannot support it; if so, NIfTI and CIFTI, which are already BIDS-supported, can probably cover most cases up to 7D, with the advantage that the first 4 dimensions are fixed (x, y, z, t) and the other dimensions are left to specify.
4) If NIfTI and CIFTI don't work, then so far 'we' seem resolved to use .mat and .npy (but then I suggest we also support R if we go down the road of supporting computing-platform formats).
I think for ephys we should discuss point 4 only, and see if we can agree for all derivatives on points 1/2/3 within issue 197 @robertoostenveld @arnodelorme @sappelhoff
Thanks a lot @CPernet for the nice summary!!
To chip in on this, what about the BrainVision data format? OK, it comes from a company, but AFAIK it's open, simple and sufficiently flexible. Two text files (.vhdr and .vmrk) with the header (i.e. data description) and marker (i.e. any "event") information, plus a simple binary file (multiplexed) with the signals. Easy to read, easy to write.
And it's already accepted in BIDS-EEG.
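As a rough illustration of why a multiplexed binary file is easy to read and write, here is a minimal Python sketch using NumPy. It only covers the signal file, not the .vhdr/.vmrk text files. The channel count, sample count, data type and file name are assumptions made up for the example; a real reader would take them from the header file.

```python
import os
import tempfile

import numpy as np

# Assumed values for illustration; a real reader would take them from the .vhdr header.
n_channels, n_samples = 4, 250
dtype = "<f4"  # little-endian 32-bit float

rng = np.random.default_rng(0)
signals = rng.standard_normal((n_channels, n_samples)).astype(dtype)

# Hypothetical file name in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "example.eeg")

# Multiplexed layout: all channels of sample 0, then all channels of sample 1, ...
# tofile() always writes in C order, so writing the (samples, channels) view does this.
signals.T.tofile(path)

# Reading back: one flat array, reshaped to (samples, channels), transposed to (channels, samples).
flat = np.fromfile(path, dtype=dtype)
restored = flat.reshape(-1, n_channels).T

print(restored.shape, np.array_equal(restored, signals))
```

The whole read step is a single `np.fromfile` plus a reshape, which is the sense in which the binary part is simple; the metadata stays in the plain-text header.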
see above: '2 in general if data can be in the same format as raw then stay in the same format'
--> so if you used .vhdr and .vmrk, keep doing so. I don't see where the question is @ChristophePhillips
Oh oh, HDF5 might still be on the table: Teeters, J., Benda, J., Davison, A., Eglen, S., Gerkin, R. C., Grethe, J., … Wark, B. (2016). Requirements for storing electrophysiology data. Retrieved from http://arxiv.org/abs/1605.07673 ... INCF stuff
It seems like just stating 'HDF5' is underspecified. Moreover, NWB is already accepted (but not supported) in BIDS-iEEG and based on HDF5.
For parallel compute (and clinical use cases) MEF3 is also accepted in BIDS-iEEG (and I am currently very happy working with MEF3, since it allows me to easily and efficiently work with large iEEG datasets in nice small chunks).
HDF5 is also used by MATLAB, and hence implicitly supported in BIDS-EEG, as that allows for EEGLAB .set datasets (which are .mat files in disguise, and hence HDF5 when saved in the v7.3 format). HDF5 is also used in SNIRF, which is the format considered for BIDS-NIRS https://bids.neuroimaging.io/bep030.
In all cases (nwb, eeglab, snirf) there is a clear specification on top of HDF5 that is defined and maintained outside of the BIDS ecosystem.
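Whether a given .mat or .set file is actually an HDF5 container depends on how it was saved (only v7.3 MAT-files are HDF5-based). A minimal Python sketch for checking this with the standard library is shown below. It relies on the HDF5 file signature, which the HDF5 format places at byte offset 0, 512, 1024, 2048, and so on. The helper name `looks_like_hdf5` and the two demo files are hypothetical and synthetic; they only mimic the first bytes of real files.

```python
import os
import tempfile

HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def looks_like_hdf5(path, max_offset=4096):
    """Return True if the HDF5 signature is found at offset 0, 512, 1024, ..."""
    size = os.path.getsize(path)
    with open(path, "rb") as fh:
        offset = 0
        while offset <= max_offset and offset + len(HDF5_SIGNATURE) <= size:
            fh.seek(offset)
            if fh.read(len(HDF5_SIGNATURE)) == HDF5_SIGNATURE:
                return True
            # Candidate offsets are 0, then 512 and successive doublings.
            offset = 512 if offset == 0 else offset * 2
    return False


# Synthetic stand-ins (not real MATLAB output), just to exercise the check.
tmpdir = tempfile.mkdtemp()

hdf5_like_path = os.path.join(tmpdir, "fake_v73.mat")
with open(hdf5_like_path, "wb") as fh:
    # 512 bytes of text header, then the HDF5 signature.
    fh.write(b"MATLAB 7.3 MAT-file".ljust(512, b" "))
    fh.write(HDF5_SIGNATURE)

non_hdf5_path = os.path.join(tmpdir, "fake_v5.mat")
with open(non_hdf5_path, "wb") as fh:
    # Text header followed by arbitrary bytes, no HDF5 signature.
    fh.write(b"MATLAB 5.0 MAT-file".ljust(128, b" "))
    fh.write(b"\x00" * 1024)

print(looks_like_hdf5(hdf5_like_path), looks_like_hdf5(non_hdf5_path))
```

A check like this only tells you the container type; the layout inside the container is still defined by the specification on top of HDF5 (NWB, EEGLAB, SNIRF).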
FWIW, I think HDF5 needs more pros listed in its item in the OD. Some "pros":
Now formalized in the derivatives guidelines (link to follow): HDF5 and Zarr are supported when the same format as raw or TSV is not possible.
Hi!
I'll share here some of the discussions we have on the google doc related to BEP021, so we can keep track and they don't get forgotten once we resolve the comments.
One of the first things we need to agree on is which data formats we should use to store the matrices resulting from data preprocessing.
This is what we currently have:
.mat
PROS:
- Open specification
- Well supported I/O in both Matlab and Python
CONS:
- Proprietary format
- Allows for highly complex data structures that might need further documentation
- v7.3, which is based on the HDF5 format (not proprietary), is not supported in Python

.npy
PROS:
- Open specification
- Well supported I/O in Python and C++
- Allows only n-dimensional arrays, limited complexity and thus not easily abused
CONS:
- Experimental support for Matlab

.txt
PROS:
- Simple and easy I/O
CONS:
- Large memory footprint, inaccurate numeric representation

.h5
- See blog post for detailed discussion
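To make the .npy and .txt trade-offs in this list a bit more concrete, here is a small Python sketch that writes the same matrix in both formats and compares file size and round-trip accuracy. The matrix shape, file names and the shortened format string `%.6e` are arbitrary choices for illustration. With NumPy's default text format (`%.18e`) the values should round-trip, at the cost of a much larger file; the loss of precision shows up once fewer digits are written.

```python
import os
import tempfile

import numpy as np

rng = np.random.default_rng(0)
data = rng.standard_normal((64, 1000))  # float64 matrix, e.g. channels x samples

tmpdir = tempfile.mkdtemp()
npy_path = os.path.join(tmpdir, "matrix.npy")
txt_full_path = os.path.join(tmpdir, "matrix_full.txt")
txt_short_path = os.path.join(tmpdir, "matrix_short.txt")

np.save(npy_path, data)                       # binary, 8 bytes per value plus a small header
np.savetxt(txt_full_path, data)               # text, default fmt="%.18e"
np.savetxt(txt_short_path, data, fmt="%.6e")  # text, fewer digits written

from_npy = np.load(npy_path)
from_txt_short = np.loadtxt(txt_short_path)

npy_size = os.path.getsize(npy_path)
txt_full_size = os.path.getsize(txt_full_path)

npy_exact = np.array_equal(from_npy, data)
txt_short_exact = np.array_equal(from_txt_short, data)

print("npy bytes:", npy_size, "exact round trip:", npy_exact)
print("txt bytes (default fmt):", txt_full_size)
print("txt (%.6e) exact round trip:", txt_short_exact)
```

Note that .npy stores the dtype and shape in its header, so no side information is needed to load the array, whereas a text file carries neither.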
@ChristophePhillips commented: Any chance of using the NIfTI format? It was devised for images but can easily store any type of 2D/3D/4D signals... and it's typically well interfaced.
@arnodelorme commented: Consider adding .set EEGLAB format and .vhdr Brain Vision Exchange Format which both support data epochs definition and are also both included in BIDS raw EEG data definition.