Create a bidscrambler for MEG data in the fif format

marcelzwiers commented 6 months ago

I have already created bidscramblers for tsv, json and nifti files, but I feel that I don't have the expertise to write a scrambler for EEG and MEG data.

robertoostenveld commented 5 months ago

Can you search for Open Source (and GPL compatible?) Python code to read and write the various file formats?

marcelzwiers commented 5 months ago

MNE-Python?

robertoostenveld commented 5 months ago

Should we have bidscrambler_xxx where xxx is the file format? The reason for asking is that there is not a single file format (as with nifti) in use with EEG and MEG, but a few each.

marcelzwiers commented 5 months ago

I think having one scrambler_eeg/meg.py function is fine. The different file formats can then be handled within that function. But if it becomes big, we can always split things. The user won't notice anyhow, because the CLI is all handled by scrambler.py.

robertoostenveld commented 3 months ago

I think we should not approach it from the general EEG/MEG perspective, but rather from the file format perspective. A BIDS dataset is a structured list of files with well-defined file names; when scrambling them you need to know which files to read/modify/write. Splitting the logic of determining which files to read/write from the content-wise scrambling provides a strategy to also implement this for files that are used by the other BIDS modalities (PET, NIRS, iEEG, microscopy, motion), even though we are not necessarily domain experts on all of these ourselves.
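
A minimal sketch of that split, assuming a registry keyed on file extension. The names `SCRAMBLERS`, `select_files`, `scramble_dataset` and `copy_unchanged` are hypothetical and are not taken from the existing BIDScramble code; the per-format function here only copies the file.

```python
import shutil
from pathlib import Path
from typing import Callable, Dict, Iterator, List

# Hypothetical signature: a scrambler reads `src` and writes a scrambled copy to `dst`.
Scrambler = Callable[[Path, Path], None]


def copy_unchanged(src: Path, dst: Path) -> None:
    """Stand-in scrambler that copies the file without touching its content."""
    shutil.copyfile(src, dst)


# Hypothetical registry: one entry per file format, independent of modality.
SCRAMBLERS: Dict[str, Scrambler] = {
    ".fif": copy_unchanged,
    ".tsv": copy_unchanged,
}


def select_files(bidsdir: Path, pattern: str = "*") -> Iterator[Path]:
    """Step 1: decide which files to process, using file names only."""
    for path in sorted(bidsdir.rglob(pattern)):
        if path.is_file() and path.suffix in SCRAMBLERS:
            yield path


def scramble_dataset(bidsdir: Path, outdir: Path, pattern: str = "*") -> List[Path]:
    """Step 2: hand each selected file to the scrambler registered for its format."""
    written = []
    for src in select_files(bidsdir, pattern):
        dst = outdir / src.relative_to(bidsdir)
        dst.parent.mkdir(parents=True, exist_ok=True)
        SCRAMBLERS[src.suffix](src, dst)
        written.append(dst)
    return written
```

With this layout, supporting a new format would only mean registering one more function; the file selection stays the same.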

Could you start by extending the scrambler such that it reads and writes (i.e., copies) a .fif file? The fif naming scheme is well documented in the MEG specification. That would allow testing on use case 2.3. For now you don't have to worry what is contained in the fif file, but the Python code that you write should have a placeholder for reading the (binary) content and for writing the content. In between the reading and writing, the content would be modified; that is something @schoffelen or I can implement.
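
Something along these lines is what I have in mind for the placeholder; a sketch only, with hypothetical function names (`scramble_fif`, `modify_content`), treating the file as opaque bytes for now.

```python
from pathlib import Path


def modify_content(content: bytes) -> bytes:
    """Placeholder for the actual scrambling; for now the content is returned unchanged."""
    return content


def scramble_fif(src: Path, dst: Path) -> None:
    """Read a .fif file, pass its content through the placeholder, and write the result."""
    content = Path(src).read_bytes()       # read the (binary) content
    content = modify_content(content)      # the scrambling will go here
    dst = Path(dst)
    dst.parent.mkdir(parents=True, exist_ok=True)
    dst.write_bytes(content)               # write the content
```

As long as `modify_content` is the identity, the output is a byte-for-byte copy of the input, which is enough to test the file handling.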

Once we have it for .fif files (one of the MEG formats), we can take the next step and implement it for one of the EEG formats so that we can also continue with use case 2.4.

schoffelen commented 3 months ago

Earlier, I pushed some placeholder code that for now should be able to deal with fif files of the 'raw' and 'average' type. For the average type this only works if the file contains a single condition average; I still need to figure out how to generically detect and handle multiple averages. The scrambling performed for now is (I think) a scrambling across channels, which is not really meaningful but good enough for now. I haven't tested this on a full BIDS directory yet, but I BIDS-ified a single test dataset I had lying around locally, and that seemed to work. To contribute efficiently I first need to familiarize myself a bit more with how to develop code and test interactively using PyCharm, and to get used to the state-of-the-art testing framework implemented by @marcelzwiers.

Note:

The raw fif-file format is, obviously, for raw data. One could question the need to scramble this for privacy purposes, but it is the format that is easiest to deal with.
The 'average' fif-file format, i.e. fif files that traditionally contain event-related averages, is a good target for scrambling, because these files may be used downstream to hold first-level derivative results, most typically ERFs. If we don't require the files to be readable/interpretable without glitches outside our applications, we could also consider filling the numeric data with other content (e.g. frequency-domain data).
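
To illustrate what scrambling across channels amounts to, here is a rough sketch on a plain NumPy array. It assumes the data are laid out as channels x samples, and the function name `permute_channels` is made up for this example; it is not the code that was pushed.

```python
import numpy as np


def permute_channels(data: np.ndarray, seed=None) -> np.ndarray:
    """Return a copy of `data` (n_channels x n_samples) with its rows in random order.

    Each time course stays intact, but it no longer belongs to its original channel.
    """
    rng = np.random.default_rng(seed)
    order = rng.permutation(data.shape[0])
    return data[order, :]


# Toy data: 4 channels, 3 samples each.
data = np.arange(12, dtype=float).reshape(4, 3)
scrambled = permute_channels(data, seed=0)
```

The same function would apply to both the 'raw' and the single-condition 'average' case, since both reduce to a channels x samples array once read.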

robertoostenveld commented 3 months ago

@schoffelen could you add a test_scramble_fif to the tests in https://github.com/SIESTA-eu/wp15/tree/main/BIDScramble/tests and document its use in https://github.com/SIESTA-eu/wp15/blob/main/usecase-2.3/README.md#scrambled-data ?

robertoostenveld commented 3 months ago

I tested it for use case 2.3 and it seems to work.

I don't know the valid options for scrambling, so that still needs some attention. @marcelzwiers, a general presentation of BIDScramble would be useful for that.

Also test_scramble_fif should be added to the tests.

robertoostenveld commented 3 months ago

Both a "null" and a "permute" scrambler are now in place, and the fif scrambling is part of the test suite.

This should be good enough for now; time to move on to the next scrambler, for EEG data, see #28.

schoffelen commented 3 months ago

Too bad, you overtook me on the inside. I will flush my attempts then.

robertoostenveld commented 3 months ago

Oh, I thought you had other things to do this afternoon ;-)

schoffelen commented 3 months ago

Yes, that's right, but not after 5 o'clock, and I had already made a start... and with my limited Python skills I am moving a bit more slowly than the rest.

marcelzwiers commented 3 months ago

The pytest data is rather big (900 MB or so); it would be good to have a smaller file to work with. Also, the isevoked code path is not tested.
