NDCLab / pepper-pipeline

tool | Python Easy Pre-Processing EEG Reproducible Pipeline
GNU Affero General Public License v3.0

write_raw_bids throwing issue regarding signal1.bin. #7

Closed Jonhas closed 3 years ago

Jonhas commented 3 years ago

Probably an issue with reading the raw file. I would like to know how to approach this issue within the feature/io branch.

DMRoberts commented 3 years ago

I think this issue isn't related to signal1.bin and the BIDS format per se, but is due to some acquisition breaks within the data file. For example, if instead of writing to BIDS we try to save from MNE as a .fif file, which doesn't involve BIDS:

import mne.io
import mne_bids

mff_raw = mne.io.read_raw_egi('NDARAB793GL3.mff', preload=False, verbose=True)
mff_raw.save('test_one_eeg_raw.fif')

Then we also receive errors, "RuntimeWarning: Acquisition skips detected but did not fit evenly into output buffer_size, will be written as zeroes." and "ValueError: shape mismatch: value array of shape (129,500) could not be broadcast to indexing result of shape (129,0)"

Looking at the 'annotations' attribute of the raw object, the first segment break occurs at around 408 seconds into the recording:

mff_raw.annotations[0]

But, if you then crop the data to only include time prior to the first segment break, say the first 400 seconds, then exporting to .fif works:

mff_raw.crop(tmax=400)
mff_raw.save('export_test_eeg_raw.fif')

That .fif file can then be loaded and exported to BIDS, though with a warning that the data units (microvolts) are missing:

fif_raw = mne.io.read_raw('export_test_eeg_raw.fif')
bids_path = mne_bids.BIDSPath(subject='test', session='test', task='test', root='~/Downloads')
mne_bids.write_raw_bids(fif_raw, bids_path, format='auto')

But I think there is probably:

  1. A way to deal with the annotations / breaks in the data, to export the whole recording without having to crop / subset the recording.
  2. A way to move from .mff to BIDS without having to create an intermediary .fif file.

georgebuzzell commented 3 years ago

(reposting from slack):

All, I agree with @DMRoberts here that there appears to be no actual issue with the reading/writing functions. Instead, the errors appear because the data has periods of "no data", which mne does not expect, so it throws errors when trying to write. Dan found that if he selects just a part of the data and removes everything else, he can write to bids with no issue. The no-data periods fall in between individual tasks of data collection, so they are not needed. I think the solution is to: 1) load data using the read_raw_bids function, then 2) identify and remove each segment that has missing data, then 3) write the file as bids with all missing data removed. Then, when we load that file back in, it will not have any of the periods of missing data. One thing we need to be careful about, though, is that we still need a way to annotate the data to delineate periods where data was cut out, if that makes sense. That might already be present in the data, or we might need to insert markers.
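Step 2 above can be sketched without touching MNE at all: given the onsets and durations of the acquisition-skip annotations, work out which stretches of the recording actually contain data. This is a minimal sketch, assuming the skips are available as (onset, duration) pairs in seconds; the function name `good_segments` and the example numbers are hypothetical and not taken from the recording.

```python
def good_segments(skips, total_duration):
    """Return (start, stop) intervals, in seconds, that contain data.

    skips: iterable of (onset, duration) pairs marking periods with no data.
    total_duration: length of the whole recording in seconds.
    """
    segments = []
    cursor = 0.0  # end of the most recent skip seen so far
    for onset, duration in sorted(skips):
        if onset > cursor:
            segments.append((cursor, onset))
        cursor = max(cursor, onset + duration)
    if cursor < total_duration:
        segments.append((cursor, total_duration))
    return segments


# Hypothetical skips: one at 408 s lasting 30 s, one at 900 s lasting 60 s.
segments = good_segments([(408.0, 30.0), (900.0, 60.0)], total_duration=1200.0)
print(segments)  # [(0.0, 408.0), (438.0, 900.0), (960.0, 1200.0)]
```

Each interval could then be cropped out of the raw object (for example with `raw.copy().crop(tmin, tmax)`) and the pieces joined with `mne.concatenate_raws`, though that has not been tried on this file.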

georgebuzzell commented 3 years ago

(reposting from slack)

@DMRoberts did you happen to notice if there were markers delineating the start/stop of tasks? Separately, are there any markers that appear right before/after periods of blank data? Lastly, if you remove those periods, mne does not automatically insert a marker (similar to break-cnt in eeglab/matlab), correct? I.e. we need to manually insert a marker at breakpoints after removing the missing data, right?
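If the blank periods are cut out and the remaining pieces are joined end to end, the break markers would sit at the cumulative lengths of the kept pieces. Below is a minimal sketch of that bookkeeping, assuming the kept pieces are known as (start, stop) pairs in seconds; `boundary_onsets` and the numbers are hypothetical. As far as I know, `mne.concatenate_raws` adds its own boundary annotations at the joins, so this may only be needed if the pieces are stitched together some other way.

```python
from itertools import accumulate


def boundary_onsets(segments):
    """Return the times (s) in the stitched recording where a cut was made.

    segments: list of (start, stop) intervals kept from the original
    recording, in the order they will be joined.
    """
    lengths = [stop - start for start, stop in segments]
    # Cumulative lengths give the end of each piece on the new timeline;
    # the end of the last piece is the end of the file, not a boundary.
    return list(accumulate(lengths))[:-1]


kept = [(0.0, 408.0), (438.0, 900.0), (960.0, 1200.0)]
print(boundary_onsets(kept))  # [408.0, 870.0]
```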

Jonhas commented 3 years ago

@DMRoberts Thank you for your help! Regarding your message on converting the data to a .FIF file, this was something @DavyNeat and I discussed previously, but we were unsure how to approach the issue. Would it be possible to convert the data prior to using it with the script? The current .MFF data is approximately 3GB, and loading it into RAM while reading can become a bottleneck within our script. I'm not too sure how EEG data is recorded, so I do not know if this is possible. Or is this simply a quick fix for a not so clean data set?

Jonhas commented 3 years ago

@georgebuzzell @DMRoberts The conversion works as it should, and a BIDSPath does get made; however, an exception is thrown. Before making a pull request, I wanted to bring up this issue, since the lab uses BrainVision. After writing the BIDSPath, we get the following error:

ImportError: pybv >=0.4 is required for converting file to BrainVision format

This can easily be fixed by opening a terminal and running pip3 install pybv, which will install pybv 0.5.0. However, this does not resolve the issue, because MNE will then throw the following error:

Conversion to BrainVision format needed to be stopped, because your raw data contains channel types that are not represented in Volts: "{'101 ', '17 ', '33 ', '9 ', '4 ', '81 ', '16 ', '83 ', '25 ', '31 ', '5 ', '84 ', '8 ', '18 ', '97 ', '26 ', '90 ', '82 ', 'STI 014', '9999', '20 ', '93 ', '91 ', '34 ', '50 ', '11 ', '30 ', '102 ', '35 ', '15 ', '23 ', '22 ', '27 ', '32 ', '104 ', '96 ', '13 ', '95 ', '92 ', '24 ', '94 ', '103 ', '14 ', '21 ', '28 ', '12 '}"

Do we simply append units, or do we drop these channels? Within the write_raw_bids procedure, MNE iterates through these channels and checks if the values are in micro-volts, and leaves the other values as is.

DMRoberts commented 3 years ago

@Jonhas On your first question, I don't think we could convert the MFF to FIF or BIDS before using it in the script. But I think we will just be doing this conversion once for each file, and then be using the BIDS formatted data going forward. I think MNE should ideally be able to convert an MFF to BIDS without going through the FIF format as an intermediary, but this may be either a bug or an un-implemented feature in MNE at the moment.

On your second question, I also had to install pybv (v 0.5.0), which I had forgotten to mention.

When I run:

import mne.io
import mne_bids

mff_raw = mne.io.read_raw_egi('NDARAB793GL3.mff', preload=True, verbose=True)
mff_raw.save('test_preload_eeg_raw.fif', overwrite=True)
fif_data = mne.io.read_raw('test_preload_eeg_raw.fif')
bids_path = mne_bids.BIDSPath(subject='test', session='two', task='mytask', root='~/')
mne_bids.write_raw_bids(fif_data, bids_path, format='auto', overwrite=True)

I do get a warning that:

 UserWarning: Encountered unsupported non-voltage units: n/a
Note that the BrainVision format specification supports only µV.

However the BIDS file does completely export. One thing I just realized is that I'm not using the Docker container as I should, so maybe there is some package difference between us that has escalated that from a warning to an error. I can try the same later tonight using the Docker container.

I believe those channels you listed that don't have units of volts are stimulus / trigger channels, though I think MNE is also aware they are trigger channels, so I wouldn't think it would generate an error on conversion. We may have to do a little research into MNE to see the best way to handle this.

georgebuzzell commented 3 years ago

@Jonhas Thank you for testing this using the docker image and providing a clear description of what did/did not work.

@DMRoberts thank you for noting that you were not using the docker file. It would be super helpful moving forward to always use the shared docker file so that we are all "playing in the same sandbox". Sounds like not using the docker in this case fortuitously helped to identify the possible source of the issue here, though. It would be really great if, when you re-run using the docker, you can confirm that you get the error, and if you can provide any insight into the differences that might be driving the error vs warning difference. Thanks for your help!

SDOsmany commented 3 years ago

After running

fname = "NDA.mff"
raw = preprocessing.read_raw(fname)
raw.crop(tmax=400)
raw.save('test_one_eeg_raw.fif',overwrite=True)
fif_raw = mne.io.read_raw('test_one_eeg_raw.fif')
bids_path = BIDSPath(subject='test', session='test', task='test', root='~/')
write_raw_bids(fif_raw, bids_path,overwrite=True, format='auto')

i get the same warnings as Dan

georgebuzzell commented 3 years ago

@SDOsmany you in the docker?

SDOsmany commented 3 years ago

yeah im using the docker file

georgebuzzell commented 3 years ago

@Jonhas @SDOsmany @DMRoberts those channels that throw the error/warning are not actual eeg channels, I think. I believe those are channels that indicate when particular events occur.

georgebuzzell commented 3 years ago

@Jonhas we don't want to drop those channels. I think we could just assign units of uV (micro-volts). Will need to check if that works for subsequent processing, but I would do that, at least for now.

Unless @DMRoberts sees an issue with that?

Basically, my understanding is that those channels are stim channels that code the presence of a stimulus (or other event) of interest as an impulse response. I am not certain what units they are in to begin with, though I could look into the documentation of the egi/mff file format to get some insight. Ultimately, though, I don't think the units will really matter for us per se, as it is just coding presence/absence of an event. So, I might suggest just assigning units of uV for now and then we readdress as we hit the next step in processing.

georgebuzzell commented 3 years ago

@Jonhas Here is the doc on the egi binary (mff/egi) file: https://sccn.ucsd.edu/eeglab/testfiles/EGI/NEWTESTING/rawformat.pdf

Sounds like chans can either be in uV or A/D:

"As explained earlier (see page B-5), users have the option of saving their data to simple binary format as either A/D units or microvolts, assuming that the source file data are not already in the form of microvolts. A/D units are raw amplifier values which can be internally converted to microvolt values by Net Station. Users can perform a similar conversion by calculating a conversion factor using values that will be found in the header of the file.

The header of a simple binary file has two fields that need to be consulted to determine whether the EEG data stored in the file are in microvolts or A/D units. These fields are the ‘bits’ field at offset 26 and the ‘range’ field at offset 28. If the values in the ‘bits’ and ‘range’ fields are both 0, then the file’s data is already in the form of microvolts. Users who save to simple binary format should note that this is the only method to determine if a file’s data are in A/D units or microvolts.

Converting A/D Units to Microvolts

If the values read from the ‘bits’ and ‘range’ fields of the header are not equal to zero, this signifies that the EEG data stored in the file have not been converted to microvolts, but are stored as A/D units. Converting the A/D units to microvolts can be done programmatically by applying the following formula: microvolt value = (range / 2^bits) x A/D value, where ‘bits’ and ‘range’ are read from the header of the file."
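The conversion rule in that excerpt is short enough to write down directly. This is a minimal sketch of the quoted formula only; the function name `ad_to_microvolts` and the example header values are made up for illustration and are not read from our file.

```python
def ad_to_microvolts(ad_value, bits, value_range):
    """Convert a raw A/D value to microvolts using the 'bits' and 'range' header fields.

    Per the quoted format description, bits == 0 and range == 0 means the
    data are already stored in microvolts, so the value is returned unchanged.
    """
    if bits == 0 and value_range == 0:
        return float(ad_value)
    return (value_range / 2 ** bits) * ad_value


# Hypothetical header: 16-bit converter with a range field of 2048.
print(ad_to_microvolts(100, bits=16, value_range=2048))  # 3.125
# Header fields both zero: value is already in microvolts.
print(ad_to_microvolts(7, bits=0, value_range=0))        # 7.0
```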

georgebuzzell commented 3 years ago

@Jonhas I realize that my last comment might lead down a rabbit hole... I don't think we need to worry much about the units for now. I think just assigning anything for now is fine. The test will be when we try to read/interpret those stim channel files to organize the data into events of interest.

If you are able to go ahead and just do that for now, then do a pull request, I think that puts us at a good spot for a meeting to discuss next steps. @DMRoberts and I were talking, and it is probably crucial at this point to have a meeting where I discuss with you and the others (@DavyNeat @SDOsmany @yanbin-niu @Pranjali051 @stevenwtolbert @CamachoBry) more about what exactly stim channels are, what we do with those data, etc. Also, we need to discuss some of the metadata fields that should be specified when writing to BIDS. All of this relies on EEG content knowledge, so, assuming you all are interested in learning about those things, I think it would be good to discuss all of this on a call. We can record the call, and perhaps I can also create some simple documentation as a reference as well for moving forward.

DMRoberts commented 3 years ago

A couple updates:

1) I tried running the MFF to FIF to BIDS export using the Docker container. For the file truncated to 400 seconds in length, I get the same warnings about units on export as before, and the export is successful. I'm not sure I can export the whole file from within the Docker on my local computer, because when running within Docker I seem to run out of RAM before the export completes. This is after allocating 14 of the 16 GB of RAM on my computer to the Docker container.

2) I may have found a solution for the units warning / error. This page in the MNE documentation https://mne.tools/stable/auto_tutorials/intro/plot_20_events_from_raw.html describes extracting events from 'stim' channels, which the EGI / MFF format seems to use. Basically each potential numeric event code is its own channel, which is pulsed high when active. The channel called 'STI 014' then contains the aggregate of all these individual stim channels - the name apparently has some meaning within the EGI nomenclature. According to that MNE doc, there are functions that can be used to extract an events array from the aggregate stimulation channel, and then apply those events as annotations to the dataset. After doing that I then removed the stimulation channels, assuming that they aren't needed anymore. At that point I can export to BIDS without the units warning, because only the EEG channels remain within the channels array. I'm assuming that all the event / stimulation info we need is now within the annotations, though I'm not 100% sure that is accurate. I think we may need to do this though, since I don't believe the BrainVision format has a concept of stimulation events as channels.
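To make the "pulsed high when active" idea concrete, here is a minimal pure-Python sketch of how event onsets can be read off an aggregate stim channel by looking for samples where the value changes to a non-zero code. It only illustrates the idea and is not how `mne.find_events` is implemented; the function name `stim_onsets` and the sample values are made up.

```python
def stim_onsets(stim):
    """Return (sample_index, code) pairs where the channel changes to a non-zero value."""
    events = []
    previous = 0  # treat the channel as silent before the first sample
    for index, value in enumerate(stim):
        if value != 0 and value != previous:
            events.append((index, value))
        previous = value
    return events


# Made-up aggregate channel: code 3 at sample 2, code 5 at sample 6, code 3 at sample 8.
stim = [0, 0, 3, 3, 0, 0, 5, 0, 3, 3, 3, 0]
print(stim_onsets(stim))  # [(2, 3), (6, 5), (8, 3)]
```

Dividing each sample index by the sampling frequency gives the onset in seconds, which is the form annotations use.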

In this example I skip the loading from MFF and exporting to FIF, assuming it was already done as in a previous example snippet.

import mne
import mne_bids

# load the data previously converted from MFF to FIF
data = mne.io.read_raw('test_preload_eeg_raw.fif')
# use the aggregate stimulation channel 'STI 014' to create an events structure
events = mne.find_events(data, stim_channel='STI 014', shortest_event=1)
# convert events to annotations
event_annotations = mne.annotations_from_events(events=events, sfreq=data.info['sfreq'], orig_time=data.info['meas_date'])
# add annotations to existing annotations (the bad acquisition skips) and set
data.set_annotations(data.annotations + event_annotations)
# keep only the EEG channels, removing the stimulation channels
data.pick_types(eeg=True)

bids_path = mne_bids.BIDSPath(subject='test', session='two', task='mytask', root='~/')
mne_bids.write_raw_bids(data, bids_path, format='auto', overwrite=True)

georgebuzzell commented 3 years ago

@DMRoberts this is fantastic! Thank you for looking into how to convert the event information from the imported mff file into the standard format expected by mne/bv!!

I agree that the edits you made look like the correct way forward for dealing with the multiple stim channels. This may also reduce the memory requirements somewhat. I am curious if you tried exporting the full file after extracting the event info and deleting the extra stim channels? If so, what were the memory requirements for this file (full length, minus the extra stim channels)?
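A rough back-of-the-envelope estimate may help frame that question, assuming the data are held in memory as 64-bit floats (8 bytes per sample per channel), which I believe is what MNE does when preloading. The channel counts, sampling rate, and duration below are placeholders, not values read from the file, and `raw_size_gb` is a hypothetical helper.

```python
def raw_size_gb(n_channels, sfreq, duration_s, bytes_per_sample=8):
    """Approximate in-memory size of a channels x samples array, in GB (10**9 bytes)."""
    n_samples = int(sfreq * duration_s)
    return n_channels * n_samples * bytes_per_sample / 1e9


# Placeholder numbers: one hour at 500 Hz, with and without extra stim channels.
with_stim = raw_size_gb(n_channels=175, sfreq=500, duration_s=3600)
eeg_only = raw_size_gb(n_channels=129, sfreq=500, duration_s=3600)
print(f"{with_stim:.2f} GB with stim channels, {eeg_only:.2f} GB EEG only")
```

The saving scales with the fraction of channels dropped, so removing the stim channels helps but is unlikely on its own to change the order of magnitude.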

yanbin-niu commented 3 years ago

I tried today in my local environment (rather than the container, since I feel it is missing many packages).

  1. I did not get any issue while saving .fif file and export into BIDS (using fif as an intermediary), though I also got same unit warning - "UserWarning: Encountered unsupported non-voltage units: n/a. Note that the BrainVision format specification supports only µV."

However, I do not think it should be an issue, because: First, when I check info['chs'], all the EEG channels have no problem. It's just that the STIM channels' units are FIFF_UNIT_NONE.

image image

Second, go to the pybv package -> io.py, which actually reports the warning. We can see the warning is just caused by the STIM channels' units being NONE (the second if), which makes sense. We did not see any warning from the first if.

image

which means our eeg channels are good. I added some log into io.py and it verified the point. So, I think this warning may not be a concern. But, I could be totally wrong here.

  2. I do have a problem with writing BIDS directly from the .mff file, which is the same problem Jonhas encountered early on. It seems like the problem is with the extension after reading the file. I do not know why the filename has signal1.bin added after NDARAB793GL3.mff. I tried to rewrite raw.filenames[0], but it just cannot be changed, see below.
image

yanbin-niu commented 3 years ago

Regarding my second point, the "Unrecognized file format" error:

I checked mne_bids/write.py:

image

Then I found the definition for ALLOWED_INPUT_EXTENSIONS:

image

in which

image

I did not see .bin or even .mff listed there. So, I doubt .mff can be directly exported into BIDS.
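The check being described is essentially an extension allow-list. Here is a minimal sketch of that idea with a made-up allow-list: the names `ALLOWED_EXTENSIONS` and `is_supported` are hypothetical, and the set is an illustrative subset, not copied from mne_bids.

```python
from pathlib import Path

# Illustrative subset only; NOT the actual list used by mne_bids.
ALLOWED_EXTENSIONS = {".fif", ".vhdr", ".edf", ".bdf", ".set"}


def is_supported(filename):
    """Return True if the file's final extension is in the allow-list."""
    return Path(filename).suffix.lower() in ALLOWED_EXTENSIONS


# The raw object reports the signal file inside the .mff folder, so the
# extension that gets checked is .bin rather than .mff.
print(is_supported("NDARAB793GL3.mff/signal1.bin"))  # False
print(is_supported("test_one_eeg_raw.fif"))          # True
```

Under a check like this, neither `.bin` nor `.mff` would pass, which matches the behaviour seen when writing directly from the .mff file.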

Jonhas commented 3 years ago

I agree completely. MNE throws the exception when reading the MFF file once you try to write the Bids Path. Manipulating the extension of signal1 wouldn't work either, and since filenames[0] is readonly, you cannot manipulate the extensions to an appropriate input extension at runtime. So far the only solution has been converting it to an FIF file and writing the BIDSPath from the converted file.

yanbin-niu commented 3 years ago

@Jonhas I agree, especially since even if we somehow find a way to change the filenames[0] extension into .mff, it's still not gonna work, because .mff is not included in the ALLOWED_INPUT_EXTENSIONS. Also, the io.raw.save() function can only save as a .fif file (though I have no idea about the differences between .fif, .fif.gz, .sss.fif etc., may need some background knowledge here).

image

It seems .fif is the way to go. But, if we want to double check, we can submit an issue in the MNE-python repo....(I tried to find past issues, but did not find anything related to our question)

georgebuzzell commented 3 years ago

@yanbin-niu and @Jonhas I think that, at least for now, the workaround of importing the mff, writing to fif, reloading the fif, and then converting to bids (using .vhdr as the eeg file format) is a workable solution. I think it is fine to use this workaround, at least for now, as our method of dealing with egi (mff) files. However, if either of you, or anyone else, sees a critical issue with this workaround please do speak up. Otherwise, I would say we move on using this workaround and we can always come back and create a more straightforward/elegant solution in the future if need be. Thoughts?

Jonhas commented 3 years ago

@georgebuzzell I agree. Currently, this is the best solution to the problem and we can always come back if a more elegant solution is necessary.