NomisCiri / eeg_manypipes_arc

This project contains all code needed to reproduce the analyses of the EEG ManyPipelines project
MIT License

Preprocessing: #6

Closed NomisCiri closed 2 years ago

NomisCiri commented 2 years ago

Since we talked about it last time, I read the autoreject documentation again. Maybe the best workflow is:

  1. hp filter
  2. run autoreject (but not transform)
  3. then run ICA on "good channels" only:

```python
ica = mne.preprocessing.ICA(random_state=99)
ica.fit(epochs[~reject_log.bad_epochs])
```

  4. then detect classic components, and apply ICA
  5. run another autoreject and transform this time

sounds reasonable?

sappelhoff commented 2 years ago

I would do:

  1. Run pyprep to find bad channels (basically the first part of the PREP --> https://www.frontiersin.org/articles/10.3389/fninf.2015.00016/full) ... the bad channel detection algorithms are very good in my experience, because 5 different "measures" are applied to find whether a channel is bad or not, including very basic algorithms (bad-by-flat) and advanced algorithms like "RANSAC"
  2. apply MNE auto methods to mark bad segments:
    1. annotate_amplitude ... for this we need to choose peak-to-peak thresholds at which a channel is considered flat, or just ultra-noisy ... I would pick 2µV for "flat" and not use the "ultra-noisy" option (the peak param)
    2. annotate_break
    3. annotate_muscle_zscore
    4. annotate_nan
  3. hp filter (1Hz)
  4. run ICA on clean EEG channels only (raw data!), and ignoring bad temporal segments
  5. correlate ICA comps with VEOG and HEOG to find bad components --> reject them
  6. apply cleaned ICA solution to the non-filtered raw data
  7. filter how we want (for ERP: 0.1 Hz highpass, 40 Hz lowpass; for time-freq: I don't know) --> HP: 0.1 Hz, LP: 100 Hz, notch: 50 and 100 Hz
  8. make epochs
  9. downsample to 256 Hz? (it's now 512 Hz) ... optional ... let's not downsample: if we lowpass at 100 Hz, the most we should downsample to is 300 Hz, and that's not a nice integer fraction of 512 ... it's all inconvenient, so let's leave it at 512
  10. apply autoreject (with transform)
  11. apply average reference to channels

... for the timefreq analyses we mustn't lowpass too much, but I don't really know what cutoff we should set:

  • delta (1–3 Hz)
  • theta (4–7 Hz)
  • alpha (8–12 Hz)
  • beta (13–30 Hz)
  • gamma (30–100 Hz)

for example, do we even want to look at gamma? ...

sappelhoff commented 2 years ago

we also have another issue: for the timefreq analyses we likely need larger windows than for the ERP analyses --> thus either we make two "kinds" of epochs (short ones, long ones) and apply autoreject to each kind separately. But this will lead to different channel interpolations and epoch rejections for the two kinds

Or we make one kind of epochs (long ones), which we "crop" for the ERP analyses ... but this might lead to slightly more interpolation and epoch rejection for the ERP analyses than actually needed (for example because we just need data from -0.2 to 0.8s ... but due to making "long" epochs we do them from -1 to 1s ... and at 0.9s there is an artifact, so that epoch gets rejected although with making "short" epochs it wouldn't have been)

Both methods have drawbacks, I favor method 2 currently (only making "long" epochs, and cropping them for ERP analyses)

WDYT?

NomisCiri commented 2 years ago

> I would do:
>
>   1. Run pyprep to find bad channels (basically the first part of the PREP --> https://www.frontiersin.org/articles/10.3389/fninf.2015.00016/full) ... the bad channel detection algorithms are very good in my experience, because 5 different "measures" are applied to find whether a channel is bad or not, including very basic algorithms (bad-by-flat) and advanced algorithms like "RANSAC"
>   2. apply MNE auto methods to mark bad segments:
>     1. annotate_amplitude ... for this we need to choose peak-to-peak thresholds at which a channel is considered flat, or just ultra-noisy ... I would pick 2µV for "flat" and not use the "ultra-noisy" option (the peak param)
>     2. annotate_break
>     3. annotate_muscle_zscore
>     4. annotate_nan
>   3. hp filter (1Hz)
>   4. run ICA on clean EEG channels only (raw data!), and ignoring bad temporal segments
>   5. correlate ICA comps with VEOG and HEOG to find bad components --> reject them
>   6. apply cleaned ICA solution to the non-filtered raw data
>   7. filter how we want (for ERP: 0.1 Hz highpass, 40 Hz lowpass; for time-freq: I don't know)
>   8. make epochs
>   9. downsample to 256 Hz? (it's now 512 Hz) ... optional
>   10. apply autoreject (with transform)
>   11. apply average reference to channels
>
> ... for the timefreq analyses we mustn't lowpass too much, but I don't really know what cutoff we should set:
>
>   • delta (1–3 Hz)
>   • theta (4–7 Hz)
>   • alpha (8–12 Hz)
>   • beta (13–30 Hz)
>   • gamma (30–100 Hz)
>
> for example, do we even want to look at gamma? ...

Your approach still sounds great to me. I just wanted to reiterate the autoreject pipeline here, because I think I did not recall it properly during our meeting. I was thinking that doing it with autoreject only might save us some time. But as you have your pipeline anyway, we will just stick to that.

One thing that I picked up once is to HP filter before ICA at 0.1 Hz and afterwards filter to 1 Hz.

I would actually not have included delta (isn't that mostly relevant for sleep and such?) and would have left gamma in, but for no real reason. Edit: but including some gamma range will maybe require a 100 Hz LP filter, which is quite a lot for ERP... so maybe no gamma, let's see what Casper says.

NomisCiri commented 2 years ago

> we also have another issue: for the timefreq analyses we likely need larger windows than for the ERP analyses --> thus either we make two "kinds" of epochs (short ones, long ones) and apply autoreject to each kind separately. But this will lead to different channel interpolations and epoch rejections for the two kinds
>
> Or we make one kind of epochs (long ones), which we "crop" for the ERP analyses ... but this might lead to slightly more interpolation and epoch rejection for the ERP analyses than actually needed (for example because we just need data from -0.2 to 0.8s ... but due to making "long" epochs we do them from -1 to 1s ... and at 0.9s there is an artifact, so that epoch gets rejected although with making "short" epochs it wouldn't have been)
>
> Both methods have drawbacks, I favor method 2 currently (only making "long" epochs, and cropping them for ERP analyses)
>
> WDYT?

I think cropping afterwards should be fine. I have no a priori expectation on what's going to be lost by adding 100 ms to the time window, but I don't think it will be too much, especially since a fixation cross will be displayed there and nothing else that would cause people to blink or twitch.

Although I think -1 might be a bit too long if we epoch around feedback; we don't want movement artefacts in there, and response times are often <1 s, I feel. Maybe let's do it at -0.25 so that at least one cycle of 4 Hz is in there?

NomisCiri commented 2 years ago

Ah no, I just read in the hypothesis doc that "all timing-related questions are related to stimulus onset", and the stimulus was presented for 500 ms, so then maybe rather -1 to 0.5?

I just don't want too much motor stuff in the epochs. Or, if we only look at averages, we could for our analyses subtract left button-press responses from right button-press responses; otherwise we will for sure have something in the ERPs and in beta that's only related to pressing the button. And then take -1 to 1.

sappelhoff commented 2 years ago

yes, I think what you say is reasonable -- I just don't have a good idea how long epochs in general "have to be" for TFR analyses :thinking: we can double-check with Casper

sappelhoff commented 2 years ago

fyi: I started declaring channels IO1 and IO2 as "EOG" channels, because they are veeeery close to the eyes and pick up very high amplitudes from the smallest of eye movements. --> I suggest not including these two channels in any EEG analyses (much like VEOG and HEOG are not included) ... WDYT?

NomisCiri commented 2 years ago

No particular feeling about it, but I would've left them in and hoped the ICA deals with it.