craddm / eegUtils

An R package for processing and plotting of electroencephalography (EEG) data
https://craddm.github.io/eegUtils/

Running out of memory when combining large dataset #139

Closed mamuerie closed 1 year ago

mamuerie commented 1 year ago

Hi, thanks a lot for your effort in creating this package - I love being able to use R instead of MATLAB for my EEG analysis.

I have a dataset that was preprocessed in BrainVision Analyzer, and I exported the clean, segmented data as vhdr files per person (70 participants). Importing all the data files and epoching them with epoch_data (and purrr::map) works perfectly fine and results in a large list (24.5 GB). But when I combine all the participants into one dataset with eeg_combine, R throws a memory-related error: "cannot allocate vector of size xy". Do you have any idea how I could minimize the size of the data or smartly combine the functions to prevent the memory problem? I was wondering whether it is possible to store only the epoched data and not both the eeg_data and the eeg_epochs?

craddm commented 1 year ago

It depends a little on what you're intending to do. You probably don't actually want the single trial data for every participant, for every timepoint, for every electrode to be combined into one big dataset. There are not many use cases for that - like, maybe you want to run a linear mixed-effects model on every electrode and every timepoint, but you probably don't want to do that!

A couple of things you could do:

1) Is this for an ERP study? You likely just need ERPs for each condition for each participant, in which case you'd be better off using eeg_average() to get eeg_evoked() objects from the eeg_epochs() objects, and then combining them into a grand average. That just gives you a single ERP per condition (you can specify which conditions you want when you run eeg_average(), but by default it'll try to give you one ERP per unique combination of conditions). That should take up a whole lot less memory.

2) If you do need single trial data, consider trimming the data for each participant down to only what you really need, e.g. you may only need a subset of electrodes and the data for a specific time window.
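To illustrate why both suggestions save memory, here is a minimal base R sketch on simulated data. It does not use eegUtils itself: the array layout (epochs x timepoints x electrodes), the sizes, the condition labels, and the helper names `simulate_participant()` and `average_participant()` are all assumptions made up for this example. In eegUtils the corresponding steps would be eeg_average() per participant followed by eeg_combine().

```r
# Simulated single-trial data. All sizes and labels are hypothetical.
set.seed(1)
n_participants <- 5
n_epochs <- 40
n_times <- 100
n_elecs <- 8
conditions <- c("standard", "deviant")

# Hypothetical helper: one participant's epochs as an
# epochs x timepoints x electrodes array plus a condition label per epoch.
simulate_participant <- function(id) {
  list(
    id = id,
    condition = rep(conditions, length.out = n_epochs),
    signal = array(rnorm(n_epochs * n_times * n_elecs),
                   dim = c(n_epochs, n_times, n_elecs))
  )
}

# Hypothetical helper: average over epochs within each condition,
# giving one timepoints x electrodes ERP matrix per condition.
average_participant <- function(p) {
  erps <- lapply(conditions, function(cond) {
    idx <- p$condition == cond
    apply(p$signal[idx, , , drop = FALSE], c(2, 3), mean)
  })
  names(erps) <- conditions
  erps
}

participants <- lapply(seq_len(n_participants), simulate_participant)

# Option 1: average first, then combine the small per-participant ERPs.
erps <- lapply(participants, average_participant)
grand_average <- lapply(conditions, function(cond) {
  Reduce(`+`, lapply(erps, `[[`, cond)) / n_participants
})
names(grand_average) <- conditions

# Option 2: keep single trials, but only a time window and a few electrodes.
trimmed <- lapply(participants, function(p) {
  p$signal[, 21:60, 1:3, drop = FALSE]
})

# Compare memory footprints in bytes.
size_of <- function(x) as.numeric(object.size(x))
single_trial_bytes <- sum(vapply(participants, function(p) size_of(p$signal), numeric(1)))
erp_bytes <- sum(vapply(erps, size_of, numeric(1)))
trimmed_bytes <- sum(vapply(trimmed, size_of, numeric(1)))
```

In this sketch the averaged data shrink roughly in proportion to the number of epochs per condition, while trimming shrinks the single-trial data in proportion to the timepoints and electrodes dropped.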

NB In the develop branch, eeg_average() keeps track of the number of epochs for each condition when averaging, allowing you to create weighted averages rather than unweighted averages, which might also be something to consider (I suspect the field often overlooks the merits of weighted averages).
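The difference between the two kinds of average can be shown with a small base R example. The amplitudes and epoch counts below are made-up numbers, and this is only an illustration of the arithmetic, not how eegUtils implements it.

```r
# Hypothetical ERP amplitudes (in microvolts) for three participants at one
# electrode and timepoint, with the number of epochs behind each average.
erp_amplitude <- c(2.0, 4.0, 6.0)
n_epochs <- c(10, 30, 60)

# Unweighted: every participant counts equally.
unweighted <- mean(erp_amplitude)
# 4

# Weighted: participants with more epochs contribute more,
# i.e. sum(erp_amplitude * n_epochs) / sum(n_epochs).
weighted <- weighted.mean(erp_amplitude, w = n_epochs)
# 5
```

The weighted result equals the average over all 100 individual epochs, so participants whose ERPs rest on few (and therefore noisier) epochs have less influence on the grand average.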

mamuerie commented 1 year ago

Thanks a lot for this answer! Yes, this is an ERP study. I will try to use eeg_average before merging, or I will come back to R with more specific hypotheses about the electrode location or time window and create the more specific plots later :). The hint about the weighted average is great - thanks for that!