bnicenboim / eeguana

A package for manipulating EEG data in R.
https://bnicenboim.github.io/eeguana/

Transformations of the channel_dbl() class are very slow. #140

Closed jaromilfrossard closed 3 years ago

jaromilfrossard commented 3 years ago

Hello,

It seems that some "simple" transformations are very slow. My first investigation suggests that it comes from the channel_dbl(.) class.

Below are some simple examples where I center the signals of each trial. If I keep the channel_dbl() class, it takes around 60 s; if I extract the signal_tbl() and convert the channel_dbl() columns with as.numeric(), it takes only 1-2 s.

Maybe I am missing something important, but my first hypothesis is that there is a loss of efficiency with the channel_dbl() class.

Thank you for creating eeguana!

Jaromil

For this example, I use the data from: https://osf.io/ut7xq/

devtools::install_github("bnicenboim/eeguana")
library(tidyverse)
library(eeguana)

eeg <- readRDS("data/preproc_ica/0.RDS")

### keeping the eeg object
time <- proc.time()
eeg%>%
  group_by(.id)%>%
  mutate_at(channel_names(.),function(x){x-mean(x)})
time <- rbind(time,proc.time())

### extracting only the signals
eeg%>%
  signal_tbl()%>%
  group_by(.id)%>%
  mutate_at(vars(Fp1:VEOG),function(x){x-mean(x)})
time <- rbind(time,proc.time())

### extracting the signals and transformation to as.numeric() is much faster
eeg%>%
  signal_tbl()%>%
  mutate_at(channel_names(.),as.numeric)%>%
  group_by(.id)%>%
  mutate_at(vars(Fp1:VEOG),function(x){x-mean(x)})
time <- rbind(time,proc.time())

### the timing of the 3 methods
apply(time,2,diff)

bnicenboim commented 3 years ago

Oh, wow, good find. There must be some validation running too many times; I'll check what's going on.
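
As a toy illustration of how validation that runs on every operation can dominate the runtime, here is a minimal base-R sketch. The `toy_channel` class, its constructor, and its `[` method are hypothetical and are not eeguana's actual `channel_dbl` implementation; the sketch only shows the general mechanism of per-call overhead on a classed vector.

```r
# Hypothetical toy class for illustration only; NOT eeguana's channel_dbl.
new_toy_channel <- function(x, label = "Fp1") {
  # Validation that runs every time an object is (re)built.
  stopifnot(is.numeric(x), all(is.finite(x)),
            is.character(label), length(label) == 1)
  structure(x, label = label, class = "toy_channel")
}

# Subsetting method that re-validates and re-wraps on every call.
`[.toy_channel` <- function(x, i) {
  new_toy_channel(unclass(x)[i], attr(x, "label"))
}

# Center a signal within each trial with an explicit loop over trials.
center_by_trial <- function(x, trial) {
  for (i in split(seq_along(x), trial)) {
    x[i] <- x[i] - mean(x[i])
  }
  x
}

set.seed(1)
trial <- rep(seq_len(2000), each = 50)  # 2000 trials of 50 samples each
plain <- rnorm(length(trial))           # plain numeric signal
classed <- new_toy_channel(plain)       # same data with the toy class

# Compare timings; the classed version pays for validation on each subset.
t_plain <- system.time(res_plain <- center_by_trial(plain, trial))
t_classed <- system.time(res_classed <- center_by_trial(classed, trial))
print(rbind(plain = t_plain, classed = t_classed)[, "elapsed"])
```

Both versions compute the same centered values; only the number of method dispatches and validation calls differs, which is the kind of overhead suspected here.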

bnicenboim commented 3 years ago

I'm still making some changes to speed things up further, but in the meantime you can use the experimental branch:

devtools::install_github("bnicenboim/eeguana", ref = "experimental")

It now takes 3 seconds; there is a little overhead for the bookkeeping of the entire eeg_lst object. I might be able to speed things up even more with some tricks from the tidytable package.

Let me know how it goes. (By the way, there are some minor changes in the names of some function arguments. Check the help when in doubt, since the website doesn't match the experimental version.)

jaromilfrossard commented 3 years ago

Thanks!

I just installed the experimental branch and re-ran the script above. I found some strange results:

The modification of the eeg_lst is much faster (only a few seconds). The modification of the signal_tbl is still slow (more than 60s) when using the channel_dbl column. The modification of the signal_tbl is still fast when converting to as.numeric.

Do you have the same results?

I think there is still a loss somewhere, as the modification of the signal_tbl should not be slower than the modification of the eeg_lst.

Thanks again for the quick modification!

Jaromil

bnicenboim commented 3 years ago

> The modification of the signal_tbl is still slow (more than 60s) when using the channel_dbl column.

Here you're working outside of eeguana: you extracted a data.table, and it's dplyr that is slow and can't deal well with attributes. eeguana's mutate actually relies on data.table and not on dplyr. You should always work with the entire object, and not with the individual tables, if you want to take advantage of eeguana's speed and memory management.
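
For readers who do end up with a bare table, here is a minimal sketch of per-trial centering done directly in data.table rather than dplyr. It assumes the data.table package is installed; the toy table and its column names (`.id`, `Fp1`, `Fz`) are made up for illustration, and this is not eeguana's internal code.

```r
library(data.table)

set.seed(1)
# Toy stand-in for a signal table: 100 trials of 500 samples, two channels.
signal <- data.table(
  .id = rep(1:100, each = 500),
  Fp1 = rnorm(50000),
  Fz  = rnorm(50000)
)
chans <- c("Fp1", "Fz")

# Center every channel within each trial, updating the columns by reference.
signal[, (chans) := lapply(.SD, function(x) x - mean(x)),
       by = ".id", .SDcols = chans]

# Sanity check: the per-trial means should now be numerically zero.
group_means <- signal[, lapply(.SD, mean), by = ".id", .SDcols = chans]
print(max(abs(as.matrix(group_means[, chans, with = FALSE]))))
```

The `:=` operator modifies the table in place, which avoids copying the columns for each group.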

jaromilfrossard commented 3 years ago

OK. I understand (almost!). Thanks again!

bnicenboim commented 3 years ago

OK, if you are unsure about the fastest way of doing something, just ask. (And do report things that are too slow.)