gjmvanboxtel / gsignal

R implementation of the Octave signal package
GNU General Public License v3.0
22 stars 6 forks source link

parallelizing `filter` (and other functions?) #8

Closed bnicenboim closed 2 years ago

bnicenboim commented 2 years ago

Hi I think that (at least) the filter functions might benefit from parallelizing the processes with foreach. I've been peaking at the code, and the .Call functions which do the heavy lifting are inside for-loops. I'm curious if you have already tried this?

gjmvanboxtel commented 2 years ago

Hi,

Yes, I am sure you are right. The gsignal package was not developed with speed in mind, but I realize that this might be something to take into account for future versions. I have done some tests with parallelized code, but they are many design choices to be made, and I do not think myself experienced enough to make those choices at this time.

Having said that, the filter function (and functions that use it), are not terribly slow either, and probably fast enough for many applications. Due to overhead issues, a benefit of parallelized code can only be expected for very large matrices, at least that is what my (limited) simulations have shown. You could also add a parallelization wrapper for processing single columns. In the following code, I simulated 10 minutes of EEG data sampled at 256 Hz on 256 channels, and there you see an advantage of using parApply (I am on Linux with an Intel I7 with 8 cores):

library(gsignal)
library(parallel)
library(doParallel)
library(foreach)
library(microbenchmark)

# create a matrix containing 256 channels each containing (the same) 10 minutes of EEG sampled at 256 Hz
# use replicated signals$eeg
nchan <- 256
fs <- 256
nsecsin <- 10
nsecsout <- 900
chan <- rep(signals$eeg, nsecsout / nsecsin)
d <- matrix(0, length(chan), nchan)
d <- apply(d, 2, function(x) x <- chan)

# create a lowpass filter at 10 Hz
but <- butter(5, 10 / (fs / 2), "low")
freqz(but, fs = fs)

# benchmark ordinary serial execution
serial <- microbenchmark(f <- filter(but, d),
                         times = 10L)

# benchmark serial execution using apply
using_apply <- microbenchmark(f <- apply(d, 2, function(x) filter(but, x)),
                              times = 10L)

# benchmark parallel execution using foreach
cl <- makeCluster(detectCores() - 1)
registerDoParallel(cl) # register the cluster
using_foreach <- microbenchmark({
  f <- foreach(i = 1:ncol(d), .combine = cbind) %dopar% gsignal::filter(but, d[, i])
}, times = 10L)
stopCluster(cl)

# benchmark parallel execution using parApply
cl <- makeCluster(detectCores() - 1)
clusterExport(cl, varlist = c("but"))
using_parApply <- microbenchmark({
  f <- parApply(cl, d, 2, function(x) gsignal::filter(but, x))
}, times = 10L)
stopCluster(cl)

> serial
Unit: seconds
                expr      min       lq     mean  median       uq      max neval
 f <- filter(but, d) 6.897315 7.183626 7.472786 7.28474 7.807517 8.526827    10
> using_apply
Unit: seconds
                                         expr      min       lq    mean   median       uq      max neval
 f <- apply(d, 2, function(x) filter(but, x)) 7.933266 8.038973 8.17494 8.130569 8.357475 8.433421    10
> using_foreach
Unit: seconds
                                                                                                expr    min       lq     mean   median
 {     f <- foreach(i = 1:ncol(d), .combine = cbind) %dopar% gsignal::filter(but,          d[, i]) } 11.462 11.83986 11.91627 11.96393
       uq      max neval
 12.08514 12.13378    10
> using_parApply
Unit: seconds
                                                                          expr      min       lq     mean   median       uq      max neval
 {     f <- parApply(cl, d, 2, function(x) gsignal::filter(but,          x)) } 4.263724 4.364573 4.448084 4.447184 4.541041 4.675006    10

Of course, further speed improvements can be obtained if filter would handle the loops internally instead of with a wrapper, perhaps in the C++ code, but that is something for the future, and I could also use some help from someone more experienced in parallel processing than I currently am.

Hope this helps, thank you. Best, Geert

bnicenboim commented 2 years ago

I see! I didn't know that parApply was so much faster than foreach. I have recordings that are one hour long, so I guess it will play a role there.

I'm not an expert in parallelizing code either, maybe these people https://www.esciencecenter.nl/calls-for-proposals/open-escience-call-2022/ can help?

But feel free to close, you're right that I can parallelize the code in my end.