alexeckert / parallelDist

R Package: Parallel Distance Matrix Computation using Multiple Threads
GNU General Public License v2.0

multi-threaded dtw parDist crashes on curves with differing lengths #7

Closed fdeon closed 5 years ago

fdeon commented 5 years ago

Hello Alex, thanks for the nice package.

I am trying to compute a matrix of DTW distances between 1-D sequences of unequal lengths. I'm passing the signals as a list of (1 by len_i) matrices.

However, running parDist with threads > 1 crashes the R session most of the time. It does not crash if all the len_i are equal.

Funnily enough, I find that: (1) it invariably crashes in a fresh R session; (2) it sometimes does run (on signals with unequal lengths) in a session in which I have previously run the function on equal-length signals of about the same size. (Some memory allocation issue, maybe?)

Also, I can't find in the documentation how the function deals with slope-limited step patterns (e.g. symmetricP1) when some pairs of signals have a length ratio above or below the maximum local slope.

I'm adding a little code snippet that reproduces the issue on my system.

library(dtw)
library(parallelDist)

# Generate N randomly warped sine waves
N <- 50
lmin <- 200  # min signal length
lmax <- 500  # max signal length

warp.coefs <- runif(N, -1, 1)
if (lmax != lmin) {
    # unequal lengths, drawn at random from lmin..lmax
    sig.lens <- sample(lmin:lmax, N, replace = TRUE)
} else {
    # equal lengths (the case that does not crash)
    sig.lens <- rep(lmin, N)
}
signals <- vector("list", N)
for (i in seq_len(N)) {
    l <- sig.lens[i]
    a <- warp.coefs[i]
    x <- seq(0, 1, length.out = l)
    x.wpd <- x + a * x * (1 - x)      # warped time axis, still in [0, 1]
    y <- sin(2 * pi * 3 * x.wpd)
    signals[[i]] <- matrix(y, nrow = 1, ncol = l)  # one 1-by-l matrix per signal
}

# Pairwise DTW distances with parDist (crashes when threads > 1)
dist.obj <- parDist(signals, method = "dtw", step.pattern = "symmetricP1", threads = 2)

Thanks for any help you might provide. Regards,

Fabio
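
On the step.pattern question: symmetricP1 only allows steps of (1,2), (1,1) and (2,1), so the local slope stays between 1/2 and 2. A warping path that joins both end points of sequences of lengths n and m can therefore exist only if max(n, m) - 1 <= 2 * (min(n, m) - 1). The base R sketch below counts how many pairs drawn with lmin = 200 and lmax = 500 violate that bound. has.path is a hypothetical helper, not part of dtw or parallelDist, and the fixed seed is an assumption made only for repeatability. The sketch says nothing about what parDist returns for such pairs, which this thread does not document.

```r
# Hypothetical helper (not part of dtw or parallelDist):
# TRUE if an end-to-end path can exist under a slope limit of max.slope.
has.path <- function(n, m, max.slope = 2) {
    long  <- pmax(n, m)
    short <- pmin(n, m)
    (long - 1) <= max.slope * (short - 1)
}

lmin <- 200
lmax <- 500
has.path(lmin, lmax)  # FALSE: 499 > 2 * 199

set.seed(1)  # assumption: fixed seed so the sketch is repeatable
N <- 50
sig.lens <- sample(lmin:lmax, N, replace = TRUE)

# Feasibility of every pair of signal lengths (N-by-N logical matrix)
feasible <- outer(sig.lens, sig.lens, has.path)

# Count each unordered pair once
n.infeasible <- sum(!feasible[upper.tri(feasible)])
cat("pairs without an admissible path:", n.infeasible,
    "of", choose(N, 2), "\n")
```

With these settings the shortest and longest possible signals differ by a factor of 2.5, so some pairs are expected to fall outside the limit.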

alexeckert commented 5 years ago

Thanks for your reproducible code sample, will have a look at it at the weekend.

alexeckert commented 5 years ago

Fixed a bug that could produce wrong DTW distances for matrices of different lengths when using multiple threads: https://github.com/alexeckert/parallelDist/commit/f2676e065cc32fb852020dfd8cbfa2439f238566

The new version is not yet available on CRAN. It can be installed with install_github:

library(devtools)
install_github("alexeckert/parallelDist")

fdeon commented 5 years ago

It's working correctly now. Thanks for your very quick reply.
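
One way to check the fix, sketched here under the assumption that the patched parallelDist is installed, is to compute the same DTW matrix with one thread and with two threads and compare the results. The signals are a simplified version of the ones in the report (unwarped sine waves of unequal lengths), the seed is arbitrary, and the step pattern is left at the package default so that no pair is excluded by a slope limit.

```r
library(parallelDist)

set.seed(42)  # assumption: arbitrary seed for repeatability
N <- 20
sig.lens <- sample(200:500, N, replace = TRUE)

# One 1-by-l matrix per signal, as in the original report
signals <- lapply(sig.lens, function(l) {
    matrix(sin(2 * pi * 3 * seq(0, 1, length.out = l)), nrow = 1)
})

# Same computation, single-threaded and multi-threaded
d1 <- parDist(signals, method = "dtw", threads = 1)
d2 <- parDist(signals, method = "dtw", threads = 2)

# TRUE if both runs agree up to numerical tolerance
same <- isTRUE(all.equal(as.matrix(d1), as.matrix(d2)))
print(same)
```

A disagreement between the two matrices, or a crash in the second call, would suggest that the installed version still predates the fix.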