tl;dr: if you use many threads, running `FFTW.set_num_threads(1)` can be a good idea. Otherwise FFTW will likely slow down computation and prevent outer parallelism. I suggest adding this to the README.
Full explanation
I was running a lot of KDE computations in a loop, and it turned out that running the code in parallel slowed the process down. This happens even if I simply set `JULIA_NUM_THREADS=20` (on a 56-core server) without using `@threads`:
```julia
using KernelDensity
using Base.Threads

interp_kde(coords::Array{Float64, 2}, bandwidth::Float64) =
    InterpKDE(kde((coords[1, :], coords[2, :]), bandwidth=(bandwidth, bandwidth)))

td = rand(2, 100000);

@time for i in 1:500
    interp_kde(td, 1.0)
end
```
It spawns multiple threads at about 30% load each and takes 15.9 seconds. The same code with `JULIA_NUM_THREADS=1` takes 7.5 seconds, running entirely on a single thread. The timing doesn't really change if I use `@threads`:
```julia
@time @threads for i in 1:500
    interp_kde(td, 1.0)
end
```
After some digging, the problem turned out to be in the FFTW package, which is called somewhere during the interpolation and by default uses `nthreads() * 4` threads inside its C code. To disable this, you need to run `FFTW.set_num_threads(1)`. After that, running with `JULIA_NUM_THREADS=20` but without `@threads` takes 7.5 seconds, as it should, and with `@threads` it takes 0.5 seconds.
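Put together, the working pattern looks roughly like this (a minimal sketch using FFTW directly rather than through KernelDensity; `plan_rfft` here just stands in for whatever transform the interpolation performs internally):

```julia
using FFTW
using Base.Threads

# Keep each FFTW call single-threaded so it doesn't oversubscribe the
# cores that the outer @threads loop is already using.
FFTW.set_num_threads(1)

x = rand(100_000)
p = plan_rfft(x)  # plans created after set_num_threads(1) run single-threaded

results = Vector{Vector{ComplexF64}}(undef, 500)
@threads for i in 1:500
    # Executing an existing plan from multiple threads is safe in FFTW;
    # only the planning step itself must be serialized.
    results[i] = p * x
end
```

With this setup the 500 transforms are distributed across the Julia threads instead of each transform fighting for all cores at once.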
I tried various run configurations, and in the end it looks like FFTW's internal parallelism only improves on single-threaded FFTW for large arrays (>500000 elements) and large numbers of iterations (>100). And it is always much worse than parallelizing the outer loop.
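That comparison can be reproduced roughly as follows (a sketch, not a rigorous benchmark; timings are machine-dependent and the size/iteration values are just the thresholds observed above):

```julia
using FFTW
using Base.Threads

n, iters = 500_000, 100  # roughly the thresholds observed above
x = rand(n)

# Configuration A: FFTW-internal parallelism, serial outer loop.
FFTW.set_num_threads(nthreads())
p_multi = plan_rfft(x)
t_fftw = @elapsed for _ in 1:iters
    p_multi * x
end

# Configuration B: single-threaded FFTW, parallel outer loop.
FFTW.set_num_threads(1)
p_single = plan_rfft(x)
t_outer = @elapsed @threads for _ in 1:iters
    p_single * x
end

println("FFTW-internal: $(t_fftw)s, outer @threads: $(t_outer)s")
```

On the server above, configuration B wins decisively; with small arrays or few iterations, configuration A can even lose to fully serial execution because of FFTW's thread start-up overhead.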