Open maximerischard opened 4 years ago
I think there should be a threshold that decides whether to use `simd + threads` or only `simd`. I doubt that threads will give good performance with a small number of observations.
Also, did you consider using tasks? The task scheduling system is not yet in Julia but we could consider if there are some performance benefits there.
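The threshold idea above could be sketched roughly as follows. This is only an illustration, not code from the PR: `fill_entries!`, `THREAD_THRESHOLD` and its value are hypothetical, and a sensible cutoff would have to come from benchmarks on the target machine.

```julia
using Base.Threads: @threads, nthreads

# Assumed placeholder, counted in matrix entries; not a measured value.
const THREAD_THRESHOLD = 10_000

# Fill K[i, j] = f(i, j), choosing the loop strategy from the number of entries.
function fill_entries!(f, K::AbstractMatrix; threshold::Integer = THREAD_THRESHOLD)
    nrow, ncol = size(K)
    if nthreads() > 1 && length(K) >= threshold
        # Large problem: split columns across threads, keep @simd on the inner loop.
        @threads for j in 1:ncol
            @inbounds @simd for i in 1:nrow
                K[i, j] = f(i, j)
            end
        end
    else
        # Small problem (or a single thread): plain loop with @simd only.
        for j in 1:ncol
            @inbounds @simd for i in 1:nrow
                K[i, j] = f(i, j)
            end
        end
    end
    return K
end

# Usage: a 4 × 4 squared-exponential covariance, small enough to stay serial.
x = [0.0, 0.5, 1.0, 1.5]
K = fill_entries!((i, j) -> exp(-0.5 * (x[i] - x[j])^2), zeros(4, 4))
```

Both branches write the same values, so the threshold only affects how the work is scheduled, never the result.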
Hi @Red-Portal, thank you for having a look into this. You're right to point out that there should be a threshold to determine whether to use threads if there is some overhead in doing so. I'll do some more benchmarks to check if it's necessary. The benchmarks above show that threads do make a big difference to performance.
My understanding of tasks (coroutines) is that they solve a different problem: parallelising IO-bound tasks. Here multithreading is used to simultaneously compute multiple elements of the covariance matrix (and gradients). There is no communication or coordination between threads, so the overhead is low.
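To make the "no communication or coordination" point concrete, here is a minimal sketch of a threaded covariance fill. It is not the implementation in `src/covariance/multithreaded.jl`: the names `sqexp` and `cov_threaded!`, the column-per-observation layout, and the unit-length-scale kernel are all assumptions made for illustration.

```julia
using Base.Threads: @threads

# Hypothetical squared-exponential kernel with unit length scale and variance.
function sqexp(x::AbstractVector, y::AbstractVector)
    s = 0.0
    @inbounds @simd for k in eachindex(x, y)
        s += (x[k] - y[k])^2
    end
    return exp(-0.5 * s)
end

# Observations are assumed to be stored as columns of X1 (d × n1) and X2 (d × n2).
function cov_threaded!(K::AbstractMatrix, X1::AbstractMatrix, X2::AbstractMatrix)
    n1, n2 = size(X1, 2), size(X2, 2)
    size(K) == (n1, n2) || throw(DimensionMismatch("K must be n1 × n2"))
    @threads for j in 1:n2
        xj = view(X2, :, j)
        for i in 1:n1
            # Each entry depends only on read-only inputs, and column j is
            # written by exactly one thread, so no locks are needed.
            K[i, j] = sqexp(view(X1, :, i), xj)
        end
    end
    return K
end

X = [0.0 1.0 2.0; 0.0 0.0 1.0]       # three 2-dimensional observations
K = cov_threaded!(zeros(3, 3), X, X)
```

Because every iteration touches a disjoint column of `K`, the only cost beyond the serial loop is starting and joining the threads.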
Hi @maximerischard, yes, as you said, tasks are used for IO-bound operations. However, they are also used to implement fine-grained parallelism. Julia's planned depth-first scheduling is targeted towards parallelism (not concurrency).
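For readers on Julia 1.3 or later, where `Threads.@spawn` is available, task-based fine-grained parallelism might look something like the sketch below. It is a hedged illustration rather than anything proposed in this PR: `fill_cols!` and the `grain` cutoff are hypothetical names, and the default of 64 columns is an arbitrary assumption.

```julia
using Base.Threads: @spawn

# Recursively split the column range. Each left half becomes a task that the
# scheduler may run on any available thread; `grain` is an assumed cutoff
# below which the work is done serially.
function fill_cols!(f, K::AbstractMatrix, cols::UnitRange{Int}; grain::Int = 64)
    grain >= 1 || throw(ArgumentError("grain must be at least 1"))
    if length(cols) <= grain
        for j in cols, i in 1:size(K, 1)
            K[i, j] = f(i, j)
        end
    else
        mid = (first(cols) + last(cols)) ÷ 2
        left = @spawn fill_cols!(f, K, first(cols):mid; grain = grain)
        fill_cols!(f, K, (mid + 1):last(cols); grain = grain)
        wait(left)  # join the spawned half before returning
    end
    return K
end

K = fill_cols!((i, j) -> i + 10j, zeros(Int, 4, 300), 1:300; grain = 16)
```

Compared with a single `@threads` loop, this style lets nested parallel calls compose, at the price of creating one task per chunk.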
The performance improvements are wonderful BTW.
I've been reading a bit more about tasks and multithreading in Julia. What I've done in this PR is use multithreading for the functions that iterate over the entries of an array. This is straightforwardly parallelisable, with no communication between threads, so the overhead is low. The overhead does seem to be non-zero, though in my benchmarks it is drowned out by noise. Roughly speaking, it seems to add 50 ms to benchmarks that take 1-2 seconds.
You had two suggestions @Red-Portal:
Changes Missing Coverage | Covered Lines | Changed/Added Lines | %
---|---|---|---
src/covariance/covariance.jl | 17 | 23 | 73.91%
src/covariance/multithreaded.jl | 0 | 72 | 0.0%
<!-- | Total: | 17 | 95 | 17.89% | -->

Files with Coverage Reduction | New Missed Lines | %
---|---|---
src/kernels/distance.jl | 1 | 87.5%
src/kernels/mat32_ard.jl | 1 | 90.0%
src/kernels/mat32_iso.jl | 1 | 90.0%
src/kernels/mat52_ard.jl | 1 | 90.0%
src/kernels/mat52_iso.jl | 1 | 90.0%
src/kernels/noise.jl | 1 | 92.86%
src/kernels/periodic.jl | 1 | 89.47%
src/kernels/rq_ard.jl | 1 | 94.12%
src/kernels/rq_iso.jl | 1 | 88.24%
src/likelihoods/bernoulli.jl | 1 | 44.44%
<!-- | Total: | 111 | -->

Totals | |
---|---|
Change from base Build 563: | -8.4% |
Covered Lines: | 1371 |
Relevant Lines: | 2022 |

Changes Missing Coverage | Covered Lines | Changed/Added Lines | %
---|---|---|---
src/covariance/covariance.jl | 17 | 23 | 73.91%
src/covariance/multithreaded.jl | 0 | 72 | 0.0%
<!-- | Total: | 17 | 95 | 17.89% | -->

Files with Coverage Reduction | New Missed Lines | %
---|---|---
src/kernels/poly.jl | 1 | 80.65%
<!-- | Total: | 1 | -->

Totals | |
---|---|
Change from base Build 617: | -3.1% |
Covered Lines: | 1460 |
Relevant Lines: | 1987 |

Single-threaded benchmark (`JULIA_NUM_THREADS=1`):

Multi-threaded benchmark (`JULIA_NUM_THREADS=10`):

I would be quite keen to get feedback from anyone who's worked with multi-threading in Julia before.