JuliaLang / julia

The Julia Programming Language
https://julialang.org/
MIT License

inspectdr trashes threading performance #40535

Open mattcbro opened 3 years ago

mattcbro commented 3 years ago

This is an obscure and hard-to-find interaction between the Threads library and the inspectdr() backend for the Plots library. I have some code that uses a simple @threads for loop to parallelize multiple QR decompositions of a block of raw data.

Calling the inspectdr() initializer causes the threaded code to run anywhere from 200 to 7000 times slower in this example. The difference in the genprojmat() timings can be seen by commenting out and uncommenting the inspectdr() line. The script does no plotting.

I'm on Linux Mint 20.1 with an 8-core processor. The code follows:

# test simple thread idea
using Plots
# uncommenting inspectdr() causes the threaded version of genprojmat to run up to 7000 times slower
inspectdr()
using LinearAlgebra
using BenchmarkTools

#using QThread
##
""" A MATLAB-style version of Julia's QR decomposition.
        Q,R = flatqr(X) """
function flatqr(X)
     F = qr(X)
     return(Matrix(F.Q), F.R)
end

""" Circularly symmetric Complex noise """
function cgauss(varargin...)
    # Scale so that the real and imaginary parts each carry half of the unit variance.
    return (1 / sqrt(2.0)) .* (randn(varargin) .+ im .* randn(varargin))
end

""" complex zeros functions since I forget how to write the type signatures """
function czeros(x...)
    y = zeros(Complex{Float64}, x) ;
    return(y)
end # function czeros

""" Generate all the projection matrices. Use a threaded loop to exploit an embarrassingly parallel problem. """
function genprojmat(xdata)
    Mants, Nf, Ns = size(xdata)
    Qall = czeros(Ns, Mants, Nf)
    Threads.@threads for k = 1:Nf
        Qx, Rx = flatqr(xdata[:,k,:]')
        Qall[:, :, k] = Qx
    end
    return(Qall)
end

""" Generate all the projection matrices. The non-threaded case is much faster. Why? """
function genprojmatnt(xdata)
    Mants, Nf, Ns = size(xdata)
    Qall = czeros(Ns, Mants, Nf)
    for k = 1:Nf
        Qx, Rx = flatqr(xdata[:,k,:]')
        Qall[:, :, k] = Qx
    end
    return(Qall)
end
##

# Set up the input data
Mants = 4
Nf = 24
nsgn = 0.1
apr = cgauss(4)
chan = cgauss(Nf)
ac = apr * transpose(chan)

#    Mants, Nf, Ns = size(xdata)
Ns = 1024
K = 3
xdata = nsgn .* cgauss(Mants, Nf, Ns)
st = randn(Ns)
for q=1:Ns
    xdata[:,:,q] = xdata[:,:,q] + ac .* st[q]
end

# threaded version
@btime genprojmat(xdata) ;

# not threaded version
@btime genprojmatnt(xdata) ;

The output of running the script, first with inspectdr() uncommented:

matt@Hope /mnt/WorkSpace/projects/Maestro/QThread $ julia -t 8
               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.6.0 (2021-03-24)
 _/ |\__'_|_|_|\__'_|  |  Official https://julialang.org/ release
|__/                   |

julia> include("testthreaded.jl")
Gtk-Message: 16:20:03.457: Failed to load module "xapp-gtk3-module"
  5.003 s (332 allocations: 7.54 MiB)
  2.380 ms (290 allocations: 7.54 MiB)
1024×4×24 Array{ComplexF64, 3}: .....
julia> Threads.nthreads()
8

Now with inspectdr() commented out, after restarting Julia:

matt@Hope /mnt/WorkSpace/projects/Maestro/QThread $ julia -t 8
               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.6.0 (2021-03-24)
 _/ |\__'_|_|_|\__'_|  |  Official https://julialang.org/ release
|__/                   |

julia> Threads.nthreads()
8

julia> include("testthreaded.jl")
  705.049 μs (331 allocations: 7.54 MiB)
  2.547 ms (290 allocations: 7.54 MiB)
1024×4×24 Array{ComplexF64, 3}:

jpsamaroo commented 3 years ago

You can try @code_warntype genprojmat(xdata) with and without inspectdr() called to see if you're accidentally including a global variable that you didn't intend to, or if something odd is happening in Threads.@threads. If the output differs, that is a sign that inspectdr() is doing something at the top level that causes the threaded loop to be optimized poorly.

Aside: Do you have a reason to suspect this is the fault of Julia, Base, or a Stdlib? If not, I don't think you need to file an issue here; you already have one filed on InspectDR.jl.
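
The @code_warntype comparison suggested above could be sketched roughly as follows. This is only an illustration under stated assumptions: `threaded_squares` is a toy stand-in for genprojmat(), `warntype_report` is a hypothetical helper name, and the call to the suspect initializer is left as a comment so the snippet runs without Plots installed.

```julia
using InteractiveUtils  # provides code_warntype; loaded automatically in the REPL

# Toy stand-in for genprojmat: a threaded loop writing into disjoint slots.
function threaded_squares(x::Vector{Float64})
    out = similar(x)
    Threads.@threads for k in eachindex(x)
        out[k] = x[k]^2
    end
    return out
end

# Hypothetical helper: capture the typed-code report as a string
# so that two snapshots can be compared with ==.
warntype_report(f, argtypes) = sprint(io -> code_warntype(io, f, argtypes))

before = warntype_report(threaded_squares, (Vector{Float64},))
# ... call the suspect initializer here (inspectdr() in the original report) ...
after = warntype_report(threaded_squares, (Vector{Float64},))

println(before == after ? "typed code unchanged" : "typed code differs")
```

Because Threads.@threads wraps the loop body in a closure, the report for the outer function may not show everything that happens inside the loop, so identical reports do not by themselves rule out a scheduling problem.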

mattcbro commented 3 years ago

I presume that the probability of it being the fault of Threads.@threads is low but not zero. Nevertheless, across various code configurations, the only change that causes the massive slowdown in @threads performance is the call to inspectdr(). Moreover, once it's called, even if you do no plotting, @threads stays slow until Julia is restarted.

My code contains no global variables outside of those defined in the REPL. Furthermore, I've tested the same code with all the function definitions inside a separate module. genprojmat() has no global variables. Any such variables would have to be in @threads or in inspectdr(), whose internals I know nothing about.
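
One way to isolate the effect described above within a single session is to time the same threaded loop before and after calling the suspect initializer. The following is a minimal sketch using only the standard library: `suspect_init` is a hypothetical placeholder for inspectdr(), and `threaded_qr` and `best_time` are illustrative names with a toy workload, not the code from the report.

```julia
using LinearAlgebra

# Toy workload: one small QR factorization per loop iteration.
function threaded_qr(blocks::Vector{Matrix{Float64}})
    out = Vector{Matrix{Float64}}(undef, length(blocks))
    Threads.@threads for k in eachindex(blocks)
        out[k] = Matrix(qr(blocks[k]).Q)
    end
    return out
end

# Best-of-n wall-clock time in seconds.
function best_time(f, args...; n = 5)
    f(args...)  # warm-up call so that compilation is not measured
    best = Inf
    for _ in 1:n
        t = @elapsed f(args...)
        best = min(best, t)
    end
    return best
end

# Hypothetical placeholder: replace the body with the call under suspicion.
suspect_init() = nothing

blocks = [randn(64, 4) for _ in 1:24]
t_before = best_time(threaded_qr, blocks)
suspect_init()
t_after = best_time(threaded_qr, blocks)
println("before: ", t_before, " s, after: ", t_after, " s, ratio: ", t_after / t_before)
```

Start Julia with several threads (for example julia -t 8, as in the report) so that the loop is actually split across threads. With the placeholder left as a no-op, the ratio should stay near 1; a large ratio after substituting the real call would reproduce the slowdown without restarting Julia between measurements.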

KristofferC commented 3 years ago

Looks similar to https://github.com/JuliaGraphics/Gtk.jl/issues/503.

mattcbro commented 3 years ago

@KristofferC It really does look similar. I wonder if using @bthreads from ThreadPools mitigates the problem. I'll have to give that a try when I get a chance.