JuliaLang / julia

The Julia Programming Language
https://julialang.org/
MIT License
45.41k stars 5.45k forks source link

Major slowdown on Linux due to Gtk.jl + `@threads for` #35552

Open timholy opened 4 years ago

timholy commented 4 years ago

xref https://github.com/JuliaGraphics/Gtk.jl/issues/503, which seems likely to be a Julia issue. Note that I observe it on both Julia 1.4 and 1.2, but Julia 1.0.5 does not share this problem.

JeffBezanson commented 4 years ago

Could you grab a profile? Some of the new threading and task code was added in 1.2, so that makes sense.

timholy commented 4 years ago

Bizarrely, on a relatively recent master (8d8f46db37f6e35632df32417f24fe275aca8061) it's always fast if I use @profile:

julia> @profile test_imfilter(opt=2)
julia_num_threads=4
1 imfilter 0.019901 sec
2 imfilter 0.029619 sec
3 imfilter 0.021065 sec
4 imfilter 0.01536 sec
5 imfilter 0.013495 sec
6 imfilter 0.009861 sec
7 imfilter 0.010458 sec
8 imfilter 0.009338 sec
9 imfilter 0.020186 sec

julia> test_imfilter(opt=2)
julia_num_threads=4
1 imfilter 5.022134 sec
2 imfilter 0.034472 sec
3 imfilter 5.020989 sec
4 imfilter 5.032051 sec
5 imfilter 0.036996 sec
6 imfilter 0.017211 sec
7 imfilter 0.018399 sec
8 imfilter 0.01314 sec
9 imfilter 0.013678 sec

julia> @profile test_imfilter(opt=2)
julia_num_threads=4
1 imfilter 0.029225 sec
2 imfilter 0.010284 sec
3 imfilter 0.008963 sec
4 imfilter 0.009983 sec
5 imfilter 0.007118 sec
6 imfilter 0.012708 sec
7 imfilter 0.009843 sec
8 imfilter 0.009938 sec
9 imfilter 0.010068 sec

julia> test_imfilter(opt=2)
julia_num_threads=4
1 imfilter 0.027196 sec
2 imfilter 5.015825 sec
3 imfilter 0.032341 sec
4 imfilter 5.017424 sec
5 imfilter 0.02489 sec
6 imfilter 5.017529 sec
7 imfilter 5.018003 sec
8 imfilter 0.019033 sec
9 imfilter 5.024461 sec

If I use a low-tech sampling profiler (hit Ctrl-C and see where the break occurs), it always seems to be here:

ERROR: InterruptException:
Stacktrace:
 [1] try_yieldto(::typeof(Base.ensure_rescheduled), ::Base.RefValue{Task}) at ./task.jl:645
 [2] wait at ./task.jl:710 [inlined]
 [3] wait(::Base.GenericCondition{Base.Threads.SpinLock}) at ./condition.jl:106
 [4] _wait(::Task) at ./task.jl:238
 [5] wait(::Task) at ./task.jl:265
 [6] macro expansion at ./threadingconstructs.jl:69 [inlined]
 [7] _imfilter_tiled_threads!(::CPU1{ImageFiltering.Algorithm.FIRTiled{2}}, ::Array{Float32,2}, ::OffsetArrays.OffsetArray{Float32,2,Array{Float32,2}}, ::OffsetArrays.OffsetArray{Float32,2,Array{Float32,2}}, ::OffsetArrays.OffsetArray{Float32,2,Array{Float32,2}}, ::NoPad{Pad{2}}, ::Array{Tuple{UnitRange{Int64},UnitRange{Int64}},1}, ::Array{Array{Float32,2},1}) at /home/tim/.julia/packages/ImageFiltering/jhBno/src/imfilter.jl:858
 [8] _imfilter_tiled!(::CPUThreads{ImageFiltering.Algorithm.FIRTiled{2}}, ::Array{Float32,2}, ::OffsetArrays.OffsetArray{Float32,2,Array{Float32,2}}, ::Tuple{OffsetArrays.OffsetArray{Float32,2,Array{Float32,2}},OffsetArrays.OffsetArray{Float32,2,Array{Float32,2}}}, ::NoPad{Pad{2}}, ::Array{Array{Float32,2},1}, ::Tuple{Base.OneTo{Int64},Base.OneTo{Int64}}) at /home/tim/.julia/packages/ImageFiltering/jhBno/src/imfilter.jl:854
 [9] imfilter!(::CPUThreads{ImageFiltering.Algorithm.FIRTiled{2}}, ::Array{Float32,2}, ::OffsetArrays.OffsetArray{Float32,2,Array{Float32,2}}, ::Tuple{OffsetArrays.OffsetArray{Float32,2,Array{Float32,2}},OffsetArrays.OffsetArray{Float32,2,Array{Float32,2}}}, ::NoPad{Pad{2}}, ::Tuple{Base.OneTo{Int64},Base.OneTo{Int64}}) at /home/tim/.julia/packages/ImageFiltering/jhBno/src/imfilter.jl:773
 [10] imfilter! at /home/tim/.julia/packages/ImageFiltering/jhBno/src/imfilter.jl:770 [inlined]
 [11] imfilter! at /home/tim/.julia/packages/ImageFiltering/jhBno/src/imfilter.jl:715 [inlined]
 [12] imfilter!(::CPUThreads{ImageFiltering.Algorithm.FIRTiled{2}}, ::Array{Float32,2}, ::Array{Float32,2}, ::Tuple{OffsetArrays.OffsetArray{Float32,2,Array{Float32,2}},OffsetArrays.OffsetArray{Float32,2,Array{Float32,2}}}, ::Pad{0}) at /home/tim/.julia/packages/ImageFiltering/jhBno/src/imfilter.jl:702
 [13] imfilter!(::Array{Float32,2}, ::Array{Float32,2}, ::Tuple{OffsetArrays.OffsetArray{Float32,2,Array{Float32,2}},OffsetArrays.OffsetArray{Float32,2,Array{Float32,2}}}, ::Pad{0}, ::ImageFiltering.Algorithm.FIRTiled{2}) at /home/tim/.julia/packages/ImageFiltering/jhBno/src/imfilter.jl:612
 [14] imfilter!(::Array{Float32,2}, ::Array{Float32,2}, ::Tuple{OffsetArrays.OffsetArray{Float32,2,Array{Float32,2}},OffsetArrays.OffsetArray{Float32,2,Array{Float32,2}}}, ::Pad{0}) at /home/tim/.julia/packages/ImageFiltering/jhBno/src/imfilter.jl:606
 [15] imfilter at /home/tim/.julia/packages/ImageFiltering/jhBno/src/imfilter.jl:27 [inlined]
 [16] imfilter at /home/tim/.julia/packages/ImageFiltering/jhBno/src/imfilter.jl:22 [inlined]
 [17] imfilter at /home/tim/.julia/packages/ImageFiltering/jhBno/src/imfilter.jl:18 [inlined]
 [18] imfilter at /home/tim/.julia/packages/ImageFiltering/jhBno/src/imfilter.jl:10 [inlined]
 [19] imfilter at /home/tim/.julia/packages/ImageFiltering/jhBno/src/imfilter.jl:5 [inlined]
 [20] test_imfilter(; opt::Int64) at ./REPL[4]:11
 [21] top-level scope at REPL[23]:1

The threading is just old-school @threads, see here.

JeffBezanson commented 4 years ago

If I use a low-tech sampling profiler (hit Ctrl-C and see where the break occurs),

One of my favorites! :smile:

JeffBezanson commented 4 years ago

The key difference in v1.2 must be that @threads loops join at the end by calling the normal wait on tasks, which allows other scheduled tasks to run as well. Does GTK have a task scheduled that might block for 5 seconds?

lobingera commented 4 years ago

What do you mean?

a) Is it possible that GTK spends 5s on internal processing i.e. redrawing something? b) Is is possible the GTK calls something that takes 5s?

I mentioned this https://github.com/JuliaGraphics/Gtk.jl/issues/325#issuecomment-396952420 , just putting the gtk main loop into a Task might not be enough. There seems to be a reason for this page https://developer.gnome.org/gtk3/stable/gtk3-General.html

timholy commented 4 years ago

I am no Gtk expert, but I don't think there's anything in Gtk that should take that long. Assuming that times are being measured in milliseconds, there is something which appears to be a 5s timeout. See also here. I confess that I have not dug into libuv or Julia's scheduling enough to make much sense of what's going on there.

roflmaostc commented 3 years ago

Hey,

I was wondering whether there are some efforts being done to fix this? I encountered (probably?) the same issue in mcabbott/Tullio.jl#45 and Julia 1.5.2. But for me that 5s don't appear. It's just a slowdown.

Thanks,

Felix

rafaqz commented 3 years ago

I just got bitten by this as well. It's always roughly a 5 second timeout in @threads after using Gtk.

It seems like it's literally a timeout - there is nothing significant happening on any CPUs for the duration, and the timing seems too similar across multiple machines to be an actual task.

wavecast commented 3 years ago

Just confirming what @timholy mentioned above. I tried changing 5000 to 1 in this line of Gtk.jl/src/GLib/signals.jl:

  tmout_min::Cint = (uv_pollfd::_GPollFD).fd == -1 ? 10 : 5000

After doing that a multithreaded test program (just https://github.com/JuliaImages/ImageView.jl/issues/243 with 'using ImageView' changed to 'using Gtk') runs much faster. Changing 5000 to 0 makes it even faster but CPU goes to 100%.

IanButterworth commented 2 years ago

This gets in the way of the new Profiling thread support given ProfileView uses Gtk.

So if a user does

julia> using ProfileView

julia> @profview multithreaded_foo()

they're going to have a bad time.

It can be worked around once, by doing

julia> using Profile

julia> @profile multithreaded_foo()

julia> using ProfileView

julia> ProfileView.view()

but that's awkward and only works once per session.

Therefore as a suggestion I'm putting it on the 1.8 milestone

IanButterworth commented 2 years ago

https://github.com/JuliaGraphics/Gtk.jl/pull/607 is merged and released, which makes this a bit more manageable by stopping the eventloop when no Gtk windows are shown

tkf commented 2 years ago

I tried to refine the issue title as it looks like this is specific to @threads for and has nothing/little to do with our multi-threaded runtime itself, IIUC. But please feel free to roll it back if that's inaccurate.

roflmaostc commented 1 year ago

Hm.

I have kind of bad news. In my case I observe the major slowdown again.

I don't have a MWE yet, but after using ImagView and imshow(x) I observe significant performance drop in my iterative FFT algorithm. CPU usage drops from 4 full used threads to 50% on one core.

roflmaostc commented 1 year ago

Ok, found one, the difference is roughly a factor of 30:

julia> using FFTW

julia> FFTW.set_num_threads(4)

julia> x = rand(300, 300);

julia> using BenchmarkTools

julia> @btime fft($x);
  349.117 μs (111 allocations: 2.75 MiB)

julia> using ImageView

julia> @btime fft($x);
  361.026 μs (111 allocations: 2.75 MiB)

julia> imshow(x)
Dict{String, Any} with 4 entries:
  "gui"         => Dict{String, Any}("window"=>GtkWindowLeaf(name="", parent, width-request=-1, height-request=-1, visible=TRUE, sensitive=TRUE, app-paintable=FALSE, can-focus=FALS…
  "roi"         => Dict{String, Any}("redraw"=>Observables.ObserverFunction[ObserverFunction defined at /home/fxw/.julia/packages/Observables/PHGQ8/src/Observables.jl:414 operating…
  "annotations" => Observable(Dict{UInt64, Any}())
  "clim"        => Observable(CLim{Float64}(7.34195e-6, 0.999971))

julia> @btime fft($x);
  10.860 ms (112 allocations: 2.75 MiB)

julia> versioninfo()
Julia Version 1.8.3
Commit 0434deb161e (2022-11-14 20:14 UTC)
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 12 × AMD Ryzen 5 5600X 6-Core Processor
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-13.0.1 (ORCJIT, znver3)
  Threads: 12 on 12 virtual cores
Environment:
  JULIA_NUM_THREADS = 12

(jl_Qyr6ui) pkg> st
Status `/tmp/jl_Qyr6ui/Project.toml`
  [6e4b80f9] BenchmarkTools v1.3.2
  [7a1cc6ca] FFTW v1.5.0
  [86fae568] ImageView v0.11.1

For larger array, the difference seems to be smaller.

julia> using BenchmarkTools, FFTW

julia> x = rand(512, 512, 128);

julia> @btime fft($x);
  1.152 s (31 allocations: 1.00 GiB)

julia> FFTW.set_num_threads(4)

julia> @btime fft($x);
  408.174 ms (113 allocations: 1.00 GiB)

julia> using ImageView

julia> @btime fft($x);
  408.752 ms (112 allocations: 1.00 GiB)

julia> imshow(x)
Dict{String, Any} with 4 entries:
  "gui"         => Dict{String, Any}("window"=>GtkWindowLeaf(name="", parent, width-reque…
  "roi"         => Dict{String, Any}("redraw"=>Observables.ObserverFunction[ObserverFunct…
  "annotations" => Observable(Dict{UInt64, Any}())
  "clim"        => Observable(CLim{Float64}(6.91704e-6, 0.999992))

julia> @btime fft($x);
  422.323 ms (116 allocations: 1.00 GiB)