
Unexpected `sleep` function behavior in relation to main thread activity in multithreading context #50643

Open algunion opened 1 year ago

algunion commented 1 year ago

It seems that the behavior of the sleep function, when it is called from tasks running on threads other than the main one, is heavily affected by work taking place on the main thread.

Given the documentation of the sleep function, you would expect @time sleep(1) to report approximately 1 second. In the same way, you would expect the following to take about 10 seconds in total:

@time for _ in 1:10
    sleep(1)
end

# Usually (and in the MWE control, task1): 10 seconds +/- a small margin
# In the MWE context (this issue): 16+ seconds (varies with how long the main thread stays busy)

However, the above expectation is violated (in a manner not explainable by the usual variability of sleep and/or @time) when sleep is called from tasks running on non-main threads and either of the following two conditions is met:

  1. there is some work performed by the main thread, or Libc.systemsleep(n) is called on the main thread (showcased in the MWE below), or
  2. the main thread is free, but many (~1000) spawned tasks call sleep concurrently (increasing the number of available threads makes the results worse); a minimal sketch of this scenario follows this list.
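
A minimal sketch of the second condition (the task count and expected timing are illustrative assumptions, not measurements from this report):

using Base.Threads

# many tasks sleeping concurrently while the main thread merely waits on them
function many_sleepers(n)
    tasks = Task[]
    for _ in 1:n
        push!(tasks, @spawn sleep(1))
    end
    foreach(wait, tasks)
end

@time many_sleepers(1000) # naively ~1 second total; reportedly much longer at this task count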

To give a clear example:

using Base.Threads # for @spawn and threadid

function sleeper(s, c, id)
    @info "sleeper $id executing on thread $(threadid()) "
    counter = 0
    @time "sleeper $id" while counter < c
        sleep(s)
        counter += 1
    end
end

@spawn sleeper(1, 10, 1)

Output like this would look normal:

sleeper 1: 10.021883 seconds (53.10 k allocations: 3.561 MiB)

However, sleeper(1, 10, 1) running in a task spawned on a thread other than the main one can report times like the one below (when the main thread is busy):

sleeper 1: 16.963197 seconds (13.87 k allocations: 957.833 KiB, 0.13% compilation time)

Or, if you keep the main thread really busy for a longer period of time:

sleeper 1: 29.944492 seconds (13.87 k allocations: 957.802 KiB, 0.06% compilation time)

And this is not about some weird print-to-stdout competition between tasks (the MWE also includes a control function, realworker, which does not use sleep, to confirm this).

All this might make more sense if you run the MWE provided below.

Important note: the current issue is to be interpreted in the context of Julia being started with multiple threads. For consistency, let's assume julia -t 6,3 script.jl (6 default-pool threads plus 3 interactive-pool threads).

Main MWE follows:

using Base.Threads, Dates

@info "started with $(nthreads()) threads"
@info "interactive threadpool: $(nthreads(:interactive))"
@info "default threadpool: $(nthreads(:default))"
println(repeat("*", 40))

function mainspin(s, c; withyield=false)
    @info "mainspin executing on thread $(threadid()) "
    ms = Millisecond(s * 1000)
    r = 0.0
    counter = 0
    while counter < c
        t = now()
        while (now() - t) < ms
            r += rand()
        end
        counter += 1
        if withyield
            yield()
        end

    end
end

function sleeper(s, c, id)
    @info "sleeper $id executing on thread $(threadid()) "
    counter = 0
    @time "sleeper $id" while counter < c
        sleep(s)
        counter += 1
    end
end

function realworker(s, c, id)
    @info "worker $id executing on thread $(threadid()) "
    ms = Millisecond(s * 1000)
    r = 0.0
    counter = 0
    @time "worker $id" while counter < c
        t = now()
        while (now() - t) < ms
            r += rand()
        end
        counter += 1
        yield()
    end
end

begin
    # phase 1
    @info "phase 1: control run - no main thread work"

    # yields 10 times (total time ~10 seconds)
    task1 = @spawn realworker(1.02, 10, 1) # 1.02 so the two @time printouts don't overlap
    # yields 10 times (total time ~10 seconds)
    task11 = @spawn sleeper(1, 10, 1)

    wait(task1)
    wait(task11)

    # we waited for task1 and task11 to finish

    # phase 2
    println(repeat("*", 40))
    @info "phase 2: main thread work without yield"

    # yields 10 times (total time ~10 seconds)
    task2 = @spawn realworker(1, 10, 2)

    # yields 10 times (total time ~10 seconds)
    task22 = @spawn sleeper(1, 10, 2)

    # we wait 5 seconds before starting to spin
    # the main thread
    sleep(5)

    # now we start spinning 
    # while task2 and task22 are already running
    mainspin(1, 7)

    # main thread work does not impact the real worker
    # total time ~10 seconds
    wait(task2)

    # main thread work heavily impacting the sleeper
    # total time >15 seconds
    wait(task22)

    # phase 3
    println(repeat("*", 40))
    @info "phase 3: main thread work with yield"

    # yields 10 times (total time ~10 seconds)
    task3 = @spawn realworker(1, 10, 3)

    # yields 10 times (total time ~10 seconds)
    task33 = @spawn sleeper(1, 10, 3)

    # we wait 5 seconds before starting to spin
    # the main thread
    sleep(5)

    # now we start spinning with yield
    # while task3 and task33 are running
    mainspin(1, 7, withyield=true)

    # main thread work does not impact 
    # the real worker - total time ~10 seconds
    wait(task3)

    # main thread work (with yield) still impacting the sleeper
    # but we consistently get better times (total time ~13 seconds)
    wait(task33)
end

Output with julia -t 6,3 script.jl:

[ Info: started with 6 threads
[ Info: interactive threadpool: 3
[ Info: default threadpool: 6
****************************************
[ Info: phase 1: control run - no main thread work
[ Info: worker 1 executing on thread 4 
[ Info: sleeper 1 executing on thread 8 
sleeper 1: 10.020176 seconds (51 allocations: 1.438 KiB)
worker 1: 10.199500 seconds (53.62 k allocations: 3.704 MiB, 0.36% compilation time)
****************************************
[ Info: phase 2: main thread work without yield
[ Info: worker 2 executing on thread 7 
[ Info: sleeper 2 executing on thread 5 
[ Info: mainspin executing on thread 1 
worker 2: 9.999656 seconds (47.22 k allocations: 3.243 MiB, 0.24% compilation time)
sleeper 2: 16.974156 seconds (13.87 k allocations: 957.802 KiB, 0.14% compilation time)
****************************************
[ Info: phase 3: main thread work with yield
[ Info: worker 3 executing on thread 9 
[ Info: sleeper 3 executing on thread 5 
[ Info: mainspin executing on thread 1 
worker 3: 9.999733 seconds (37.05 k allocations: 2.618 MiB, 0.23% compilation time)
sleeper 3: 13.979589 seconds (2.67 k allocations: 249.790 KiB, 0.16% compilation time)

It would be speculative on my part to draw hard conclusions about the root cause of this issue. There is a Discourse topic that I opened where some users have proposed hypotheses - feel free to check it out.

The underlying cause of this issue seems to extend to various packages (for example, the responsiveness of a non-blocking listen from HTTP.jl is highly dependent on the main thread not doing work - and that is not caused by direct sleep usage). It is essential to mention that HTTP.serve!/HTTP.listen! is spawned onto the :interactive threadpool.
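
To illustrate the symptom, a hedged sketch (not from the original report; the port, handler, and spin duration are arbitrary, and it assumes HTTP.jl 1.x in a multithreaded session):

using HTTP

# HTTP.serve! returns a Server immediately; per the report above, the listen
# loop is spawned onto the :interactive threadpool
server = HTTP.serve!("127.0.0.1", 8080) do req
    HTTP.Response(200, "ok")
end

# a burst of non-yielding work on the main thread; while it runs, requests to
# 127.0.0.1:8080 are reportedly delayed (or denied, depending on client timeouts)
let r = 0.0, t0 = time()
    while time() - t0 < 5
        r += rand()
    end
end

close(server)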

Libc.systemsleep MWE version

The `Libc.systemsleep` version of the above MWE:

```julia
using Base.Threads, Dates

@info "started with $(nthreads()) threads"
@info "interactive threadpool: $(nthreads(:interactive))"
@info "default threadpool: $(nthreads(:default))"
println(repeat("*", 40))

function mainspin(s, c; withyield=false)
    @info "mainspin executing on thread $(threadid()) "
    r = 0.0
    counter = 0
    while counter < c
        Libc.systemsleep(s)
        counter += 1
        if withyield
            yield()
        end
    end
end

function sleeper(s, c, id)
    @info "sleeper $id executing on thread $(threadid()) "
    counter = 0
    @time "sleeper $id" while counter < c
        sleep(s)
        counter += 1
    end
end

function libcworker(s, c, id)
    @info "worker $id executing on thread $(threadid()) "
    r = 0.0
    counter = 0
    @time "worker $id" while counter < c
        Libc.systemsleep(s)
        counter += 1
        yield()
    end
end

begin
    # phase 1
    @info "phase 1: control run - no main thread work"

    # yields 10 times (total time ~10 seconds)
    task1 = @spawn libcworker(1.02, 10, 1) # 1.02 so the two @time printouts don't overlap
    # yields 10 times (total time ~10 seconds)
    task11 = @spawn sleeper(1, 10, 1)

    wait(task1)
    wait(task11)

    # we waited for task1 and task11 to finish

    # phase 2
    println(repeat("*", 40))
    @info "phase 2: main thread work without yield"

    # yields 10 times (total time ~10 seconds)
    task2 = @spawn libcworker(1, 10, 2)

    # yields 10 times (total time ~10 seconds)
    task22 = @spawn sleeper(1, 10, 2)

    # we wait 5 seconds before starting to spin
    # the main thread
    sleep(5)

    # now we start spinning
    # while task2 and task22 are already running
    mainspin(1, 7)

    # main thread work does not impact the real worker
    # total time ~10 seconds
    wait(task2)

    # main thread work heavily impacting the sleeper
    # total time >15 seconds
    wait(task22)

    # phase 3
    println(repeat("*", 40))
    @info "phase 3: main thread work with yield"

    # yields 10 times (total time ~10 seconds)
    task3 = @spawn libcworker(1, 10, 3)

    # yields 10 times (total time ~10 seconds)
    task33 = @spawn sleeper(1, 10, 3)

    # we wait 5 seconds before starting to spin
    # the main thread
    sleep(5)

    # now we start spinning with yield
    # while task3 and task33 are running
    mainspin(1, 7, withyield=true)

    # main thread work does not impact
    # the real worker - total time ~10 seconds
    wait(task3)

    # main thread work (with yield) still impacting the sleeper
    # but we consistently get better times (total time ~13 seconds)
    wait(task33)
end
```

Output:

```
[ Info: started with 6 threads
[ Info: interactive threadpool: 3
[ Info: default threadpool: 6
****************************************
[ Info: phase 1: control run - no main thread work
[ Info: worker 1 executing on thread 4
[ Info: sleeper 1 executing on thread 5
sleeper 1: 10.021097 seconds (50 allocations: 1.406 KiB)
worker 1: 10.202331 seconds (7.16 k allocations: 527.908 KiB, 0.89% compilation time)
****************************************
[ Info: phase 2: main thread work without yield
[ Info: sleeper 2 executing on thread 8
[ Info: worker 2 executing on thread 6
[ Info: mainspin executing on thread 1
worker 2: 10.001909 seconds (12.27 k allocations: 864.121 KiB, 0.36% compilation time)
sleeper 2: 17.009272 seconds (28.83 k allocations: 1.998 MiB, 0.21% compilation time)
****************************************
[ Info: phase 3: main thread work with yield
[ Info: sleeper 3 executing on thread 5
[ Info: worker 3 executing on thread 7
[ Info: mainspin executing on thread 1
worker 3: 10.001954 seconds (2.62 k allocations: 246.360 KiB, 0.28% compilation time)
sleeper 3: 13.026204 seconds (30.55 k allocations: 2.179 MiB, 0.22% compilation time)
```
vtjnash commented 1 year ago

sleep has no upper bound on timeout (unless you are running a realtime kernel on custom hardware)

algunion commented 1 year ago

> sleep has no upper bound on timeout (unless you are running a realtime kernel on custom hardware)

We can agree on that - but the mere absence of an upper bound on the timeout doesn't seem to explain the behavior pointed out in the MWE I provided.

Please note that I am not questioning the equality @time sleep(x) == x seconds in some extreme or naive sense (e.g., sleep(x) should always take at most x seconds). I suggest looking at the output of task11 vs. task22 and task33 in the provided MWE.

From what I am observing, the issue I raised might have nothing to do with the actual duration of sleep(x) - it might very well be related to whether sleep calls are blocked/delayed when issued from tasks running on other threads while the main thread is busy (as you can see in the MWE: task11, with an idle main thread, performs as expected, but task22 and task33, which run while the main thread is busy, do not).

I also acknowledge that your comment might have some technical implications I am unaware of.

P.S. I updated the initial issue body to reduce the chance of this being confused with a naive "why doesn't sleep(5) take exactly 5 seconds" issue.

vchuravy commented 1 year ago

Julia uses a libuv-based event loop under the hood. Processing certain things like Timers/IO depends on the event loop being run regularly. (sleep uses a libuv timer under the hood.)
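
For reference, a paraphrase of how sleep sits on top of a libuv Timer (a sketch, not the verbatim Base source; my_sleep is a hypothetical name):

# sketch of the mechanism described above (my_sleep is hypothetical, not Base code)
function my_sleep(sec::Real)
    sec >= 0 || throw(ArgumentError("cannot sleep for $sec seconds"))
    wait(Timer(sec)) # the Timer is a libuv timer; it only fires when some thread runs the event loop
    nothing
end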

Looking at jl_process_events it seems like the event loop is only run from tid == 0 or when _threadedregion is set.

jl_enter_threaded_region is only called from threading_run (which is the base function for @threads).

I am unsure why we still have this mechanism instead of allowing any thread to run the libuv event loop.

algunion commented 1 year ago

I really hope we can run the libuv event loop from any thread soon.

Especially for the :interactive threadpool, this issue can be really annoying: for a non-blocking HTTP.jl server running on the :interactive threadpool, small bursts of work on the main thread can consistently delay and even deny requests (depending on the timeout).

But even if no work is done on the main thread, spawning lots of tasks (as HTTP.jl does for new connections) can cause concurrent access to the libuv event loop from multiple threads - so this issue is significant even when one is careful not to run non-trivial work on the main thread.

But more generally, interactive tasks are heavily impacted by this whenever some Timer/IO work is involved (and it is awkward to always have to avoid running anything on the main thread when responsiveness of the :interactive threadpool is desired - which is always, because that is the point of the :interactive threadpool). But as I said, in a multithreading setup, if a large number of tasks use Timer, this issue becomes visible even if no work is done on the main thread.
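
A minimal sketch of that Timer sensitivity (durations and pool choice are arbitrary assumptions): an :interactive task waits on a bare Timer - the primitive underlying sleep - while the main thread spins without reaching a yield point.

using Base.Threads

function spin(seconds)
    # busy-work with no yield points, keeping the main thread occupied
    r = 0.0
    t0 = time()
    while time() - t0 < seconds
        r += rand()
    end
    return r
end

t = @spawn :interactive @time wait(Timer(1)) # naively ~1 s; reportedly delayed while main is busy
spin(3) # the timer cannot fire until the event loop is serviced again
wait(t)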

Let me know if more MWEs are needed in the above context.

lassepe commented 1 month ago

On master with the suggested snippet from https://github.com/JuliaLang/julia/pull/53422#issue-2147679408

julia> function make_io_thread()
           tid = UInt[0]
           threadwork = @cfunction function(arg::Ptr{Cvoid})
                   @ccall jl_set_io_loop_tid((Threads.threadid() - 1)::Int16)::Cvoid
                   wait() # spin uv_run as long as needed
                   nothing
               end Cvoid (Ptr{Cvoid},)
           err = @ccall uv_thread_create(tid::Ptr{UInt}, threadwork::Ptr{Cvoid}, C_NULL::Ptr{Cvoid})::Cint
           err == 0 || Base.uv_error("uv_thread_create", err)
           err = @ccall uv_thread_detach(tid::Ptr{UInt})::Cint
           err == 0 || Base.uv_error("uv_thread_detach", err)
           # n.b. this does not wait for the thread to start or to take ownership of the event loop
           nothing
       end
       make_io_thread()

the MWE from above produces:

[ Info: phase 1: control run - no main thread work
[ Info: worker 1 executing on thread 5
[ Info: sleeper 1 executing on thread 16
sleeper 1: 10.033890 seconds (109.12 M allocations: 6.504 GiB, 2.50% gc time)
worker 1: 10.213286 seconds (110.98 M allocations: 6.615 GiB, 2.53% gc time, 0.13% compilation time)
****************************************
[ Info: phase 2: main thread work without yield
[ Info: worker 2 executing on thread 5
[ Info: sleeper 2 executing on thread 15
[ Info: mainspin executing on thread 1
worker 2: 10.012252 seconds (119.87 M allocations: 7.145 GiB, 3.50% gc time, 0.13% compilation time)
sleeper 2: 10.024603 seconds (119.98 M allocations: 7.151 GiB, 3.51% gc time)
****************************************
[ Info: phase 3: main thread work with yield
[ Info: worker 3 executing on thread 2
[ Info: sleeper 3 executing on thread 12
[ Info: mainspin executing on thread 1
sleeper 3: 10.015027 seconds (113.49 M allocations: 6.764 GiB, 3.67% gc time)
worker 3: 10.025508 seconds (113.62 M allocations: 6.772 GiB, 3.67% gc time, 0.17% compilation time)