JuliaParallel / ClusterManagers.jl

LSF manager broken in Julia 1.8.1 #182

Closed DrChainsaw closed 1 year ago

DrChainsaw commented 1 year ago
julia> addprocs_lsf(3)
Error launching workers
CapturedException(ClusterManagers.LSFException(""), Any[(lsf_bpeek(manager::LSFManager, jobid::SubString{String}, iarray::Int64) at lsf.jl:61, 1), (lsf_launch_and_monitor(manager::LSFManager, launched::Vector{WorkerConfig}, c::Condition, jobid::SubString{String}, iarray::Int64) at lsf.jl:71, 1), (#31 at lsf.jl:98 [inlined], 1), ((::Base.var"#929#934"{ClusterManagers.var"#31#32"{LSFManager, Vector{WorkerConfig}, Condition, SubString{String}}})(r::Base.RefValue{Any}, args::Tuple{Int64}) at asyncmap.jl:100, 1), (macro expansion at asyncmap.jl:234 [inlined], 1), ((::Base.var"#945#946"{Base.var"#929#934"{ClusterManagers.var"#31#32"{LSFManager, Vector{WorkerConfig}, Condition, SubString{String}}}, Channel{Any}, Nothing})() at task.jl:484, 1)])
Int64[]

The culprit seems to be that the readline call returns an empty string when there is no newline in the buffer. It did not do this in 1.6.2 (I haven't tried the versions in between), where it instead blocked until new data was available in the buffer.

An easy fix is to treat an empty line the same as "job not started", but then we'll be busy-looping when we should be sleeping.
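A minimal sketch of that idea with a sleep added to avoid the busy loop. The helper name `readline_retry` and its `delay` and `max_tries` parameters are hypothetical and not part of ClusterManagers.jl. Because it only inspects the returned string, it should not matter whether readline blocks (as reported for 1.6.2) or returns "" immediately (as reported for 1.8.1). One limitation: a genuinely blank output line would also be retried.

```julia
# Hypothetical helper for illustration; not part of ClusterManagers.jl.
# `read_fn` is any zero-argument function that returns a String.
function readline_retry(read_fn; delay=0.1, max_tries=50)
    for _ in 1:max_tries
        line = read_fn()
        # Non-empty output: hand it back to the caller.
        isempty(line) || return line
        # Empty string is treated like "job not started":
        # sleep before polling again instead of spinning.
        sleep(delay)
    end
    return nothing  # gave up after max_tries empty reads
end

# Usage with the BufferStream from the reproducer: data arrives a little later.
io = Base.BufferStream()
@async (sleep(0.2); write(io, "hi\n"))
line = readline_retry(() -> readline(io))
```

Returning `nothing` after `max_tries` lets the caller decide whether to raise an error or keep waiting.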

Maybe this is an issue with julia 1.8.1?

Here is a reproducer:

julia> io = Base.BufferStream()
BufferStream(bytes waiting=0, isopen=true)

julia> @async while true
       @show readline(io)
       end

julia> pipeline(`echo hi`, stdout=io) |> run
readline(io) = "hi"
readline(io) = ""
readline(io) = ""
readline(io) = ""
readline(io) = ""
readline(io) = ""
readline(io) = ""
readline(io) = ""
etc.

DrChainsaw commented 1 year ago

Link to Julia issue: https://github.com/JuliaLang/julia/issues/46869

bjarthur commented 1 year ago

We'll need to fix this in a way that works on both 1.6 and 1.8, since 1.6 is the long-term support version.