julia> addprocs_slurm(100);
srun: job 1218546 queued and waiting for resources
Error launching Slurm job:
ERROR: UndefVarError: warn not defined
Stacktrace:
[1] wait(::Task) at ./task.jl:191
[2] #addprocs_locked#44(::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}, ::Function, ::SlurmManager) at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.1/Distributed/src/cluster.jl:418
[3] addprocs_locked at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.1/Distributed/src/cluster.jl:372 [inlined]
[4] #addprocs#43(::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}, ::Function, ::SlurmManager) at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.1/Distributed/src/cluster.jl:365
[5] #addprocs_slurm#15 at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.1/Distributed/src/cluster.jl:359 [inlined]
[6] addprocs_slurm(::Int64) at /home/jb6888/.julia/packages/ClusterManagers/7pPEP/src/slurm.jl:85
[7] top-level scope at none:0
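For context: on Julia ≥ 1.0 the old `warn` function was removed in favor of the `@warn` logging macro (from the Logging stdlib, exported by default), which is why the launcher's warning call throws `UndefVarError: warn not defined`. A minimal sketch of the post-1.0 replacement:

```julia
using Logging  # @warn is re-exported by Base, so this line is optional on Julia ≥ 1.0

# Pre-1.0 style that now throws UndefVarError:
# warn("Error launching Slurm job")

# Post-1.0 replacement:
@warn "Error launching Slurm job"
```

Any code path in the launcher that still calls the old `warn` will fail this way the first time it is hit.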
The issue seems to be with @async_launch in cluster.jl. However, even after the error, the job is left pending in the queue and may be allocated resources later.
squeue -u jb6888
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
1218546 par_std julia-14 jb6888 PD 0:00 4 (Priority)
Shouldn't an error during launch remove the job from the queue as well? Or is it still there because the warn error prevents the subsequent clean-up from running?
Cleanup is normally performed when a process shuts down on the compute node, so you are right: we could and should do a better job with error handling here.
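A minimal sketch of the kind of defensive cleanup that could wrap the launch (the function name and structure here are hypothetical, not the package's actual API): if the launch task fails, cancel the Slurm job with scancel before rethrowing, so the failed launch does not leave the job pending in the queue.

```julia
# Hypothetical sketch: ensure a failed launch also removes the Slurm job.
function launch_with_cleanup(jobid::Integer, launch::Function)
    try
        launch()
    catch err
        # Best-effort: remove the pending/allocated job from the queue,
        # then surface the original error to the caller.
        try
            run(`scancel $jobid`)
        catch
            # scancel itself may fail (job already gone, command missing);
            # the original launch error is the one worth reporting.
        end
        rethrow(err)
    end
end
```

On the success path the wrapper is transparent and simply returns whatever the launch closure returns.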
I am encountering this error when jobs time out.