JuliaParallel / ClusterManagers.jl

addprocs_qrsh() fails on cluster that supports qrsh #141

Open jtrakk opened 3 years ago

jtrakk commented 3 years ago

When I use addprocs_qrsh(), I get an error message and no jobs are created (checked with qstat).

ClusterManagers.addprocs_qrsh(3,res_list="h_rt=2:00:00,h_data=4G,highp")
Error launching workers
MethodError(iterate, (Process(`qrsh -l h_rt=2:00:00,h_data=4G,highp -V -N julia-13730 -now n cd /mydir '&&' /u/local/apps/julia/1.5.1/bin/julia --worker=2BuUs4aIkAHENSDE`, ProcessRunning),), 0x0000000000006caf)
Int64[]
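
A possible reading of the MethodError above, not confirmed against the ClusterManagers.jl source: `iterate` is being called on a `Process`, which is what happens when code destructures the return value of `open(cmd)` as a pair. On Julia 1.x, `open(cmd)` returns a single `Process` object. The sketch below reproduces that error shape; the `echo` command is only a stand-in for the real `qrsh` invocation, and the variable names are hypothetical.

```julia
# Hypothetical reproduction; this is NOT the ClusterManagers.jl code.
# `echo` stands in for the qrsh command so the sketch runs without a cluster.
cmd = `echo julia_worker:9934`

# On Julia 1.x, open(cmd) returns one Base.Process object.
proc = open(cmd)

# Destructuring the result as a pair calls iterate(proc), which has no method.
err = try
    io, p = proc
    nothing
catch e
    e
end

# Same shape as the reported error: MethodError(iterate, (Process(...),), ...)
println(err isa MethodError && err.f === iterate)

# The Process itself is readable, so the worker's first stdout line
# can be read from it directly.
line = readline(proc)
wait(proc)
```

If this is what happens inside addprocs_qrsh(), reading from the `Process` directly instead of unpacking it would avoid the error, but that is an assumption about the package internals.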

My cluster does support qrsh. When I run the qrsh command manually in a shell, it prints the host-key messages below, but it does seem to allocate the worker, as I can see the job in qstat.

qrsh -l h_rt=2:00:00,h_data=4G,highp -V -N julia-13730 -now n cd /mydir '&&' /u/local/apps/julia/1.5.1/bin/julia --worker=2BuUs4aIkAHENSDE
could not open any host key
ssh_keysign: no reply
key_sign failed
julia_worker:9934

job-ID     prior   name       user         state submit/start at     queue                          jclass                         slots ja-task-ID 
------------------------------------------------------------------------------------------------------------------------------------------------
   4514401 0.50500 QRLOGIN    user         r     09/02/2020 00:05:14 my.q@nodexxx                                                  2        

When I use addprocs_sge() it works just fine.


This looks like the same issue as this comment, but I opened a new issue because that one was originally opened for a different purpose.

Julia 1.5.1; ClusterManagers.jl master branch at commit dde400e953cd8cf631802866e164697019805a92

oameye commented 1 year ago

I am encountering the same issue.