Open sansarsecondary opened 2 years ago
Hi @sansarsecondary! I haven't seen that error in dsub before. Could you help me debug with the following:
1) Could you please run dstat with the --full option on a job where you could not SSH into the machine, and look for any messages related to SSH? I'm looking to see if any errors or warnings show up there. Notable places to look are the events field and the status-detail.
2) Could you describe in a bit more detail what happens when you try to SSH via the web GUI? What errors show up?
3) How long after job start does the "Failed to handle connection" error message show up in the log?
4) You mention that all of your job submissions have the --ssh flag enabled. Around how many jobs is this? Admittedly, I typically test --ssh with no more than a few jobs at once.
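For item 1, something along these lines might make the SSH-related messages easier to spot. This is only a rough sketch: filter_ssh_lines is a hypothetical helper (not part of dsub), PROJECT_ID and JOB_ID are placeholders for your own values, and the search pattern is a guess at what the relevant messages might contain.

```shell
# Hypothetical helper, not part of dsub: reads text on stdin and prints
# every line that mentions "ssh" or "handshake" (case-insensitive),
# prefixed with its line number so it can be located in the full output.
filter_ssh_lines() {
  grep -i -n -E 'ssh|handshake'
}

# Assumed usage (PROJECT_ID and JOB_ID are placeholders; --status '*' is
# included so that jobs that have already finished are listed as well):
#
#   dstat --provider google-cls-v2 --project "${PROJECT_ID}" \
#     --jobs "${JOB_ID}" --status '*' --full | filter_ssh_lines
```

If nothing is printed, the full dstat output is still worth skimming by hand, since the relevant message may not mention SSH by name.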
All my job submissions have the --ssh flag enabled. I am unable to SSH into the dsub cloud worker machine via either the web GUI or other gcloud supported means. Task log files contain this:
2022/01/04 10:18:43 Failed to handle connection: handshake: ssh: disconnect, reason 11: Bye Bye
dsub ... --provider google-cls-v2 --ssh
Any thoughts on why this may be occurring? I have tried using a public IP, but that does not make a difference. The job itself executes just fine.
Non-dsub launched instances in the project exhibit no trouble connecting via SSH.