DataBiosphere / dsub

Open-source command-line tool to run batch computing tasks and workflows on backend services such as Google Cloud.
Apache License 2.0

Unable to connect via ssh despite --ssh flag #233

Open sansarsecondary opened 2 years ago

sansarsecondary commented 2 years ago

All my job submissions have the --ssh flag enabled, but I am unable to SSH into the dsub cloud worker machine via either the web GUI or other gcloud-supported means.

Task log files contain this: 2022/01/04 10:18:43 Failed to handle connection: handshake: ssh: disconnect, reason 11: Bye Bye

dsub ... --provider google-cls-v2 --ssh

Any thoughts on why this may be occurring? I have tried using a public IP, but that does not make a difference. The job itself executes just fine.

Instances in the project that were not launched by dsub have no trouble connecting via SSH.

wnojopra commented 2 years ago

Hi @sansarsecondary! I haven't seen that error in dsub before. Could you help me debug with the following:

1) Could you please run dstat with the --full option on a job where you could not SSH into the machine, and look for any messages related to SSH? I'm looking to see if there are any errors or warnings popping up here. Notable places to look would be the events field and the status-detail field.
2) Could you describe in a bit more detail what happens when you try to SSH via the web GUI? What errors show up?
3) How long after job start does the "Failed to handle connection" error message show up in the log?
4) You mention that all of your job submissions have the --ssh flag enabled. Around how many jobs is this? Admittedly, I typically test --ssh with no more than a few jobs at once.
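For readers following step 1, here is a minimal sketch of how the dstat output could be scanned for SSH-related messages. It assumes the `dstat --full` output has been saved as JSON (for example via a JSON output format option, if your dsub version offers one) and loaded with `json.load`. The helper name `find_mentions` is hypothetical and not part of dsub; it makes no assumption about field names and simply walks whatever structure it is given.

```python
def find_mentions(node, needle="ssh", path=""):
    """Recursively collect (path, text) pairs for strings containing `needle`.

    `node` is any JSON-like structure (dicts, lists, strings, numbers).
    Matching is case-insensitive. This is a hypothetical helper, not a dsub API.
    """
    hits = []
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}.{key}" if path else str(key)
            hits += find_mentions(value, needle, child)
    elif isinstance(node, list):
        for index, value in enumerate(node):
            hits += find_mentions(value, needle, f"{path}[{index}]")
    elif isinstance(node, str) and needle.lower() in node.lower():
        hits.append((path, node))
    return hits
```

Calling `find_mentions(jobs)` on the loaded data returns the location and text of every string that mentions SSH, which makes it easier to spot relevant entries under events or status-detail.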

rivershah commented 2 years ago

I found the reason for the error. The firewall rules for the project were corrupted, and SSH traffic was being blocked. In case another user runs into the same issue, please ensure that SSH ingress traffic is enabled.
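For anyone wanting to check this quickly, the following is a rough sketch rather than a definitive diagnostic. It assumes the project's firewall rules have been exported with `gcloud compute firewall-rules list --format=json` into a file, and that each rule follows the Compute Engine firewall resource layout (`direction`, `disabled`, and `allowed` entries with `IPProtocol` and `ports`). The function names are hypothetical. The sketch ignores deny rules, priorities, source ranges, target tags, and networks, so an empty result is a strong hint, while a non-empty one does not by itself prove SSH can reach the worker.

```python
import json


def port_allowed(ports, port):
    """Return True if `port` is covered by a firewall `ports` list.

    Entries are strings such as "22" or "1-1024". A missing or empty list
    means every port of that protocol is allowed.
    """
    if not ports:
        return True
    for entry in ports:
        low, _, high = entry.partition("-")
        if int(low) <= port <= int(high or low):
            return True
    return False


def ssh_ingress_rules(rules, port=22):
    """Return names of enabled INGRESS allow rules that cover TCP `port`."""
    names = []
    for rule in rules:
        if rule.get("direction") != "INGRESS" or rule.get("disabled"):
            continue
        for allowed in rule.get("allowed", []):
            protocol_ok = allowed.get("IPProtocol") in ("tcp", "all")
            if protocol_ok and port_allowed(allowed.get("ports"), port):
                names.append(rule["name"])
                break
    return names


def load_rules(path):
    """Load a rules file saved from the gcloud JSON listing (assumed format)."""
    with open(path) as handle:
        return json.load(handle)
```

For example, `ssh_ingress_rules(load_rules("rules.json"))` lists the rules that would admit SSH traffic on port 22; if the list is empty, no enabled ingress rule allows it.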