DataBiosphere / dsub

Open-source command-line tool to run batch computing tasks and workflows on backend services such as Google Cloud.
Apache License 2.0

Security concerns with using the '--ssh' option #238

Open slagelwa opened 2 years ago

slagelwa commented 2 years ago

I'm following up on an item reported by a colleague about an issue where dsub reports an ssh error:

- name: 'Started running "ssh": listening on xx.xx.xxx:22 (22)'
  start-time: yyyy.429792+00:00
- name: Unexpected exit status 128 while running "ssh"
  start-time: yyyy.749872+00:00
- name: 'Execution failed: generic::failed_precondition: while running "ssh": unexpected
  exit status 128 was not ignored'

I'm looking at the dsub code, and the image it appears to use is gcr.io/cloud-genomics-pipelines/tools. It looks like that image was last updated in February of 2019. I pulled the Docker image myself and ran an open-source vulnerability scanning tool, Trivy, on it:

2022-04-11T19:46:06.444Z INFO Detected OS: debian
2022-04-11T19:46:06.444Z INFO Detecting Debian vulnerabilities...
2022-04-11T19:46:06.489Z INFO Number of language-specific files: 0
gcr.io/cloud-genomics-pipelines/tools:latest (debian 9.7)
=========================================================
Total: 1557 (UNKNOWN: 12, LOW: 519, MEDIUM: 471, HIGH: 451, CRITICAL: 104)
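
For anyone who wants to break a tally like this down further, here is a minimal sketch. It assumes the scan is rerun with Trivy's JSON output (for example `trivy image --format json`) and that the report uses the `Results` / `Vulnerabilities` layout with `Severity`, `PkgName`, and `VulnerabilityID` keys. The function names and the sample data are made up for illustration; the sample is not taken from the actual scan.

```python
import json
from collections import Counter


def load_report(path):
    """Load a Trivy JSON report from disk (path is supplied by the caller)."""
    with open(path) as handle:
        return json.load(handle)


def summarize_trivy_report(report, keyword="ssh"):
    """Tally findings by severity and list CRITICAL ones whose package name contains `keyword`."""
    counts = Counter()
    critical_matches = []
    for result in report.get("Results") or []:
        # "Vulnerabilities" can be missing or null when a target has no findings.
        for vuln in result.get("Vulnerabilities") or []:
            severity = vuln.get("Severity", "UNKNOWN")
            counts[severity] += 1
            pkg = vuln.get("PkgName", "")
            if severity == "CRITICAL" and keyword in pkg.lower():
                critical_matches.append((vuln.get("VulnerabilityID"), pkg))
    return counts, critical_matches


# Hypothetical sample with placeholder IDs, not real scan results.
sample_report = {
    "Results": [
        {
            "Target": "example-image (debian)",
            "Vulnerabilities": [
                {"VulnerabilityID": "CVE-0000-0001", "PkgName": "openssh-client", "Severity": "CRITICAL"},
                {"VulnerabilityID": "CVE-0000-0002", "PkgName": "libssl1.1", "Severity": "CRITICAL"},
                {"VulnerabilityID": "CVE-0000-0003", "PkgName": "openssh-server", "Severity": "HIGH"},
            ],
        },
        {"Target": "clean-target", "Vulnerabilities": None},
    ]
}

counts, critical_matches = summarize_trivy_report(sample_report)
print(dict(counts))
print(critical_matches)
```

With a real report, `summarize_trivy_report(load_report("report.json"))` would give the same kind of summary; the file name here is only an example.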

A fair number of the critical CVEs are related to ssh. Am I right in understanding that this is the image that is used to provide ssh services to the VM when running a dsub job? And if so, isn't there a concern that users might not have their (default or other) network set up properly and could be giving these VMs external IP addresses?
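
A rough way to check for that kind of exposure is sketched below. It assumes the project's firewall rules have been exported as JSON (for example with `gcloud compute firewall-rules list --format=json`) and that each rule carries the Compute Engine firewall fields `direction`, `disabled`, `sourceRanges`, and `allowed`. The helper names and the sample rules are hypothetical, and the check is deliberately narrow: it only looks for a literal `0.0.0.0/0` source and ignores priorities, deny rules, and target tags.

```python
def port_in_spec(port, spec):
    """Return True if `port` falls inside a port spec such as "22" or "20-25"."""
    low, _, high = spec.partition("-")
    return int(low) <= port <= int(high or low)


def rule_exposes_port(rule, port=22):
    """True if an enabled ingress rule allows TCP `port` from any IPv4 address."""
    if rule.get("disabled") or rule.get("direction", "INGRESS") != "INGRESS":
        return False
    if "0.0.0.0/0" not in rule.get("sourceRanges", []):
        return False
    for allowed in rule.get("allowed", []):
        if allowed.get("IPProtocol") not in ("tcp", "all"):
            continue
        ports = allowed.get("ports")
        # A missing "ports" list means every port of that protocol is allowed.
        if not ports or any(port_in_spec(port, spec) for spec in ports):
            return True
    return False


# Hypothetical sample rules, shaped like the exported JSON.
sample_rules = [
    {"name": "open-ssh", "direction": "INGRESS", "sourceRanges": ["0.0.0.0/0"],
     "allowed": [{"IPProtocol": "tcp", "ports": ["22"]}]},
    {"name": "internal-only", "direction": "INGRESS", "sourceRanges": ["10.128.0.0/9"],
     "allowed": [{"IPProtocol": "tcp", "ports": ["0-65535"]}]},
    {"name": "open-web", "direction": "INGRESS", "sourceRanges": ["0.0.0.0/0"],
     "allowed": [{"IPProtocol": "tcp", "ports": ["80", "443"]}]},
]

exposed = [rule["name"] for rule in sample_rules if rule_exposes_port(rule)]
print(exposed)
```

A rule flagged this way only matters for a VM that also has an external IP address, so this is a first filter rather than a full audit.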

And I almost wonder if this might be related to #233

wnojopra commented 2 years ago

Thank you for reporting this, @slagelwa! I appreciate you digging into it.

> Am I right in understanding that this is the image that is used to provide ssh services to the VM when running a dsub job?

This is correct. dsub just picks up the SSH image provided by the Lifesciences team and runs an SSH server in that container. The intent, just as with the pipelines tool, is to make a bit of inspection possible, along with being able to inspect logs in real time.

I've contacted the Lifesciences team and will update here accordingly.

slagelwa commented 2 years ago

Curious if there's been any feedback on this issue from the Lifesciences team?

wnojopra commented 2 years ago

Not yet. I have pinged again for an update.

The repo for the pipelines tools is googlegenomics/pipelines-tools on GitHub. It may be worthwhile to file an issue there, though as you mentioned, it hasn't been updated in a few years.

wnojopra commented 2 years ago

Hi @slagelwa,

The team that supports the Lifesciences API responded and agreed that the image needs updating. I've also filed https://github.com/googlegenomics/pipelines-tools/issues/108 to track in that repo.