consideRatio opened this issue 1 year ago.
I think there's a lot of demand for sftp in particular, so I think we should definitely do this! I would also say we should turn on sftp by default (but not ssh), as it just uses OpenSSH and is fairly secure.
@yuvipanda I think it's reasonable to expose it by default in the long term, but for the sake of stability across hubs, we should pilot it in a few hubs for a while until we have made sure the dependency is sufficiently mature.
For example, I'm thinking of a case where this service is enabled in a hub where security is critical, and we end up using a fairly old build of a Docker image whose dependencies are outdated and have known vulnerabilities, for example in OpenSSH.
Yep makes sense! Let's not turn it on by default to start with.
Hi @yuvipanda and @consideRatio. Bumping SSH capabilities in the cloud: we have been getting a number of use cases for SSH capabilities in CryoCloud. Here are some of them, including the one mentioned in CryoInTheCloud/hub-image/issues/54:
A student running simulations with the Ice Sheet System Model (ISSM), which runs using SLURM on a supercomputer. The workflow becomes cumbersome if only one person (the advisor) has access to an HPC system (in this case, one requiring NASA credentials). Small changes regularly need to be made and tested to configure a model run. Right now, if the student wants to test a new configuration, they need to send the files and data to their advisor (or share them on CryoCloud); the advisor then pulls the files out of CryoCloud, runs them on the NASA HPC, and sends the output back to the student, who puts it back into CryoCloud for post-processing. With SSH capabilities in CryoCloud, the student could easily do the pre- and post-processing in CryoCloud, sharing the workflow and having the advisor test or run new configurations with no transfer of data or files required. This sort of workflow is critical for any of the modelers in our group. In particular, we are hoping to use this capability for the upcoming ISMIP7 (Ice Sheet Model Intercomparison Project) for the next IPCC reports, which will start ramping up in the next 3-6 months. These users have access to the GHub Buffalo and NASA HPCs.
There are similar use cases for other glacier modeling groups, compute-intensive geostatistics (UF NVIDIA), and massive remote sensing processing projects (ITS_LIVE) that regularly need HPC compute but want to do the pre- and post-processing and data streaming in the cloud. NASA has solved this problem for some of its internal users by creating a cloud-based, HPC-like setup that uses SLURM, but it is very costly and seems like a waste of resources when there is an HPC system at NASA that could serve the same purpose. Having SSH capabilities would save our community a great deal of cost.
As mentioned above, SSH or FTP capability from the terminal within CryoCloud would enable users to upload their own data remotely, streamlining data access and transfer, which is currently somewhat clunky.
This service, which ships with the yuvipanda/jupyterhub-ssh Helm chart, is used to move data into and out of home directories. It works without involving a user server, because the SFTP server mounts the users' storage directly.
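For illustration, per-hub values that turn on only the SFTP service might look roughly like the fragment below. This is a sketch under assumptions, not the configuration used in any hub: the key names (`hubUrl`, `ssh.enabled`, `sftp.enabled`, `sftp.pvc.*`) are recalled from the jupyterhub-ssh chart and should be verified against its `values.yaml`, and the hub URL and PVC name are placeholders. When jupyterhub-ssh is installed as a dependency of another chart, Helm passes the values nested under the dependency's name to the subchart.

```yaml
# Sketch of per-hub values when jupyterhub-ssh is a dependency of another
# chart. Key names are assumptions; verify against the chart's values.yaml.
jupyterhub-ssh:
  hubUrl: "https://<hub-domain>"  # placeholder: the hub the services talk to
  ssh:
    enabled: false  # starting/accessing user servers over ssh is out of scope
  sftp:
    enabled: true   # the SFTP server that mounts users' home storage directly
    pvc:
      enabled: true
      name: "<home-directory-pvc>"  # placeholder: PVC holding home directories
```

Keeping `ssh.enabled: false` reflects the scope of this issue: only the SFTP server needs access to home storage, and it never has to start a user server.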
The `jupyterhub-ssh` chart also provides another kind of service: starting, or accessing already started, user servers via `ssh`. Setting that up is not part of this issue; only the `sftp` server is.

Action point

`sftp` to help us not make mistakes.

Technical notes
This ought to be an opt-in feature, not enabled by default initially.
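One way to make a dependency opt-in is Helm's `condition` field on an entry in `Chart.yaml`: Helm only renders the dependency when the referenced value evaluates to true. The sketch below assumes that approach; the `jupyterhub-ssh.enabled` toggle is a hypothetical key name, the version and repository are placeholders, and this is not necessarily how 2i2c-org/infrastructure#436 wired it up.

```yaml
# Sketch of a Chart.yaml dependency entry: the subchart is skipped unless
# the parent chart's value `jupyterhub-ssh.enabled` is true.
dependencies:
  - name: jupyterhub-ssh
    version: "<chart-version>"     # placeholder: pin a specific release
    repository: "<helm-repo-url>"  # placeholder: the chart's Helm repository
    condition: jupyterhub-ssh.enabled  # hypothetical opt-in toggle
```

With this setup, the parent chart's default values would set `jupyterhub-ssh.enabled: false`, and a hub would opt in by setting it to `true` in its own values file. If the parent chart has a `values.schema.yaml` that rejects unknown keys, the new key would also need to be declared there.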
Below are config snippets from setting up jupyterhub-ssh (both the ssh and sftp parts) for hub.jupytearth.org, also referred to as the JMTE project, as seen in the changes in 2i2c-org/infrastructure#436. There, jupyterhub-ssh was added as a dependency of the daskhub Helm chart.
From `config/clusters/jmte/common.values.yaml`:

From `config/clusters/jmte/prod.values.yaml`:

From `helm-charts/daskhub/Chart.yaml`:

From `helm-charts/daskhub/values.schema.yaml`: