2i2c-org / features

Temporary location for feature requests sent to 2i2c
BSD 3-Clause "New" or "Revised" License

Add SFTP service to basehub #20

Open consideRatio opened 1 year ago

consideRatio commented 1 year ago

This service, which ships with the yuvipanda/jupyterhub-ssh helm chart, is used to move data in and out of home directories. It works without involving a user server, because the SFTP server mounts the user storage directly.

The jupyterhub-ssh chart also provides a second kind of service, which starts or connects to already started user servers via SSH. Setting that up is not part of this issue, which covers only the SFTP server.

Action point

Technical notes

This ought to be an opt-in feature at first, not one that is enabled by default.

Below are config snippets from setting up jupyterhub-ssh (both the SSH part and the SFTP part) for hub.jupytearth.org, also referred to as the JMTE project, as seen in the changes in 2i2c-org/infrastructure#436. There, jupyterhub-ssh was added as a dependency to the daskhub helm chart.

From config/clusters/jmte/common.values.yaml:

basehub:
  jupyterhub:
    proxy:
      service:
        # jupyterhub-ssh/sftp integration part 1/3:
        #
        # We must accept traffic to the k8s Service (proxy-public) receiving traffic
        # from the internet. Port 22 is typically used for both SSH and SFTP, but we
        # can't use the same port for both so we use 2222 for SFTP in this example.
        #
        extraPorts:
          - name: ssh
            port: 22
            targetPort: ssh
          - name: sftp
            port: 2222
            targetPort: sftp
      traefik:
        # jupyterhub-ssh/sftp integration part 2/3:
        #
        # We must accept traffic arriving to the autohttps pod (traefik) from the
        # proxy-public service. Expose a port and update the NetworkPolicy
        # to tolerate incoming (ingress) traffic on the exposed port.
        #
        extraPorts:
          - name: ssh
            containerPort: 8022
          - name: sftp
            containerPort: 2222
        networkPolicy:
          allowedIngressPorts: [http, https, ssh, sftp]
        # jupyterhub-ssh/sftp integration part 3/3:
        #
        # extraStaticConfig is adjusted by staging/prod values
        # extraDynamicConfig is adjusted by staging/prod values

# jupyterhub-ssh values.yaml reference:
# https://github.com/yuvipanda/jupyterhub-ssh/blob/main/helm-chart/jupyterhub-ssh/values.yaml
#
jupyterhub-ssh:
  hubUrl: http://proxy-http:8000

  ssh:
    enabled: true

  sftp:
    # enabled is adjusted by staging/prod values
    # enabled: true
    pvc:
      enabled: true
      name: home-nfs

From config/clusters/jmte/prod.values.yaml:

basehub:
  jupyterhub:
    proxy:
      traefik:
        # jupyterhub-ssh/sftp integration part 3/3:
        #
        # We must let traefik know it should listen for traffic (traefik entrypoint)
        # and route it (traefik router) onwards to the jupyterhub-ssh k8s Service
        # (traefik service).
        #
        extraStaticConfig:
          entryPoints:
            ssh-entrypoint:
              address: :8022
            sftp-entrypoint:
              address: :2222
        extraDynamicConfig:
          tcp:
            services:
              ssh-service:
                loadBalancer:
                  servers:
                    - address: jupyterhub-ssh:22
              sftp-service:
                loadBalancer:
                  servers:
                    - address: jupyterhub-sftp:22
            routers:
              ssh-router:
                entrypoints: [ssh-entrypoint]
                rule: HostSNI(`*`)
                service: ssh-service
              sftp-router:
                entrypoints: [sftp-entrypoint]
                rule: HostSNI(`*`)
                service: sftp-service

jupyterhub-ssh:
  sftp:
    enabled: true
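
Since this issue covers only the SFTP server, the JMTE values above could in principle be trimmed down to an SFTP-only variant. The following is an untested sketch condensed from the two files above, assuming that dropping the ssh port, entrypoint, service, and router has no side effects and that setting ssh.enabled to false is enough to skip the SSH part. The nesting follows the JMTE (daskhub) layout and would shift if jupyterhub-ssh became a dependency of basehub itself.

```yaml
# SFTP-only sketch condensed from the JMTE values above; not a tested config.
basehub:
  jupyterhub:
    proxy:
      service:
        # Part 1/3: accept SFTP traffic on the proxy-public Service.
        extraPorts:
          - name: sftp
            port: 2222
            targetPort: sftp
      traefik:
        # Part 2/3: expose the port on the autohttps pod and allow ingress to it.
        extraPorts:
          - name: sftp
            containerPort: 2222
        networkPolicy:
          allowedIngressPorts: [http, https, sftp]
        # Part 3/3: listen on the port and route it to the jupyterhub-sftp Service.
        extraStaticConfig:
          entryPoints:
            sftp-entrypoint:
              address: :2222
        extraDynamicConfig:
          tcp:
            services:
              sftp-service:
                loadBalancer:
                  servers:
                    - address: jupyterhub-sftp:22
            routers:
              sftp-router:
                entrypoints: [sftp-entrypoint]
                rule: HostSNI(`*`)
                service: sftp-service

jupyterhub-ssh:
  hubUrl: http://proxy-http:8000
  ssh:
    # Assumption: the SSH part is switched off because it is out of scope here.
    enabled: false
  sftp:
    enabled: true
    pvc:
      enabled: true
      name: home-nfs
```

Port 2222 is kept from the JMTE example for consistency. With SSH not exposed, port 22 would presumably be free for SFTP as well.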

From helm-charts/daskhub/Chart.yaml:

  - name: jupyterhub-ssh
    version: 0.0.1-n142.h402a3d6
    repository: https://yuvipanda.github.io/jupyterhub-ssh/
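
For basehub, one possible way to make the whole dependency opt-in at the chart level is Helm's dependency `condition` field. The sketch below is hypothetical and not taken from infrastructure#436: the `jupyterhub-ssh.enabled` values path is a name made up for illustration, and whether the jupyterhub-ssh chart's own schema validation tolerates an extra `enabled` key would need to be checked first.

```yaml
# Hypothetical excerpt of a basehub Chart.yaml (sketch only)
dependencies:
  - name: jupyterhub-ssh
    version: 0.0.1-n142.h402a3d6
    repository: https://yuvipanda.github.io/jupyterhub-ssh/
    # Helm enables or disables the subchart based on this boolean values path.
    # The path name is an assumption made for this sketch.
    condition: jupyterhub-ssh.enabled
---
# Hypothetical excerpt of the matching basehub values.yaml default
jupyterhub-ssh:
  # Off by default; a hub would opt in by setting this to true.
  enabled: false
```

The JMTE setup did not use a condition. There the dependency is always present and the SFTP part is toggled with sftp.enabled, which may be the simpler route.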

From helm-charts/daskhub/values.schema.yaml:

  # jupyterhub-ssh is a dependent helm chart, we rely on its schema validation
  # for values passed to it and are not imposing restrictions on them in this
  # helm chart.
  jupyterhub-ssh:
    type: object
    additionalProperties: true

yuvipanda commented 1 year ago

I think there's a lot of demand for sftp in particular, so I think we should definitely do this! I also would say we should turn on sftp by default (but not ssh), as it just uses openssh and is fairly secure.

consideRatio commented 1 year ago

@yuvipanda I think it's reasonable to expose this by default in the long term, but for the sake of stability across hubs I'd like us to pilot it in a few hubs for a while, until we have made sure the dependency is sufficiently mature.

I'm thinking, for example, of this service being enabled in a hub where security is critical while we end up running a quite old build of a docker image that has outdated dependencies with known vulnerabilities, for example in OpenSSH.

yuvipanda commented 1 year ago

Yep makes sense! Let's not turn it on by default to start with.

tsnow03 commented 3 months ago

Hi @yuvipanda and @consideRatio. Bumping SSH capabilities in the cloud. We have been getting a number of use cases for SSH capabilities in CryoCloud. Here are some of them, including the one mentioned in CryoInTheCloud/hub-image/issues/54: