alan-turing-institute / bridge-data-platform

Repository that manages the Kubernetes JupyterHub deployment that hosts the 3D bridge data platform
MIT License

Creating a shared data folder for all users #10

Open sgibson91 opened 4 years ago

sgibson91 commented 4 years ago

Summary

The goal of this platform is to allow all (authenticated) users access to the 3D bridge data. JupyterHub can provide volume mounts for data storage, but can we create a shared one across all users? (Docs: https://zero-to-jupyterhub-k8s.readthedocs.io/user-storage.html)

TODO

sgibson91 commented 4 years ago

Here's where I'm asking for advice from the JupyterHub team, including this topic: https://discourse.jupyter.org/t/hosting-jupyterhubs-any-tips-for-new-admins/3433

sgibson91 commented 4 years ago

https://docs.microsoft.com/en-us/azure/aks/azure-nfs-volume

sgibson91 commented 4 years ago

Use this config to mount the PVC created by following the above Azure docs into all user pods: https://zero-to-jupyterhub.readthedocs.io/en/latest/customizing/user-storage.html#additional-storage-volumes
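As a rough sketch of what that config might look like, the snippet below builds the Helm values as a Python dict and prints them as JSON, which should also parse as a YAML values file. The `singleuser.storage.extraVolumes` and `extraVolumeMounts` keys come from the Zero to JupyterHub docs linked above; the claim name `bridge-data-pvc`, the volume name `shared-data` and the mount path are placeholders, not values taken from this deployment.

```python
import json


def shared_volume_values(claim_name, mount_path, read_only=True):
    """Build Helm values that mount one existing PVC into every user pod."""
    volume_name = "shared-data"  # hypothetical name, only used to link volume and mount
    return {
        "singleuser": {
            "storage": {
                # Declares the volume, backed by a PVC that already exists
                "extraVolumes": [
                    {
                        "name": volume_name,
                        "persistentVolumeClaim": {"claimName": claim_name},
                    }
                ],
                # Mounts that volume at the same path in each user pod
                "extraVolumeMounts": [
                    {
                        "name": volume_name,
                        "mountPath": mount_path,
                        "readOnly": read_only,
                    }
                ],
            }
        }
    }


if __name__ == "__main__":
    # Placeholder claim name and path; replace with the real PVC and target directory
    values = shared_volume_values("bridge-data-pvc", "/home/jovyan/shared")
    print(json.dumps(values, indent=2))
```

The mount is read-only by default here on the assumption that users only need to read the bridge data; pass `read_only=False` if they should be able to write to it.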

sgibson91 commented 4 years ago

Mounting an NFS server is proving difficult. It is also resource-heavy, since a whole VM is needed just to host the NFS server.

Now trying with the following instructions: https://docs.microsoft.com/en-us/azure/aks/azure-files-volume
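For reference, a minimal sketch of how an Azure Files share could be mounted into all user pods, assuming the static-volume approach from those instructions (a Kubernetes secret holding the storage account name and key, referenced by an `azureFile` volume). The secret name `azure-secret`, share name `aksshare`, volume name and mount path are placeholders, and the secret is assumed to live in the same namespace as the user pods.

```python
import json


def azure_file_values(secret_name, share_name, mount_path, read_only=False):
    """Build Helm values that mount an Azure Files share into every user pod."""
    volume_name = "azure-share"  # hypothetical name linking the volume to its mount
    return {
        "singleuser": {
            "storage": {
                "extraVolumes": [
                    {
                        "name": volume_name,
                        # In-tree azureFile volume source; the secret holds the
                        # storage account name and key
                        "azureFile": {
                            "secretName": secret_name,
                            "shareName": share_name,
                            "readOnly": read_only,
                        },
                    }
                ],
                "extraVolumeMounts": [
                    {"name": volume_name, "mountPath": mount_path}
                ],
            }
        }
    }


if __name__ == "__main__":
    # Placeholder secret, share and path; replace with the real ones
    values = azure_file_values("azure-secret", "aksshare", "/home/jovyan/shared")
    print(json.dumps(values, indent=2))
```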

sgibson91 commented 4 years ago

It seems Kubernetes has problems mounting volumes on Azure Virtual Machine Scale Sets: https://github.com/kubernetes/kubernetes/issues/69262. I'm upgrading the cluster to version 1.15.5 to see if this resolves the problem.

sgibson91 commented 4 years ago

Closed by #20

sgibson91 commented 4 years ago

Reopening following discussion with @martintoreilly in #12 around securing access further.

https://github.com/alan-turing-institute/bridge-data-platform/issues/12#issuecomment-596661051

sgibson91 commented 4 years ago

The first step is to place the storage account inside the VNET:

https://docs.microsoft.com/en-gb/azure/storage/common/storage-network-security

martintoreilly commented 4 years ago

Note that placing storage inside the VNET only prevents inbound connections to the storage from outside the VNET. It currently also allows the compute within the VNET to access other arbitrary storage accounts in the same region. See https://github.com/alan-turing-institute/data-safe-haven/issues/381.

sgibson91 commented 4 years ago

I think some steps in the guide below need admin-level permissions to execute, e.g. setting up a domain service in the AD: https://docs.microsoft.com/en-gb/azure/storage/files/storage-files-identity-auth-active-directory-domain-service-enable

sgibson91 commented 4 years ago

Would placing the storage account in the VNET and replacing the storage account key with a SAS token in https://docs.microsoft.com/en-us/azure/aks/azure-files-volume be enough? (<-- thoughts from meeting with @fedenanni and @thobson88)

sgibson91 commented 4 years ago

Demystifying SAS tokens: https://www.c-sharpcorner.com/article/demystifying-sas-token-basics/

sgibson91 commented 4 years ago

If I'm understanding this correctly, using the azurefile storage class type in Kubernetes automatically uses the SMB protocol.

sgibson91 commented 4 years ago

Private endpoints for Azure storage: https://docs.microsoft.com/en-us/azure/storage/common/storage-private-endpoints

But how to get Kubernetes to use it?