Closed: wakonp closed this 7 years ago
Hello @wakonp, can you tell me something more about the need for this PR?
I am currently working on a project for my master's thesis, where I have to create a practice environment for students at my university. For that I started using JupyterHub with your SwarmSpawner, which is really awesome by the way.
To achieve persistent data storage in every spawned Jupyter Notebook I used Docker volumes. The problem with these volumes is that they are only available on a specific node. So if a student logs in again and the SwarmSpawner creates the notebook server on a different node, his/her volume (data) is not available there.
So I created an NFS server container on my main node, which is available from every other node in the Docker swarm. Now every time a new notebook server is created (a user logs in), the SwarmSpawner needs to create a volume with a DriverConfig option. There you can define which driver Docker should use to create the volume (in my case 'type' = 'nfs4'). So every time a new Docker service is created, a new volume is created that links to my NFS server. Users now always write their data to the NFS share and not into node-local Docker volumes.
I am currently working on this in my repo. I also described it here
I used this documentation, where you can see that the Mount class can be instantiated with a parameter driver_config, which needs to be of type DriverConfig.
The only thing I added to your code is that the driver_config config string gets instantiated and passed to the Mount instantiation.
.env File
NFSSERVER_IP=10.15.202.10
NFSSERVER_USERDATA_DEVICE=/jupyterUsers/{username}
NFSSERVER_ASSIGNMENTDATA_DEVICE=/jupyterAssignments
This is my jupyterhub_config.py:
mounts = [
    {
        'type': 'volume',
        'target': os.environ.get('SWARMSPAWNER_NOTEBOOK_DIR'),
        'source': 'jupyterhub-user-{username}',
        'no_copy': True,
        'driver_config': {
            'name': 'local',
            'options': {
                'type': 'nfs4',
                'o': 'addr=' + os.environ.get('NFSSERVER_IP') + ',rw',
                'device': ':' + os.environ.get('NFSSERVER_USERDATA_DEVICE')
            }
        }
    },
    {
        'type': 'volume',
        'target': '/srv/nbgrader/exchange',
        'source': 'jupyter-exchange-volume',
        'no_copy': True,
        'driver_config': {
            'name': 'local',
            'options': {
                'type': 'nfs4',
                'o': 'addr=' + os.environ.get('NFSSERVER_IP') + ',rw',
                'device': ':' + os.environ.get('NFSSERVER_ASSIGNMENTDATA_DEVICE')
            }
        }
    }
]
c.SwarmSpawner.container_spec = {
    'args': ['start-singleuser.sh'],
    'Image': os.environ.get('SWARMSPAWNER_NOTEBOOK_IMAGE'),
    'mounts': mounts
}
You can see that the mounts array consists of two volume configurations. In your current code the driver_config property would produce an error, because it is of type string.
swarmspawner.py
container_spec['mounts'] = []
for mount in self.container_spec['mounts']:
    m = dict(**mount)
    if 'source' in m:
        m['source'] = m['source'].format(username=self.user.name)
    if 'driver_config' in m:
        m['driver_config']['options']['device'] = \
            m['driver_config']['options']['device'].format(username=self.user.name)
        m['driver_config'] = docker.types.DriverConfig(**m['driver_config'])
    container_spec['mounts'].append(docker.types.Mount(**m))
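The same substitution logic can be exercised without the docker SDK installed. Here is a standalone sketch of it (the function name is mine, not part of the PR); note it deep-copies each mount, because the shallow dict(**mount) copy only copies the top level, so mutating the nested driver_config would leak one user's device path into the shared template:

```python
import copy

def process_mounts(mounts, username):
    """Fill the {username} placeholder in 'source' and the NFS 'device' path.

    deepcopy keeps the shared mounts template pristine between spawns;
    a shallow copy would mutate the nested driver_config dict in place.
    """
    processed = []
    for mount in mounts:
        m = copy.deepcopy(mount)
        if 'source' in m:
            m['source'] = m['source'].format(username=username)
        if 'driver_config' in m:
            opts = m['driver_config']['options']
            opts['device'] = opts['device'].format(username=username)
        processed.append(m)
    return processed
```

In the real spawner each processed dict would then be wrapped in docker.types.DriverConfig and docker.types.Mount as shown above.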
If the mount configuration contains a driver_config string, the string gets instantiated via m['driver_config'] = docker.types.DriverConfig(**m['driver_config']). The device property also gets checked for the username placeholder (same as the source property), so it is possible to use the username in the subfolder structure of your NFS folder.
great, thanks!
Are you using a docker volume plugin?
I see you are struggling with the UID/GID for the NFS. One solution (I used this before leaving NFS) was to have jovyan also as an owner of the NFS server; this way UID and GID remain the same.
Another thing: do you use some auth with the NFS server?
I did not install any third-party plugins for Docker. I just use the local driver (driver: local).
Well, I do not struggle anymore 😄. I created a working solution and I am currently writing it down in my wiki (which is obviously not finished yet).
When a user logs in at the Hub, the Authenticator checks whether they are authorized for the service. If so, the authenticate function is called, as in any JupyterHub Authenticator. If everything works fine, this function returns the username to the Hub, which instantiates the Spawner and provides it with the username.
I did the following: before the username is returned by the Authenticator, I call:
docker exec -it NFS_CONTAINER useradd username
docker exec -it NFS_CONTAINER sh -c 'mkdir /USERHOMESHARE/username && chown username /USERHOMESHARE/username'
(The sh -c wrapper makes sure the chown also runs inside the container, not on the host.)
So the NFS_CONTAINER creates a new user for the authenticated username and provides a new folder owned by the new user.
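Those two docker exec calls can be wrapped in a small helper inside the Authenticator. This is only an illustrative sketch (the function name and defaults are mine, not part of the project); it returns the commands as argument lists so they can be inspected in a test or passed to subprocess.run:

```python
def provision_commands(username, container='NFS_CONTAINER', share='/USERHOMESHARE'):
    """Build the docker exec commands that create the NFS user and home folder."""
    home = f'{share}/{username}'
    return [
        ['docker', 'exec', container, 'useradd', username],
        # run mkdir and chown in one shell inside the container, so the
        # chown is not accidentally executed on the host
        ['docker', 'exec', container, 'sh', '-c',
         f'mkdir -p {home} && chown {username} {home}'],
    ]
```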
One of the two mounts I mentioned in my previous comment points directly to the /USERHOMESHARE/username folder of my NFS_CONTAINER. So the user will always get his/her shared NFS home folder.
Furthermore, before the notebook server is started via the args in the container_spec (start-singleuser.sh), I call:
NB_UID=$(docker exec jupyterhub_nfs id -u username)
NB_GID=$(docker exec jupyterhub_nfs id -g username)
This adds NB_UID and NB_GID to the ENV of the 'not yet spawned' notebook. Now the notebook only needs to be started as the root user.
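A hedged sketch of how those lookups could be wired up from Python (the container name jupyterhub_nfs is taken from the comment above; the injectable runner parameter is my addition so the docker exec calls can be stubbed out):

```python
import subprocess

def nfs_ids(username, container='jupyterhub_nfs', runner=subprocess.check_output):
    """Look up the user's uid/gid inside the NFS container.

    runner defaults to subprocess.check_output but can be swapped
    for a fake in tests, so no docker daemon is needed.
    """
    uid = runner(['docker', 'exec', container, 'id', '-u', username]).decode().strip()
    gid = runner(['docker', 'exec', container, 'id', '-g', username]).decode().strip()
    return {'NB_UID': uid, 'NB_GID': gid}
```

The returned dict can then be merged into the environment passed to the spawned notebook service.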
The start.sh script checks whether the notebook server was spawned as root, takes the ENV variables NB_UID and NB_GID, and alters $NB_USER (jovyan) accordingly. At the end, root starts the notebook as $NB_USER, which now has the new NB_UID and NB_GID and can write to his/her NFS share.
I do not use any auth with NFS because I mount only specific folders for each user. So userA will only have access to /USERHOMESHARE/userA. No user is able to access /USERHOMESHARE/ from within their notebooks. That's because you cannot run a service in privileged mode and therefore cannot mount anything while the service is running. I wrote that down here under notes.
The second mount is needed for nbgrader, but that's a different story 😄
Nice solution!
The security is limited to UID/GID?
That's right! I manage all rights/permissions in the NFS Container.
Another feature of my project is that the SwarmSpawner can check whether the username belongs to a teacher or a student. A teacher can create assignments in nbgrader, whereas a student can attempt assignments (two different Docker images are started, a teacher or a student notebook).
In addition, the NFS user gets either the group 'students' or the group 'teachers', so I can manage the rights for nbgrader submissions.
But that feature is very specific and I don't really think it should be part of the SwarmSpawner repo. Maybe only as an extension repo, something like SwarmSpawnerNbGrader.
Cool solution @wakonp!! I have learned so many new things just by reading your explanation here as well as the 'half-baked' wiki. I am especially looking forward to trying to configure my JupyterHub with NB_UID and NB_GID to change the user jovyan on the spawned notebook. Good job! I will keep monitoring the development here and in your repo.
By the way, I created an issue at jupyter/docker-stacks which resulted in pull request 435. Now the UID and GID won't get changed while starting the notebook server via start.sh; instead the docker run command will be used like this:
docker run -u 501 -g 501 --group-add user-writable -it jupyter/base-notebook
So if you want to use the SwarmSpawner you have to add the user property in the container_spec as shown here. The docker run command with the user option is described here.
So I think the final jupyterhub_config.py would look like this:
c.SwarmSpawner.container_spec = {
    'args': ['start-singleuser.sh'],
    'Image': os.environ.get('SWARMSPAWNER_NOTEBOOK_IMAGE'),
    'mounts': mounts,
    'user': os.environ.get('NB_UID') + ':' + os.environ.get('NB_GID')
}
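One caveat with the snippet above: os.environ.get returns None for unset variables, so the string concatenation would raise a TypeError if NB_UID or NB_GID is missing. A slightly more defensive sketch (the helper name is mine, not part of SwarmSpawner):

```python
import os

def spawn_user(env=None):
    """Build the 'user' value for container_spec as 'UID:GID', failing loudly."""
    env = os.environ if env is None else env
    uid = env.get('NB_UID')
    gid = env.get('NB_GID')
    if uid is None or gid is None:
        raise RuntimeError('NB_UID and NB_GID must be set before spawning')
    return f'{uid}:{gid}'
```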
And the mounts configuration is still the same? I actually have a problem understanding this part of the configuration:
'options': {
    'type': 'nfs4',
    'o': 'addr=' + os.environ.get('NFSSERVER_IP') + ',rw',
    'device': ':' + os.environ.get('NFSSERVER_USERDATA_DEVICE')
}
Where did you get the information about these 'options'? I could not find it in the documentation.
Hi @hans-permana, you are passing this info to the spawner.
NFSSERVER_IP is your NFS server (you need to have one).
NFSSERVER_USERDATA_DEVICE is the path inside the NFS server, like /myshares/myuser.
Hi @barrachri, thanks for the info. However, my question is more like: who will use the DriverConfig object? Where does the addr key come from? Is it to be used by a wrapper around the mount command in Unix? If so, I could not find this in the manual page.
DriverConfig is a Docker option :)
Who will use it? The spawner, while starting the container; it gives Docker information about how to start the service.
Hi @hans-permana, the mount configuration in the mounts array displayed below will be used to create new Docker volumes before the service gets spawned.
mounts = [
    {
        'type': 'volume',
        'target': os.environ.get('SWARMSPAWNER_NOTEBOOK_DIR'),
        'source': 'jupyterhub-user-{username}',
        'no_copy': True,
        'driver_config': {
            'name': 'local',
            'options': {
                'type': 'nfs4',
                'o': 'addr=' + os.environ.get('NFSSERVER_IP') + ',rw',
                'device': ':' + os.environ.get('NFSSERVER_USERDATA_DEVICE')
            }
        }
    },
    {
        'type': 'volume',
        'target': '/srv/nbgrader/exchange',
        'source': 'jupyter-exchange-volume',
        'no_copy': True,
        'driver_config': {
            'name': 'local',
            'options': {
                'type': 'nfs4',
                'o': 'addr=' + os.environ.get('NFSSERVER_IP') + ',rw',
                'device': ':' + os.environ.get('NFSSERVER_ASSIGNMENTDATA_DEVICE')
            }
        }
    }
]
So the objects in the array get instantiated with this class.
You can do it on the command line like this:
docker volume create --driver local \
    --opt type=nfs \
    --opt o=addr=192.168.1.1,rw \
    --opt device=:/path/to/dir \
    foo
https://docs.docker.com/engine/reference/commandline/volume_create/#extended-description
The --opt attributes are the driver_config for Docker.
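To make that correspondence explicit, here is a tiny illustrative helper (my own sketch, not part of any Docker or SwarmSpawner API) that turns the --opt key=value pairs from the CLI into the driver_config dict used in the jupyterhub_config.py above:

```python
def opts_to_driver_config(opt_pairs, driver='local'):
    """Convert '--opt key=value' strings into a driver_config dict.

    partition (not split) keeps '=' characters inside the value,
    which matters for o=addr=192.168.1.1,rw.
    """
    options = {}
    for pair in opt_pairs:
        key, _, value = pair.partition('=')
        options[key] = value
    return {'name': driver, 'options': options}
```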
Thanks @wakonp, that's exactly what I was looking for :)
@barrachri, is this PR going to be merged? I think it is a very useful feature and I can imagine it will be useful for many.
Yes, definitely. If you are not in a rush, I plan to do this during the weekend. I plan to add more info to the readme and update the PyPI package.
not at all :) Thanks guys for the help 👍
I needed to extend your repo for the use of NFS volumes. Therefore, I added these three lines, where the config string gets instantiated. I think this could be merged really easily :)