cassinyio / SwarmSpawner

This repo is deprecated. A spawner for JupyterHub
BSD 3-Clause "New" or "Revised" License

Added driver_config support #15

Closed wakonp closed 7 years ago

wakonp commented 7 years ago

I needed to extend your repo to support NFS volumes. Therefore, I added these three lines, where the driver_config dict gets instantiated. I think this could be merged really easily :)

barrachri commented 7 years ago

Hello @wakonp, can you tell me something more about the need for this PR?

wakonp commented 7 years ago

I am currently working on a project for my master's thesis, where I have to create a practice environment for students at my university. Therefore, I started using JupyterHub with your SwarmSpawner, which is really awesome by the way.

To achieve persistent data storage in every spawned Jupyter Notebook I used Docker volumes. The problem with these volumes is that they are only available on a specific node. So if a student logs in again and the SwarmSpawner creates the notebook server on a different node, his/her volume (data) is not available there.

So I created an NFS server container on my main node, which is reachable from every other node in the Docker swarm. Now, every time a new notebook server is created (a user logs in), the SwarmSpawner needs to create a volume with the driver_config option. There you can define which driver Docker should use to create the volume (in my case 'type' = 'nfs4'). So every time a new Docker service is created, a new volume is created that links to my NFS server. Users then always write their data to the NFS share and not into node-local Docker volumes.

I am currently working at my repo. I also described this here

I used this documentation, where you can see that the Mount class can be instantiated with a parameter driver_config, which needs to be of type DriverConfig.

The only thing I added to your code is that the driver_config dict gets instantiated and passed to the Mount instantiation.

.env File

NFSSERVER_IP=10.15.202.10
NFSSERVER_USERDATA_DEVICE=/jupyterUsers/{username}
NFSSERVER_ASSIGNMENTDATA_DEVICE=/jupyterAssignments

This is my jupyterhub_config.py.

mounts = [
    {
        'type': 'volume',
        'target': os.environ.get('SWARMSPAWNER_NOTEBOOK_DIR'),
        'source': 'jupyterhub-user-{username}',
        'no_copy': True,
        'driver_config': {
            'name': 'local',
            'options': {
                'type': 'nfs4',
                'o': 'addr=' + os.environ.get('NFSSERVER_IP') + ',rw',
                'device': ':' + os.environ.get('NFSSERVER_USERDATA_DEVICE'),
            },
        },
    },
    {
        'type': 'volume',
        'target': '/srv/nbgrader/exchange',
        'source': 'jupyter-exchange-volume',
        'no_copy': True,
        'driver_config': {
            'name': 'local',
            'options': {
                'type': 'nfs4',
                'o': 'addr=' + os.environ.get('NFSSERVER_IP') + ',rw',
                'device': ':' + os.environ.get('NFSSERVER_ASSIGNMENTDATA_DEVICE'),
            },
        },
    },
]

c.SwarmSpawner.container_spec = {
    'args': ['start-singleuser.sh'],
    'Image': os.environ.get('SWARMSPAWNER_NOTEBOOK_IMAGE'),
    'mounts': mounts,
}

You can see that the mounts array consists of two volume configurations. In your current code the driver_config property will produce an error, because it is passed along as a plain dict instead of a DriverConfig instance.

swarmspawner.py

container_spec['mounts'] = []
for mount in self.container_spec['mounts']:
    m = dict(**mount)
    # fill in the {username} placeholder in the volume name
    if 'source' in m:
        m['source'] = m['source'].format(username=self.user.name)
    if 'driver_config' in m:
        # same placeholder substitution for the NFS device path
        m['driver_config']['options']['device'] = \
            m['driver_config']['options']['device'].format(username=self.user.name)
        # turn the plain dict into the DriverConfig object that Mount expects
        m['driver_config'] = docker.types.DriverConfig(**m['driver_config'])
    container_spec['mounts'].append(docker.types.Mount(**m))

If the mount configuration contains a driver_config dict, it gets instantiated via m['driver_config'] = docker.types.DriverConfig(**m['driver_config']). The device property is also checked for the username placeholder (same as the source property), so it is possible to use the username for the subfolder structure of your NFS folder.
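The placeholder substitution above can be sketched standalone, without the docker dependency — a minimal illustration (function name and sample values are mine, not from the patch) of how {username} propagates into both the volume source and the driver's device option:

import copy

def resolve_mount(mount, username):
    """Fill the {username} placeholder in a mount definition.

    Mirrors the substitution done in swarmspawner.py before the dict
    is handed to docker.types.DriverConfig / docker.types.Mount.
    """
    m = copy.deepcopy(mount)  # keep the template untouched between spawns
    if 'source' in m:
        m['source'] = m['source'].format(username=username)
    if 'driver_config' in m:
        opts = m['driver_config']['options']
        opts['device'] = opts['device'].format(username=username)
    return m

mount = {
    'type': 'volume',
    'source': 'jupyterhub-user-{username}',
    'driver_config': {
        'name': 'local',
        'options': {
            'type': 'nfs4',
            'o': 'addr=10.15.202.10,rw',
            'device': ':/jupyterUsers/{username}',
        },
    },
}

resolved = resolve_mount(mount, 'alice')
# resolved['source'] is 'jupyterhub-user-alice'; the device becomes
# ':/jupyterUsers/alice', i.e. a per-user subfolder on the NFS share.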

barrachri commented 7 years ago

great, thanks!

Are you using a docker volume plugin?

I see you are struggling with the UID/GID for the NFS. One solution (I used this before leaving NFS) was to have jovyan also as an owner on the NFS server; this way the UID and GID remain the same.

Another thing: do you use some auth with the NFS server?

wakonp commented 7 years ago

I did not install any third party plugins for docker.

I just use the local driver (driver: local).

Well I do not struggle anymore 😄 . I created a working solution and I am currently writing it down in my wiki. (Which is obviously not finished yet)

When a user logs in at the Hub, the Authenticator checks whether they are authorized for the service. In every JupyterHub Authenticator this happens in the authenticate function. If everything works fine, this function returns the username to the Hub, which instantiates the Spawner and provides it with the username.

I did the following: before the username is returned by the Authenticator, I call:

docker exec -it NFS_CONTAINER useradd username
docker exec -it NFS_CONTAINER sh -c 'mkdir /USERHOMESHARE/username && chown username /USERHOMESHARE/username'

So the NFS_CONTAINER creates a new user for the authenticated username and provides a new folder owned by the new user.

One of the two mounts I mentioned in my previous comment points directly to the /USERHOMESHARE/username folder of my NFS_CONTAINER. So the user will always get his/her shared NFS home folder.
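A rough sketch of that provisioning step as a helper one might call from the Authenticator before returning the username (the container name, share root, and function names are placeholders from this thread, not part of SwarmSpawner):

import subprocess

NFS_CONTAINER = 'NFS_CONTAINER'  # placeholder container name from this thread
SHARE_ROOT = '/USERHOMESHARE'    # placeholder NFS export root

def provisioning_commands(username):
    """Build the docker exec commands that create the NFS user and home folder."""
    home = f'{SHARE_ROOT}/{username}'
    return [
        ['docker', 'exec', NFS_CONTAINER, 'useradd', username],
        ['docker', 'exec', NFS_CONTAINER, 'mkdir', '-p', home],
        ['docker', 'exec', NFS_CONTAINER, 'chown', username, home],
    ]

def provision_nfs_user(username):
    """Run the provisioning commands; requires a reachable docker daemon."""
    for cmd in provisioning_commands(username):
        subprocess.run(cmd, check=True)

Keeping each step as its own exec (instead of chaining with &&) avoids the pitfall of the chown accidentally running on the host rather than inside the container.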

Furthermore, before the notebook server is started via the arg in container_spec (start-singleuser.sh), I call:

NB_UID=$(docker exec jupyterhub_nfs id -u username)
NB_GID=$(docker exec jupyterhub_nfs id -g username)

This adds NB_UID and NB_GID to the environment of the not-yet-spawned notebook. Now the notebook only needs to be started as the root user.

The start.sh script checks whether the notebook server was spawned as root, takes the NB_UID and NB_GID environment variables, and alters $NB_USER (jovyan) accordingly. At the end, root starts the notebook as $NB_USER, which now has the new NB_UID and NB_GID and can write to his/her NFS share.

I do not use any auth with NFS because I mount only specific folders for each user. So userA will only have access to /USERHOMESHARE/userA. No user is able to access /USERHOMESHARE/ from within their notebooks. That's because you cannot run a service in privileged mode and therefore cannot mount anything while the service is running. I wrote that down here under notes.

The second mount is needed for nbgrader but that's a different story 😄

barrachri commented 7 years ago

Nice solution!

The security is limited to UID/GID?

wakonp commented 7 years ago

That's right! I manage all rights/permissions in the NFS Container.

Another feature of my project is that the SwarmSpawner can check whether the username belongs to a teacher or a student. The teacher can create assignments in nbgrader, whereas the student can attempt assignments (two different Docker images are started: a teacher or a student notebook).

In addition, the NFS user will get either the group 'students' or the group 'teachers', so I can manage the rights for nbgrader submissions.

But that feature is very specific and I don't really think it should be part of the SwarmSpawner repo. Maybe only as an extension repo, something like SwarmSpawnerNbGrader.

hans-permana commented 7 years ago

Cool solution @wakonp!! I have learned so many new things just by reading your explanation here as well as the 'half-baked' wiki. I am especially looking forward to trying to configure my JupyterHub with NB_UID and NB_GID to change the jovyan user on the spawned notebook. Good job! I will keep monitoring the development here and in your repo.

wakonp commented 7 years ago

By the way, I created an issue at jupyter/docker-stacks which resulted in pull request 435. Now the UID and GID won't be changed while starting the notebook server via start.sh; instead the docker run user option will be used like this:

docker run -u 501:501 --group-add user-writable -it jupyter/base-notebook

So if you want to use the SwarmSpawner you have to add the user property in the container_specs as shown here. The docker run command with the user option will be described here.

So I think the final jupyterhub_config.py would look like this:

c.SwarmSpawner.container_spec = {
    'args': ['start-singleuser.sh'],
    'Image': os.environ.get('SWARMSPAWNER_NOTEBOOK_IMAGE'),
    'mounts': mounts,
    'user': os.environ.get('NB_UID') + ':' + os.environ.get('NB_GID'),
}
hans-permana commented 7 years ago

and is the mounts configuration still the same? I actually have a problem understanding this part of the configuration:

'options': {
    'type': 'nfs4',
    'o': 'addr=' + os.environ.get('NFSSERVER_IP') + ',rw',
    'device': ':' + os.environ.get('NFSSERVER_USERDATA_DEVICE'),
}

Where did you get the information about these 'options'? I could not find them in the documentation.

barrachri commented 7 years ago

Hi @hans-permana, you are passing this info to the spawner.

NFSSERVER_IP is your NFS server (you need to have one).
NFSSERVER_USERDATA_DEVICE is the path inside the NFS server, like /myshares/myuser.

hans-permana commented 7 years ago

Hi @barrachri, thanks for the info. However, my question is more like: who will use the DriverConfig object? Where does the addr key come from? Is it used by a wrapper around the mount command in Unix? If so, I could not find this in the manual page.

barrachri commented 7 years ago

DriverConfig is a Docker option :) Who will use it? The spawner, while starting the container; it gives Docker information about how to start the service.

wakonp commented 7 years ago

Hi @hans-permana, the mount configurations in the mounts array below are used to create new Docker volumes before the service gets spawned.

mounts = [
    {
        'type': 'volume',
        'target': os.environ.get('SWARMSPAWNER_NOTEBOOK_DIR'),
        'source': 'jupyterhub-user-{username}',
        'no_copy': True,
        'driver_config': {
            'name': 'local',
            'options': {
                'type': 'nfs4',
                'o': 'addr=' + os.environ.get('NFSSERVER_IP') + ',rw',
                'device': ':' + os.environ.get('NFSSERVER_USERDATA_DEVICE'),
            },
        },
    },
    {
        'type': 'volume',
        'target': '/srv/nbgrader/exchange',
        'source': 'jupyter-exchange-volume',
        'no_copy': True,
        'driver_config': {
            'name': 'local',
            'options': {
                'type': 'nfs4',
                'o': 'addr=' + os.environ.get('NFSSERVER_IP') + ',rw',
                'device': ':' + os.environ.get('NFSSERVER_ASSIGNMENTDATA_DEVICE'),
            },
        },
    },
]

So the objects in the array get instantiated with this class.

You can do it in the command line like this:

docker volume create --driver local \
    --opt type=nfs \
    --opt o=addr=192.168.1.1,rw \
    --opt device=:/path/to/dir \
    foo

https://docs.docker.com/engine/reference/commandline/volume_create/#extended-description

The --opt attributes are the driver_config options for Docker.
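The correspondence is mechanical: the driver name maps to --driver, and each key in the options dict maps to one --opt key=value flag. A small sketch (function name is mine, for illustration only):

def driver_config_to_cli(name, options, volume_name):
    """Render a driver_config dict as the equivalent `docker volume create` command."""
    cmd = ['docker', 'volume', 'create', '--driver', name]
    for key, value in options.items():
        # each options entry becomes one --opt key=value flag
        cmd += ['--opt', f'{key}={value}']
    cmd.append(volume_name)
    return cmd

cmd = driver_config_to_cli(
    'local',
    {'type': 'nfs', 'o': 'addr=192.168.1.1,rw', 'device': ':/path/to/dir'},
    'foo',
)
# equivalent to the `docker volume create` command shown above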

hans-permana commented 7 years ago

Thanks @wakonp, that's exactly what I was looking for :)

@barrachri, is this PR going to be merged? I think it is a very useful feature and I can imagine it will be useful for many.

barrachri commented 7 years ago

Yes, definitely. If you are not in a rush, I plan to do this during the weekend. I plan to add more info to the readme and update the PyPI package.

hans-permana commented 7 years ago

not at all :) Thanks guys for the help 👍