bird-house / birdhouse-deploy

Scripts and configurations to deploy the various birds and servers required for a full-fledged production platform
https://birdhouse-deploy.readthedocs.io/en/latest/
Apache License 2.0

:bug: [BUG]: Cowbird is not backward compatible with existing Jupyter users #425

Open tlvu opened 6 months ago

tlvu commented 6 months ago

Summary

Activating Cowbird on a system with existing Jupyter users hits many roadblocks. This contrasts with the usual "just enable the new component in env.local and it should play nicely with all existing components" message we are trying to convey in the stack.

A migration guide for systems with existing Jupyter users would have been helpful.

Below are the various problems I have faced so far and any workarounds I was able to find. I will add more to this list as I try out Cowbird.

Details

To Reproduce

Steps to reproduce the behavior:

  1. Use birdhouse-deploy at any version before 2.0.0
  2. Enable the poor man's public share in env.local by uncommenting this section https://github.com/bird-house/birdhouse-deploy/blob/13645f324c1bcef3decd91ba8a5462862b1e8d5a/birdhouse/env.local.example#L377-L425
  3. Create a Jupyter user via Magpie
  4. Login to JupyterHub and create some data under writable-workspace
  5. Update birdhouse-deploy to any version after 2.0.0 where Cowbird is enabled by default
  6. Re-enable in env.local any components that are no longer enabled by default, e.g. ./components/jupyterhub

Environment

Server/Platform URL: My dev PAVICS stack
Version Tag/Commit: 2.0.5
Related issues/PR: #392
Related components: Jupyter, Cowbird, possibly Weaver, Magpie
Custom configuration:

Concerned Organizations

@fmigneault @ChaamC @Nazim-crim @mishaschwartz @eyvorchuk

fmigneault commented 6 months ago

> For each existing Jupyter user, /data/user_workspaces/$USER has to be created manually.
>
> Otherwise, this error appears in docker logs jupyterhub: [E 2024-01-16 15:30:36.478 JupyterHub user:884] Unhandled error starting lvu's server: The user lvu's workspace doesn't exist in the workspace directory, but should have been created by Cowbird already.

This looks like the volume mounted as /data/user_workspaces could be owned by root or some other user, so that the internal jupyter spawner user cannot get sufficient permissions to create the user-specific workspace. Alternatively, /data/user_workspaces/$USER might already exist but have a higher/root owner, such that jupyter cannot run the chown command; Cowbird would then fail every following step since it uses the same UID:GID. The same applies to /data/jupyterhub_user_data/ and /data/jupyterhub_user_data/$USER.

Just a wild guess: the order in which the volumes are created could be the source of the root owner. Since there is a step for the Jupyter persistence volume creation, it might not play nicely with a docker-compose configuration that auto-creates missing volume mount locations (as root).
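A quick way to check this guess is to look at who owns each directory before the server is spawned. Below is a minimal sketch, assuming a POSIX host; describe_owner is a hypothetical helper, not part of birdhouse-deploy or Cowbird.

```python
import os
from typing import Optional, Tuple


def describe_owner(path: str) -> Optional[Tuple[int, int, bool]]:
    """Return (uid, gid, owned_by_root) for path, or None if it does not exist.

    Hypothetical diagnostic helper, not part of birdhouse-deploy or Cowbird.
    """
    try:
        st = os.stat(path)
    except FileNotFoundError:
        return None
    return st.st_uid, st.st_gid, st.st_uid == 0
```

Calling it on /data/user_workspaces and on /data/user_workspaces/$USER (with $USER replaced by a real user name) would show whether either path is missing or owned by root.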

The creation is performed by this hook:

https://github.com/bird-house/birdhouse-deploy/blob/13645f324c1bcef3decd91ba8a5462862b1e8d5a/birdhouse/components/jupyterhub/jupyterhub_config.py.template#L147-L152 https://github.com/bird-house/birdhouse-deploy/blob/master/birdhouse/components/jupyterhub/jupyterhub_config.py.template#L173

Note that care should be taken with overrides if they play with similar properties: https://github.com/bird-house/birdhouse-deploy/blob/13645f324c1bcef3decd91ba8a5462862b1e8d5a/birdhouse/components/jupyterhub/jupyterhub_config.py.template#L259
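In essence, such a hook has to guarantee that the per-user directory exists and belongs to the UID:GID the spawned server runs as. A minimal sketch of that mkdir + chown step follows, assuming a POSIX host and a process allowed to chown (root inside the JupyterHub container); ensure_user_dir and its parameters are hypothetical names, not the actual code behind the links above.

```python
import os


def ensure_user_dir(base_dir: str, username: str, uid: int, gid: int) -> str:
    """Create base_dir/username if missing and hand it over to uid:gid.

    Hypothetical helper mirroring the mkdir + chown pattern; it is not the
    birdhouse-deploy implementation.
    """
    user_dir = os.path.join(base_dir, username)
    # exist_ok=True keeps the call idempotent for users who already have a dir.
    os.makedirs(user_dir, exist_ok=True)
    # Changing ownership to a different UID requires root privileges.
    os.chown(user_dir, uid, gid)
    return user_dir
```

For example, a c.Spawner.pre_spawn_hook callable (which receives the spawner) could call ensure_user_dir("/data/user_workspaces", spawner.user.name, uid, gid), where uid and gid are placeholders for whatever IDs the Jupyter server container uses.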

mishaschwartz commented 6 months ago

This is the same issue as #392

https://github.com/bird-house/birdhouse-deploy/blob/67c6ca1d22c47d9bdf6f6e239f808ef3ec9af0bb/birdhouse/env.local.example#L390-L395

https://github.com/bird-house/birdhouse-deploy/blob/67c6ca1d22c47d9bdf6f6e239f808ef3ec9af0bb/birdhouse/components/jupyterhub/jupyterhub_config.py.template#L126-L129

I don't know why DockerSpawner decides to create them in that order, but that's how it's done consistently.

tlvu commented 6 months ago

> I don't know why DockerSpawner decides to create them in that order, but that's how it's done consistently.

I am happy it is consistent. The worst kind of problems are intermittent ones.

But I think the sequence is appropriate. {notebook_dir}/public is the parent dir, so it is volume-mounted first. The {notebook_dir}/public/wps_outputs volume-mount follows because it is the child dir. But since the parent dir is read-only, the volume-mount of the child dir errors out because the mount point cannot be created. This makes sense.
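The reasoning above can be checked mechanically: process mount targets parent-first and flag any target whose mount point sits inside a read-only parent mount. This sketch assumes the mounts are given as a mapping from container path to mode ("ro" or "rw"); find_blocked_mounts is a hypothetical helper and /notebook_dir stands in for {notebook_dir}.

```python
from pathlib import PurePosixPath
from typing import Dict, List


def find_blocked_mounts(mounts: Dict[str, str]) -> List[str]:
    """Return container paths whose mount point lies inside a read-only mount.

    Targets are sorted by depth so parents come before children, mirroring
    the parent-first order described in the thread.
    """
    ordered = sorted(mounts, key=lambda p: len(PurePosixPath(p).parts))
    blocked = []
    for i, child in enumerate(ordered):
        child_path = PurePosixPath(child)
        for parent in ordered[:i]:
            # A read-only parent leaves no writable place for the child's mount point.
            if mounts[parent] == "ro" and PurePosixPath(parent) in child_path.parents:
                blocked.append(child)
                break
    return blocked


example = {
    "/notebook_dir/public": "ro",
    "/notebook_dir/public/wps_outputs": "ro",
    "/notebook_dir/writable-workspace": "rw",
}
print(find_blocked_mounts(example))  # ['/notebook_dir/public/wps_outputs']
```

Renaming the read-only parent, for example to public-shared, leaves nothing nested under it, so the function returns an empty list for that layout.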

tlvu commented 6 months ago

> For each existing Jupyter user, /data/user_workspaces/$USER has to be created manually. Otherwise, this error appears in docker logs jupyterhub: [E 2024-01-16 15:30:36.478 JupyterHub user:884] Unhandled error starting lvu's server: The user lvu's workspace doesn't exist in the workspace directory, but should have been created by Cowbird already.

> This looks like the volume mounted as /data/user_workspaces could be owned by root or some other user that the internal jupyter spawner user cannot get sufficient permissions to create the user-specific workspace,

This is a reasonable hint, but it should not happen since the jupyterhub container runs as root, so it can mkdir and chown all the paths it needs before spawning the JupyterLab server container.

> or that /data/user_workspaces/$USER already exists

No, the error happens only when that dir does not exist yet. If I manually create it before spawning the Jupyter server (which is my documented workaround), the error is gone and the Jupyter server spawns successfully.

> The order by which the volumes are created could be the source of the root owner. Since there is a step for jupyter persistence volume creation.

No, the JupyterHub persistence data volume only holds the session tokens. User data are not stored in a data volume but are volume-mounted directly from disk.
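This distinction follows Docker's own rule for volume sources: an absolute host path is a bind mount of a directory on disk, while a bare name refers to a Docker-managed named volume. With the short -v syntax, Docker typically creates a missing host directory for a bind mount itself, owned by root. A rough classifier is sketched below, assuming DockerSpawner-style {source: target} entries; the function name, the volume name jupyterhub_data_persistence and the target paths are hypothetical placeholders.

```python
from typing import Dict


def classify_volume_source(source: str) -> str:
    """Roughly classify a volume source the way Docker's short -v syntax does.

    Absolute paths are bind mounts of host directories; anything else is
    treated as the name of a Docker-managed named volume.
    """
    return "bind" if source.startswith("/") else "named"


# Hypothetical entries in the style of a DockerSpawner volumes mapping.
volumes: Dict[str, str] = {
    "jupyterhub_data_persistence": "/persistent",
    "/data/jupyterhub_user_data/{username}": "/notebook_dir/writable-workspace",
}
for src, dst in volumes.items():
    print(classify_volume_source(src), src, "->", dst)
```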

mishaschwartz commented 6 months ago

> For each existing Jupyter user, /data/user_workspaces/$USER has to be created manually.

Isn't this just because the webhook action that creates the directory is only triggered when the user is created:

https://github.com/bird-house/birdhouse-deploy/blob/13645f324c1bcef3decd91ba8a5462862b1e8d5a/birdhouse/components/cowbird/config/magpie/config.yml.template#L35-L36

And the user is already created so the webhook isn't triggered (see: https://pavics-magpie.readthedocs.io/en/latest/configuration.html#webhook-user-create)
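Since only user creation triggers the webhook, a one-off migration step for users that existed before Cowbird was enabled could create the missing directories. Below is a minimal sketch, assuming the list of existing user names is obtained elsewhere (for example from Magpie) and that the script runs with enough privileges on the host; backfill_workspaces is a hypothetical helper, not an existing birdhouse-deploy or Cowbird command.

```python
import os
from typing import Iterable, List


def backfill_workspaces(workspaces_root: str, usernames: Iterable[str]) -> List[str]:
    """Create missing per-user workspace dirs and return the names created.

    Hypothetical migration helper: existing dirs are left untouched, so it can
    be re-run safely. Ownership is not changed here; the new dirs may still
    need a chown to the UID:GID the Jupyter server runs as.
    """
    created = []
    for name in usernames:
        path = os.path.join(workspaces_root, name)
        if not os.path.isdir(path):
            os.makedirs(path)
            created.append(name)
    return created
```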

fmigneault commented 6 months ago

This code was added to consider the situation where the user already exists, and no webhook would be triggered. https://github.com/bird-house/birdhouse-deploy/blob/67c6ca1d22c47d9bdf6f6e239f808ef3ec9af0bb/birdhouse/components/jupyterhub/jupyterhub_config.py.template#L151-L155

I'm not sure why it doesn't resolve the same way as when the directory is manually created.

Could it be that jupyterhub tries to mount the volumes before c.Spawner.pre_spawn_hook gets called? Somewhat counter-intuitive name if so. https://github.com/bird-house/birdhouse-deploy/blob/67c6ca1d22c47d9bdf6f6e239f808ef3ec9af0bb/birdhouse/components/jupyterhub/jupyterhub_config.py.template#L173

fmigneault commented 6 months ago

Does adding a mkdir here fix it instead of raising? https://github.com/bird-house/birdhouse-deploy/blob/67c6ca1d22c47d9bdf6f6e239f808ef3ec9af0bb/birdhouse/components/jupyterhub/jupyterhub_config.py.template#L161-L163
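The change suggested here would turn the hard failure into a fallback. The sketch below shows that control flow only, assuming the hook already knows the expected workspace path; check_or_create_workspace and its create_missing flag are hypothetical, and the error text merely paraphrases the log message quoted earlier in the thread.

```python
import os


def check_or_create_workspace(workspace_dir: str, create_missing: bool = True) -> bool:
    """Ensure workspace_dir exists; return True if it had to be created.

    With create_missing=False it raises instead, like the error reported in
    the JupyterHub logs when Cowbird has not created the directory.
    """
    if os.path.isdir(workspace_dir):
        return False
    if not create_missing:
        raise RuntimeError(
            f"Workspace {workspace_dir} doesn't exist but should have been "
            "created by Cowbird already."
        )
    # Fallback for users created before Cowbird was enabled.
    os.makedirs(workspace_dir)
    return True
```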

tlvu commented 6 months ago

> This code was added to consider the situation where the user already exists, and no webhook would be triggered.
>
> https://github.com/bird-house/birdhouse-deploy/blob/67c6ca1d22c47d9bdf6f6e239f808ef3ec9af0bb/birdhouse/components/jupyterhub/jupyterhub_config.py.template#L151-L155

This code (mkdir + chown) was already there before Cowbird was added to the stack, and I can confirm it works fine on /data/jupyterhub_user_data/. It is really odd that it no longer works after switching to /data/user_workspaces/.

Below is the old code with the existing mkdir + chown: https://github.com/bird-house/birdhouse-deploy/blob/775c3b392813872cb8045be473d6e4b091d52d88/birdhouse/config/jupyterhub/jupyterhub_config.py.template#L53-L60

Is it possible that Cowbird volume-mounts /data/user_workspaces/ read-only, which would make JupyterHub unable to write to it? That would still be weird, since JupyterHub has root access and should be able to write to any path it sees.
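One way to test the read-only hypothesis from inside the JupyterHub container is to ask the kernel whether the filesystem holding the path is mounted read-only: root privileges bypass ordinary permission bits, but not a read-only mount. A minimal sketch, assuming a Unix host where os.statvfs and os.ST_RDONLY are available; is_read_only_mount is a hypothetical helper name.

```python
import os


def is_read_only_mount(path: str) -> bool:
    """Return True if path lives on a filesystem mounted read-only.

    Relies on the ST_RDONLY flag reported by statvfs(); even root cannot
    write to such a mount.
    """
    return bool(os.statvfs(path).f_flag & os.ST_RDONLY)
```

Inside the container, calling it on the mount point of /data/user_workspaces would return True if that volume was mounted with mode "ro".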

> Does adding a mkdir here fix it instead of raising?
>
> https://github.com/bird-house/birdhouse-deploy/blob/67c6ca1d22c47d9bdf6f6e239f808ef3ec9af0bb/birdhouse/components/jupyterhub/jupyterhub_config.py.template#L161-L163

Or maybe adding a symlink instead, see this comment?

https://github.com/bird-house/birdhouse-deploy/blob/67c6ca1d22c47d9bdf6f6e239f808ef3ec9af0bb/birdhouse/components/jupyterhub/jupyterhub_config.py.template#L119-L120
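A sketch of the symlink alternative, assuming the goal is to make an existing /data/jupyterhub_user_data/$USER directory appear at the new /data/user_workspaces/$USER location. The helper name and this exact layout are assumptions for illustration; they may not match what the linked code comment or Cowbird itself does.

```python
import os


def link_existing_user_data(user_data_dir: str, workspace_dir: str) -> str:
    """Make workspace_dir a symlink to an existing user_data_dir.

    Hypothetical helper: if workspace_dir already exists (as a dir or a
    link), it is left untouched. Returns the resolved workspace path.
    """
    if not os.path.lexists(workspace_dir):
        parent = os.path.dirname(workspace_dir)
        if parent:
            os.makedirs(parent, exist_ok=True)
        os.symlink(user_data_dir, workspace_dir)
    return os.path.realpath(workspace_dir)
```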

tlvu commented 6 months ago

> > For each existing Jupyter user, /data/user_workspaces/$USER has to be created manually.
>
> Isn't this just because the webhook action that creates the directory is only triggered when the user is created:
>
> https://github.com/bird-house/birdhouse-deploy/blob/13645f324c1bcef3decd91ba8a5462862b1e8d5a/birdhouse/components/cowbird/config/magpie/config.yml.template#L35-L36
>
> And the user is already created so the webhook isn't triggered (see: https://pavics-magpie.readthedocs.io/en/latest/configuration.html#webhook-user-create)

Oh, interesting. How does this hook know whether to create a new dir or a symlink to an existing /data/jupyterhub_user_data/$USER dir?

fmigneault commented 6 months ago

The Magpie Webhook registered to occur on create_user is sent to Cowbird's /webhooks/users endpoint with event created when the action happens (see https://pavics-magpie.readthedocs.io/en/latest/configuration.html#config-webhook-actions for all available Magpie Webhooks and when they trigger). Each active Cowbird handler in https://github.com/bird-house/birdhouse-deploy/blob/13645f324c1bcef3decd91ba8a5462862b1e8d5a/birdhouse/components/cowbird/config/cowbird/config.yml.template that implements user_created is then called. For the user-workspace, that happens here: https://github.com/Ouranosinc/cowbird/blob/e2aa5337e32cd87efb5600f3fe62882d8d4d8b1f/cowbird/handlers/impl/filesystem.py#L118
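This dispatch follows a simple fan-out pattern: one incoming event, and every active handler that implements the matching callback gets called. The sketch below imitates that flow with plain Python objects; the class names, the dispatch_user_event function and the payload shape are hypothetical and do not reproduce Cowbird's actual API.

```python
from typing import Dict, List


class FileSystemHandler:
    """Stand-in for a handler that reacts to user creation (hypothetical)."""

    def __init__(self) -> None:
        self.created: List[str] = []

    def user_created(self, user_name: str) -> None:
        # A real handler would create the user workspace here.
        self.created.append(user_name)


class NoOpHandler:
    """A handler that does not implement user_created."""


def dispatch_user_event(handlers: List[object], payload: Dict[str, str]) -> int:
    """Call user_created on every handler implementing it; return the call count."""
    if payload.get("event") != "created":
        return 0
    calls = 0
    for handler in handlers:
        callback = getattr(handler, "user_created", None)
        if callable(callback):
            callback(payload["user_name"])
            calls += 1
    return calls


fs = FileSystemHandler()
print(dispatch_user_event([fs, NoOpHandler()], {"event": "created", "user_name": "lvu"}))  # 1
print(fs.created)  # ['lvu']
```

Nothing in this flow runs for users that already exist, which is why the thread looks for a fallback in the spawner hook.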

mishaschwartz commented 6 months ago

> Does adding a mkdir here fix it instead of raising?

Yes, that should solve the problem (for old users created before Cowbird was enabled).

mishaschwartz commented 6 months ago

We can solve the issue of having read-only volumes mounted on top of each other by changing the location of one or the other. I would recommend changing this line:

https://github.com/bird-house/birdhouse-deploy/blob/67c6ca1d22c47d9bdf6f6e239f808ef3ec9af0bb/birdhouse/env.local.example#L390

to:

 #public_read_in_container = join(notebook_dir, 'public-shared') 

Or similar.

I also think it would be a good idea to move this code out of env.local.example and into an optional component.

tlvu commented 6 months ago

> We can solve the issue of having read-only volumes mounted on top of each other by changing the location of one or the other. I would recommend changing this line:
>
> https://github.com/bird-house/birdhouse-deploy/blob/67c6ca1d22c47d9bdf6f6e239f808ef3ec9af0bb/birdhouse/env.local.example#L390
>
> to:
>
> #public_read_in_container = join(notebook_dir, 'public-shared')
>
> Or similar.

Yes, or export PUBLIC_WORKSPACE_WPS_OUTPUTS_SUBDIR=somethingelse also works, and it could default to something other than public. Note that PUBLIC_WORKSPACE_WPS_OUTPUTS_SUBDIR might already work properly; I just did not have time to confirm.

Same idea: both sharing solutions get their own public folder so they do not step on each other's toes.

> I also think it would be a good idea to move this code out of env.local.example and into an optional component.

Yes! At the beginning, I thought about using this as a live example of how env.local can be used to extend the JupyterHub config. In retrospect, it should have been an optional component: it has been very useful for us and could benefit others.

tlvu commented 6 months ago

> > Does adding a mkdir here fix it instead of raising?
>
> Yes, that should solve the problem (for old users created before Cowbird was enabled).

Should it be creating the dir or the symlink? See comment in code https://github.com/bird-house/birdhouse-deploy/blob/67c6ca1d22c47d9bdf6f6e239f808ef3ec9af0bb/birdhouse/components/jupyterhub/jupyterhub_config.py.template#L119-L120