bird-house / birdhouse-deploy

Scripts and configurations to deploy the various birds and servers required for a full-fledged production platform
https://birdhouse-deploy.readthedocs.io/en/latest/
Apache License 2.0

:bug: [BUG]: jupyterlab server fails to spawn due to read-only volume mount #392

Open mishaschwartz opened 11 months ago

mishaschwartz commented 11 months ago

Summary

The jupyterlab server fails to spawn when cowbird settings are enabled that mount the public/wps_outputs directory.

Details

A new jupyterlab container will try to mount a volume at the /notebook_dir/public/wps_outputs directory. Docker complains that it cannot mount to that location.

Possibly this is because it is a read-only bind-mount and the mount location is a nested directory that does not exist in the container image (i.e. Docker needs to create /notebook_dir/public before it creates /notebook_dir/public/wps_outputs, and it may be creating /notebook_dir/public as read-only as well).
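
As a minimal sketch of that hypothesis (untested, paths invented; not taken from the issue), the same failure can be provoked with plain docker run by bind-mounting a parent directory read-only, then a nested path that does not exist in the parent's source:

    mkdir -p /tmp/parent /tmp/child
    # /notebook_dir/public is mounted ro first; Docker then has to mkdir the
    # nested wps_outputs mountpoint inside that ro mount, which should fail
    # with "read-only file system"
    docker run --rm \
        -v /tmp/parent:/notebook_dir/public:ro \
        -v /tmp/child:/notebook_dir/public/wps_outputs:ro \
        alpine ls -R /notebook_dir

Pre-creating wps_outputs inside the parent's source directory (here /tmp/parent/wps_outputs) would give Docker an existing mountpoint and avoid the mkdir entirely.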

Traceback (in jupyterhub container):

    Traceback (most recent call last):
      File "/usr/local/lib/python3.10/dist-packages/jupyterhub/user.py", line 798, in spawn
        url = await gen.with_timeout(timedelta(seconds=spawner.start_timeout), f)
      File "/usr/local/lib/python3.10/dist-packages/dockerspawner/dockerspawner.py", line 1304, in start
        await self.start_object()
      File "/usr/local/lib/python3.10/dist-packages/dockerspawner/dockerspawner.py", line 1162, in start_object
        await self.docker("start", self.container_id)
      File "/usr/lib/python3.10/concurrent/futures/thread.py", line 58, in run
        result = self.fn(*self.args, **self.kwargs)
      File "/usr/local/lib/python3.10/dist-packages/dockerspawner/dockerspawner.py", line 948, in _docker
        return m(*args, **kwargs)
      File "/usr/local/lib/python3.10/dist-packages/docker/utils/decorators.py", line 19, in wrapped
        return f(self, resource_id, *args, **kwargs)
      File "/usr/local/lib/python3.10/dist-packages/docker/api/container.py", line 1127, in start
        self._raise_for_status(res)
      File "/usr/local/lib/python3.10/dist-packages/docker/api/client.py", line 270, in _raise_for_status
        raise create_api_error_from_http_exception(e) from e
      File "/usr/local/lib/python3.10/dist-packages/docker/errors.py", line 39, in create_api_error_from_http_exception
        raise cls(e, response=response, explanation=explanation) from e
    docker.errors.APIError: 500 Server Error for http+docker://localhost/v1.43/containers/4cea3661f3131dbccb662a9eda2b0e49f8e06a7435ef64680ab510b6d5aeab18/start: Internal Server Error ("failed to create task for container: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error mounting "/data/user_workspaces/public/wps_outputs" to rootfs at "/notebook_dir/public/wps_outputs": mkdir /var/lib/docker/overlay2/83b35d1bc9c7db553a0392f0deb855ccc7057e7a52025360be859cb9402d4894/merged/notebook_dir/public/wps_outputs: read-only file system: unknown")

docker version: Docker version 24.0.2, build cb74dfc

Note that this problem goes away if we set the PUBLIC_WORKSPACE_WPS_OUTPUTS_SUBDIR variable to a non-nested directory:

    # env.local
    PUBLIC_WORKSPACE_WPS_OUTPUTS_SUBDIR=public-wps-outputs

To Reproduce

Steps to reproduce the behavior:

  1. start birdhouse deploy with the cowbird and jupyterhub components enabled (see the env.local sketch after these steps)
  2. log in to jupyterhub and spawn a new jupyterlab server
  3. inspect the jupyterhub logs for the error message: docker logs -f jupyterhub
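
For step 1, a minimal env.local sketch (the ./components/cowbird path is taken from the comments below; jupyterhub may already be enabled by the default configuration of some deployments):

    # env.local
    export EXTRA_CONF_DIRS="./components/cowbird"
    # default value, shown explicitly; the nested path is what triggers the bug
    export PUBLIC_WORKSPACE_WPS_OUTPUTS_SUBDIR="public/wps_outputs"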

Environment

| Information | Value |
| --- | --- |
| Server/Platform URL | daccs.cs.toronto.edu |
| Version Tag/Commit | 1.35.0 |
| Related issues/PR | |
| Related components | jupyterhub, cowbird |
| Custom configuration | |
| docker version | Docker version 24.0.2, build cb74dfc |

Concerned Organizations

fmigneault commented 11 months ago

@ChaamC @Nazim-crim Probably related to DAC-584 - Error spawning jupyter notebook images.

tlvu commented 11 months ago

I am guessing /notebook_dir/public/ is read-only, and possibly causing problems, because this sample config has been enabled?

https://github.com/bird-house/birdhouse-deploy/blob/2b9f31e977740dec060747b03be6eca2d9548f1b/birdhouse/env.local.example#L352-L400

This sample config was our poor man's sharing solution between Jupyter users before Cowbird existed, so maybe Cowbird can replace it? Then we would not have to enable that sharing solution together with Cowbird, so they won't clash with each other?

If we need to keep both sharing mechanisms and they actually clash with each other, I think it would be better for Cowbird to bind to /notebook_dir/public-wps-outputs/, so how about changing the default value to avoid surprises for future users?

Just curious about the Cowbird sharing workflow.

Currently, with the poor man's sharing solution, everyone sees everyone's public share; it is not configurable by each user.

With Cowbird, I assume each user will decide to which users specifically they want to share. So how do they "enable" this "a la carte" sharing? Via Magpie?

Nazim-crim commented 11 months ago

@fmigneault, the spawning error in DAC-584 - Error spawning jupyter notebook images is related to the mount of the jupyterlab google-drive extension in .jupyter. Docker automatically sets the user to root when you mount a volume on a directory that does not exist in the image: https://github.com/jupyterhub/dockerspawner/issues/453. Regarding this bug, I think @tlvu is right: it's because a mount is made to a nested directory whose parent is read-only.

fmigneault commented 11 months ago

@tlvu

We would need to adjust the sample config when using Cowbird. The /notebook_dir/public/ location, along with /wps_outputs/public, should be mounted together under ~/public/ for easy access by the user in the spawned docker. An extra /data/jupyterhub_user_data/public-share could be added in there as well if needed. We just need to establish how all these directories should be combined under ~/public/ in the docker.

The general structure for WPS outputs is as follows:

/data/wps_outputs/
   <bird-wps>/
       <output-files>
   weaver/
       public/
           <jobID>/
               <output-files>
       users/
            <user_id>/
               <jobID>/
                   <output-files>

Cowbird understands that WPS-output structure and aligns the permissions on the /wpsoutputs endpoint with the corresponding files. When the notebook is started with Cowbird support adding hardlinks, only the public and user-specific WPS outputs are mounted in respective locations indicating that they are "public" or "my-outputs".

All WPS outputs volumes are purposely mounted ro, since allowing modification of their contents would mean their process results would no longer be guaranteed to be valid (anyone could have modified or deleted them).
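
For illustration (the job ID and file name are placeholders, and this is not the actual Cowbird code), the hardlinks mentioned above amount to something like:

    # expose the same inode under the public workspace without copying;
    # deleting the link does not delete the original WPS output
    ln /data/wps_outputs/weaver/public/<jobID>/output.nc \
       /data/user_workspaces/public/wps_outputs/<jobID>/output.nc

Note that hardlinks require both paths to live on the same filesystem, which constrains where /data/wps_outputs and /data/user_workspaces can be placed.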

Nazim-crim commented 10 months ago

@mishaschwartz @tlvu Were you able to reproduce this bug? The default config of cowbird already uses a nested directory and I didn't have the error you mentioned: export PUBLIC_WORKSPACE_WPS_OUTPUTS_SUBDIR="public/wps_outputs". Do you have more steps to reproduce it other than adding ./components/cowbird in EXTRA_CONF_DIRS and the change in env.local?

tlvu commented 10 months ago

@mishaschwartz @tlvu Were you able to reproduce this bug? The default config of cowbird already uses a nested directory and I didn't have the error you mentioned: export PUBLIC_WORKSPACE_WPS_OUTPUTS_SUBDIR="public/wps_outputs". Do you have more steps to reproduce it other than adding ./components/cowbird in EXTRA_CONF_DIRS and the change in env.local?

@Nazim-crim I have not tried to reproduce. Are you saying @mishaschwartz and you end up with different results when trying to reproduce this? This is odd! Maybe I should try to reproduce it myself.

Since I have your attention: how does this workflow work? "With Cowbird, I assume each user will decide to which users specifically they want to share. So how do they "enable" this "a la carte" sharing? Via Magpie?"

tlvu commented 10 months ago

mounted together under ~/public/

@fmigneault is this inside each jupyter container? This is new to me.

Previously, I understood any new mounts inside jupyter containers should be under /notebook_dir/ since this is the root dir visible in the left panel of the Jupyter env.

If a new mount is at ~/public/, the user will never see it visually and can only access it via the terminal or code. Is that intended, to hide it visually?

fmigneault commented 10 months ago

@tlvu My bad if it wasn't clear. The ~ I used there meant the "current notebook home", or the root dir shown by the jupyter interface. Effectively, the /notebook_dir/ you mention.

tlvu commented 10 months ago

@tlvu My bad if it wasn't clear. The ~ I used there meant the "current notebook home", or the root dir shown by the jupyter interface. Effectively, the /notebook_dir/ you mention.

@fmigneault then I am more confused by your comment https://github.com/bird-house/birdhouse-deploy/issues/392#issuecomment-1769582602

Where should /notebook_dir/public/ and /wps_outputs/public appear in

/data/wps_outputs/
   <bird-wps>/
       <output-files>
   weaver/
       public/
           <jobID>/
               <output-files>
       users/
            <user_id>/
               <jobID>/
                   <output-files>

mishaschwartz commented 10 months ago

@Nazim-crim

To reproduce the issue:

fmigneault commented 10 months ago

@tlvu /notebook_dir/public/ is populated by Cowbird using a combination of sources, including /data/wps_outputs/<bird-wps>, /data/wps_outputs/public and /data/wps_outputs/weaver/public. They are not added "blindly": Cowbird checks with Magpie whether those locations are marked public (or rather, are not restricted by https://github.com/bird-house/birdhouse-deploy/tree/master/birdhouse/optional-components/secure-data-proxy), and adds the necessary hardlinks if they are permitted for anonymous.

The logic was added to handle these combinations for backward compatibility with the existing WPS outputs data structure, which assumed a lot of items were fully open.

tlvu commented 7 months ago

FYI, I was able to reproduce this problem as well, while trying to test https://github.com/bird-house/birdhouse-deploy/pull/415.

However, I think it was fixed by https://github.com/bird-house/birdhouse-deploy/pull/401, because when I set export PUBLIC_WORKSPACE_WPS_OUTPUTS_SUBDIR="publiccowbird" and then ran ./pavics-compose.sh up -d so that the jupyterhub container is re-created, I am now able to start the Jupyterlab server.
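
That is, roughly:

    # env.local
    export PUBLIC_WORKSPACE_WPS_OUTPUTS_SUBDIR="publiccowbird"

    # re-create the containers so jupyterhub picks up the change
    ./pavics-compose.sh up -d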

@mishaschwartz I let you close this issue to confirm whether the fix is fully working.

This allows the Jupyterlab server to start. However, I have not confirmed that this variable is fully respected by Cowbird and/or Weaver, nor whether they can function properly with this variable changed from its public default value.

Now that the Jupyterlab can start, I am faced with another problem: all the data from all my existing users under /notebook_dir/writable-workspace has disappeared. This is because without Cowbird enabled, /notebook_dir/writable-workspace is bound to /data/jupyterhub_user_data/$USER, but with Cowbird enabled, /notebook_dir/writable-workspace is bound to /data/user_workspaces/$USER. And by the way, I had to manually create /data/user_workspaces/$USER, otherwise Jupyterlab won't start either. Basically, activating Cowbird with existing Jupyter users is fairly laborious. This probably deserves a separate issue on its own.
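
A sketch of the manual workaround (the loop is assumed from the description above; adjust ownership and permissions for your deployment, and skip any non-user entries such as public):

    # pre-create a Cowbird workspace for every existing Jupyter user so the
    # Jupyterlab server can start
    for u in /data/jupyterhub_user_data/*/; do
        mkdir -p "/data/user_workspaces/$(basename "$u")"
    done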

tlvu commented 7 months ago

Now that the Jupyterlab can start, I am faced with another problem: all the data from all my existing users under /notebook_dir/writable-workspace has disappeared. This is because without Cowbird enabled, /notebook_dir/writable-workspace is bound to /data/jupyterhub_user_data/$USER, but with Cowbird enabled, /notebook_dir/writable-workspace is bound to /data/user_workspaces/$USER. And by the way, I had to manually create /data/user_workspaces/$USER, otherwise Jupyterlab won't start either. Basically, activating Cowbird with existing Jupyter users is fairly laborious. This probably deserves a separate issue on its own.

https://github.com/bird-house/birdhouse-deploy/issues/425

tlvu commented 7 months ago

Currently, with the poor man's sharing solution, everyone sees everyone's public share; it is not configurable by each user.

With Cowbird, I assume each user will decide to which users specifically they want to share. So how do they "enable" this "a la carte" sharing? Via Magpie?

@fmigneault @Nazim-crim @ChaamC Not sure if you guys noticed my question above, from comment https://github.com/bird-house/birdhouse-deploy/issues/392#issuecomment-1767566197

fmigneault commented 6 months ago

For Cowbird, it is harder to tell whether PUBLIC_WORKSPACE_WPS_OUTPUTS_SUBDIR changes are effective. Adding anonymous group permissions to files under a given user-workspace should trigger Magpie/Cowbird webhooks that would lead to hardlink creation to share the corresponding files publicly. Files accessible under /wpsoutputs only for a specific user should then gradually become accessible when not logged in.
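
A quick way to observe this (hypothetical check; the URL shape follows the user-protected example in a later comment, and the exact status codes may differ per deployment):

    # without an auth cookie, a restricted file should return a non-2xx status;
    # once the anonymous permission propagates, the same request should return 200
    curl -I "https://pavics.ouranos.ca/wpsoutputs/weaver/user/THE_USER/<jobID>/output/some-file.txt"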

For Weaver, you should see the public part of the job result URL become publiccowbird when submitting a job because of the hook:
https://github.com/bird-house/birdhouse-deploy/blob/13645f324c1bcef3decd91ba8a5462862b1e8d5a/birdhouse/components/weaver/config/magpie/weaver_hooks.py.template#L61

You can see this Job URL in the last cell output from: https://github.com/Ouranosinc/pavics-sdi/blob/master/docs/source/notebook-components/weaver_example.ipynb

fmigneault commented 6 months ago

Currently, with the poor man's sharing solution, everyone sees everyone's public share; it is not configurable by each user. With Cowbird, I assume each user will decide to which users specifically they want to share. So how do they "enable" this "a la carte" sharing? Via Magpie?

The public directories are intentionally open to anyone, as they are attributed the Magpie anonymous group.

A user-protected wps-output should have a form similar to: https://pavics.ouranos.ca/wpsoutputs/weaver/user/THE_USER/91b62b44-fb06-4be9-ad2b-43d5265d0048/output/some-file.txt

Using secure-data-proxy, you should get a Magpie structure as follows: [screenshot of the Magpie resource hierarchy for the wpsoutputs service]

Using the same structure as defined here: https://github.com/bird-house/birdhouse-deploy/issues/392#issuecomment-1778320614 (you need to create child resources for the desired user-specific structure), you can set individual user/group permissions on each specific sub-dir/file. When user/group permissions are created in Magpie, this triggers webhooks, i.e.: https://github.com/bird-house/birdhouse-deploy/blob/13645f324c1bcef3decd91ba8a5462862b1e8d5a/birdhouse/components/cowbird/config/magpie/config.yml.template#L34

Cowbird will receive these webhooks to perform various operations, such as creating hardlinks to make the corresponding files "visible" to users. The tricky aspect of all this is that the files you see in the user-workspace do not themselves get attributed Magpie permissions. There is no Magpie "user-workspace" service. Instead, files placed in the workspace are mapped to the corresponding services they originate from. Therefore, for WPS outputs, which are accessed via the /wpsoutputs endpoint of the proxy service, the Magpie "REST API" secure-data-proxy permissions are used. For shapefiles coming from GeoServer, permissions under geoserver are used, and so on.

The "mapping" of service-specific permissions to corresponding user-workspaces contents depends on https://github.com/bird-house/birdhouse-deploy/blob/13645f324c1bcef3decd91ba8a5462862b1e8d5a/birdhouse/components/cowbird/config/cowbird/config.yml.template#L43-L91

And on the FileSystem handler, which uses the on_created, on_modified, on_deleted, permission_created and permission_deleted events triggered by either Magpie permission webhooks or file-system monitoring of user-workspaces: https://github.com/Ouranosinc/cowbird/blob/e2aa5337e32cd87efb5600f3fe62882d8d4d8b1f/cowbird/handlers/impl/filesystem.py#L226

Currently, users cannot themselves set permissions for their user-workspace files unless they have Magpie admin privileges. Magpie could employ user-context requests (such as when a user edits their own profile at /magpie/ui/users/current) to allow users to share their own files. However, it is tricky to display a partial resource hierarchy without leaking resources of other users (this is why admin-only API/UI are used for now). There is a concept of "owner" (in the DB) for Magpie resources, but it is not currently employed to check access to them. Non-trivial adjustments (new UI pages, new API endpoints) to support user-owned permission editing would have to be made in Magpie. Relates to https://github.com/Ouranosinc/Magpie/issues/170

mishaschwartz commented 6 months ago

Discussion continues here:

https://github.com/bird-house/birdhouse-deploy/issues/425#issuecomment-1964525883