Open mishaschwartz opened 1 year ago
@ChaamC @Nazim-crim Probably related to DAC-584 - Error spawning jupyter notebook images.
I am guessing /notebook_dir/public/ is read-only and possibly causing problems because this sample config has been enabled?
This sample config was our poor-man's sharing solution between Jupyter users before Cowbird existed, so maybe Cowbird can replace it? Then we would not have to enable that sharing solution together with Cowbird, and they would not clash with each other.
If we need to keep both sharing mechanisms and they actually clash with each other, I think it would be better for Cowbird to bind to /notebook_dir/public-wps-outputs/, so how about changing the default value to avoid surprises for future users?
Just curious about the Cowbird sharing workflow.
Currently, with the poor-man sharing solution, everyone sees the public share of everyone, not configurable by each user.
With Cowbird, I assume each user will decide to which users specifically they want to share. So how do they "enable" this "a la carte" sharing? Via Magpie?
@fmigneault, the spawning error in DAC-584 - Error spawning jupyter notebook images is related to the mount of the jupyterlab google-drive extension in .jupyter. Docker automatically sets the user to root when you mount a volume on a directory that does not exist in the image: https://github.com/jupyterhub/dockerspawner/issues/453. Regarding this bug, I think @tlvu is right: it happens because a mount is made to a nested directory whose parent is ro.
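To make the suspected cause easier to spot in a given config, here is a minimal sketch that scans a mount table for mounts nested below a read-only parent. The function name and the example mount table are hypothetical, and this only inspects paths; it does not talk to Docker or DockerSpawner.

```python
from pathlib import PurePosixPath


def find_nested_under_readonly(mounts):
    """Return (child, parent) pairs where `child` sits below a read-only `parent`.

    `mounts` maps a container path to its mode, either "ro" or "rw".
    """
    conflicts = []
    for child in mounts:
        child_path = PurePosixPath(child)
        for parent, mode in mounts.items():
            # `parents` lists every ancestor of the child path.
            if mode == "ro" and PurePosixPath(parent) in child_path.parents:
                conflicts.append((child, parent))
    return sorted(conflicts)


# Hypothetical mount table, loosely modelled on the paths discussed in this thread.
example_mounts = {
    "/notebook_dir/public": "ro",
    "/notebook_dir/public/wps_outputs": "ro",
    "/notebook_dir/writable-workspace": "rw",
}
conflicts = find_nested_under_readonly(example_mounts)
print(conflicts)
```

Any pair it reports is only a candidate for the spawn failure, not a confirmed cause.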
@tlvu
We would need to adjust the sample config when using Cowbird.
The /notebook_dir/public/ location, along with /wps_outputs/public, should be mounted together under ~/public/ for easy access by the user in the spawned docker. An extra /data/jupyterhub_user_data/public-share could be added in there as well if needed. We just need to establish how all these directories should be combined under ~/public/ in the docker.
The general structure for WPS outputs is as follows:
/data/wps_outputs/
    <bird-wps>/
        <output-files>
    weaver/
        public/
            <jobID>/
                <output-files>
        users/
            <user_id>/
                <jobID>/
                    <output-files>
Cowbird understands that WPS-output structure and aligns permissions on the /wpsoutputs endpoint with the corresponding files.
When the notebook is started with Cowbird support adding hardlinks, only the public and user-specific WPS outputs are mounted, in respective locations that indicate that they are "public" or "my-outputs".
All WPS output volumes are purposely mounted ro, since allowing modification of their contents would mean the process results could no longer be guaranteed to be valid (anyone could have modified or deleted them).
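As a rough illustration of how the layout above separates public from user-specific outputs, here is a small sketch that classifies a path under the WPS outputs root. It is not Cowbird's code: the function name, the returned labels, and the assumption that public/ and users/ both sit under weaver/ are mine.

```python
from pathlib import PurePosixPath

WPS_OUTPUTS_ROOT = PurePosixPath("/data/wps_outputs")


def classify_wps_output(path):
    """Return a (kind, owner) tuple for a file below WPS_OUTPUTS_ROOT.

    kind is "weaver-public", "weaver-user", "public" or "bird" (hypothetical labels).
    owner is the user id for "weaver-user", the bird name for "bird", else None.
    """
    # relative_to() raises ValueError when the path is outside the root.
    parts = PurePosixPath(path).relative_to(WPS_OUTPUTS_ROOT).parts
    if not parts:
        raise ValueError("path is the WPS outputs root itself")
    if parts[0] == "weaver" and len(parts) >= 2:
        if parts[1] == "public":
            return ("weaver-public", None)
        if parts[1] == "users" and len(parts) >= 3:
            return ("weaver-user", parts[2])
    if parts[0] == "public":
        return ("public", None)
    # Anything else is treated as the output directory of a specific bird.
    return ("bird", parts[0])


print(classify_wps_output("/data/wps_outputs/weaver/users/42/job-1/out.txt"))
```

Only the "weaver-user" case carries an owner that would need a per-user permission check; the other kinds map to the shared locations.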
@mishaschwartz @tlvu Were you able to reproduce this bug? The default config on cowbird already uses a nested directory and I didn't get the error you mentioned: export PUBLIC_WORKSPACE_WPS_OUTPUTS_SUBDIR="public/wps_outputs". Do you have more steps to reproduce it other than adding ./components/cowbird in EXTRA_CONF_DIRS and the change in env.local?
@Nazim-crim I have not tried to reproduce. Are you saying @mishaschwartz and you end up with different results when trying to reproduce this? This is odd! Maybe I should try to reproduce it myself.
Since I have your attention, how does this workflow work: "With Cowbird, I assume each user will decide to which users specifically they want to share. So how do they "enable" this "a la carte" sharing? Via Magpie?"
mounted together under ~/public/
@fmigneault is this inside each jupyter container? This is new to me.
Previously, I understood any new mounts inside jupyter containers should be under /notebook_dir/ since this is the root dir visible in the left panel of the Jupyter env.
If a new mount is at ~/public/, the user will never see it visually and can only access it via the terminal or code. Is that intended to hide it visually?
@tlvu
My bad if it wasn't clear. The ~ I used there meant the "current notebook home", or the root dir shown by the jupyter interface. Effectively, the /notebook_dir/ you mention.
@fmigneault then I am more confused by your comment https://github.com/bird-house/birdhouse-deploy/issues/392#issuecomment-1769582602
Where should /notebook_dir/public/ and /wps_outputs/public appear in this structure?
/data/wps_outputs/
    <bird-wps>/
        <output-files>
    weaver/
        public/
            <jobID>/
                <output-files>
        users/
            <user_id>/
                <jobID>/
                    <output-files>
@Nazim-crim
To reproduce the issue:
JUPYTERHUB_CONFIG_OVERRIDE settings are enabled by uncommenting them in env.local: https://github.com/bird-house/birdhouse-deploy/blob/1.35.2/birdhouse/env.local.example#L363-L368
PUBLIC_WORKSPACE_WPS_OUTPUTS_SUBDIR: https://github.com/bird-house/birdhouse-deploy/blob/1.35.2/birdhouse/components/cowbird/default.env#L55
docker logs -f jupyterhub. You should see a traceback similar to the one in the description (above)
@tlvu
/notebook_dir/public/ is populated by Cowbird using a combination of sources including /data/wps_outputs/<bird-wps>, /data/wps_outputs/public and /data/wps_outputs/weaver/public. They are not added "blindly". Cowbird checks with Magpie if those locations are marked public (or rather, are not restricted by https://github.com/bird-house/birdhouse-deploy/tree/master/birdhouse/optional-components/secure-data-proxy), and adds the necessary hardlink if they are permitted for anonymous.
The logic was added to handle these combinations for backward compatibility of the existing WPS outputs data structure that assumed a lot of items were fully open.
FYI, I was able to reproduce this problem as well, while trying to test https://github.com/bird-house/birdhouse-deploy/pull/415.
However, I think it was fixed by https://github.com/bird-house/birdhouse-deploy/pull/401, because when I set export PUBLIC_WORKSPACE_WPS_OUTPUTS_SUBDIR="publiccowbird", then run ./pavics-compose.sh up -d so the jupyterhub container is re-created, I am now able to start the Jupyterlab server.
@mishaschwartz I will let you close this issue to confirm whether the fix is fully working.
This allows the Jupyterlab server to start. However, I have not confirmed that this variable is fully respected by Cowbird and/or Weaver, nor whether they can function properly with this variable changed from its public default value.
Now that the Jupyterlab server can start, I am faced with another problem: all the data from all my existing users under /notebook_dir/writable-workspace has disappeared. This is because without Cowbird enabled, /notebook_dir/writable-workspace is bound to /data/jupyterhub_user_data/$USER, but with Cowbird enabled, /notebook_dir/writable-workspace is bound to /data/user_workspaces/$USER. And by the way, I had to manually create /data/user_workspaces/$USER, otherwise Jupyterlab won't start either. Basically, activating Cowbird with existing Jupyter users is fairly laborious. This probably deserves a separate issue of its own.
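To give an idea of the manual work involved, here is a minimal sketch that creates the missing per-user directories under the new root and lists the users whose old data has not been carried over. It is an assumption-laden helper, not part of birdhouse-deploy: the function name and the skip list are hypothetical, it ignores ownership and permissions, and it deliberately does not copy any data.

```python
import tempfile
from pathlib import Path


def ensure_user_workspaces(old_root, new_root, skip=("public-share",)):
    """Create new_root/<user> for every user dir in old_root.

    Returns the users that have data in old_root but an empty new workspace.
    Directories named in `skip` are assumed not to be user directories.
    """
    old_root, new_root = Path(old_root), Path(new_root)
    needs_migration = []
    for old_dir in sorted(p for p in old_root.iterdir() if p.is_dir()):
        if old_dir.name in skip:
            continue
        new_dir = new_root / old_dir.name
        new_dir.mkdir(parents=True, exist_ok=True)
        if any(old_dir.iterdir()) and not any(new_dir.iterdir()):
            needs_migration.append(old_dir.name)
    return needs_migration


# Demo in a temporary directory standing in for /data.
with tempfile.TemporaryDirectory() as tmp:
    old = Path(tmp, "jupyterhub_user_data")
    new = Path(tmp, "user_workspaces")
    (old / "alice").mkdir(parents=True)
    (old / "alice" / "notebook.ipynb").write_text("{}")
    (old / "bob").mkdir()
    (old / "public-share").mkdir()
    pending = ensure_user_workspaces(old, new)
    created = sorted(p.name for p in new.iterdir())

print(pending, created)
```

On a real deployment the roots would be /data/jupyterhub_user_data and /data/user_workspaces, and the users reported as pending would still need their files moved or copied by hand.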
Currently, with the poor-man sharing solution, everyone sees the public share of everyone, not configurable by each user.
With Cowbird, I assume each user will decide to which users specifically they want to share. So how do they "enable" this "a la carte" sharing? Via Magpie?
@fmigneault @Nazim-crim @ChaamC Not sure if you guys noticed my question above, first asked in comment https://github.com/bird-house/birdhouse-deploy/issues/392#issuecomment-1767566197
For Cowbird, it is harder to tell whether PUBLIC_WORKSPACE_WPS_OUTPUTS_SUBDIR changes are effective. Adding anonymous group permissions to files under a given user-workspace should trigger Magpie/Cowbird webhooks that would lead to hardlink creation to share the corresponding files publicly. Files accessible under /wpsoutputs only for a specific user should then gradually become accessible when not logged in.
For Weaver, you should see the public part of the job result URL become publiccowbird when submitting a job because of the hook:
https://github.com/bird-house/birdhouse-deploy/blob/13645f324c1bcef3decd91ba8a5462862b1e8d5a/birdhouse/components/weaver/config/magpie/weaver_hooks.py.template#L61
You can see this Job URL in the last cell output from: https://github.com/Ouranosinc/pavics-sdi/blob/master/docs/source/notebook-components/weaver_example.ipynb
Currently, with the poor-man sharing solution, everyone sees the public share of everyone, not configurable by each user. With Cowbird, I assume each user will decide to which users specifically they want to share. So how do they "enable" this "a la carte" sharing? Via Magpie?
The public directories are intentionally open to anyone, as they are attributed the Magpie anonymous group.
A user-protected wps-output should have a form similar to: https://pavics.ouranos.ca/wpsoutputs/weaver/user/THE_USER/91b62b44-fb06-4be9-ad2b-43d5265d0048/output/some-file.txt
Using secure-data-proxy, you should get a Magpie structure as follows:
Using the same structure as defined here: https://github.com/bird-house/birdhouse-deploy/issues/392#issuecomment-1778320614 (need to create child resources for the user-specific structure that is desired), you can set individual user/group permissions to each specific sub-dir/file. When user/group permissions are created in Magpie, this will trigger Webhooks, ie: https://github.com/bird-house/birdhouse-deploy/blob/13645f324c1bcef3decd91ba8a5462862b1e8d5a/birdhouse/components/cowbird/config/magpie/config.yml.template#L34
Cowbird will receive these Webhooks to perform various operations, such as creating hardlinks to make corresponding files "visible" to users. The tricky aspect of all this is that the files that you see in the user-workspace do not themselves get attributed Magpie permissions. There is no Magpie "user-workspace" service. Instead, files placed in the workspace are mapped to the corresponding services they originate from. Therefore, for WPS-outputs, which are accessed via the /wpsoutputs endpoint of the proxy service, the Magpie "REST API" secure-data-proxy permissions are used. For shapefiles coming from GeoServer, permissions under geoserver are used, and so on.
The "mapping" of service-specific permissions to corresponding user-workspace contents depends on https://github.com/bird-house/birdhouse-deploy/blob/13645f324c1bcef3decd91ba8a5462862b1e8d5a/birdhouse/components/cowbird/config/cowbird/config.yml.template#L43-L91
and on the FileSystem handler that uses the on_created, on_modified, on_deleted, permission_created and permission_deleted events triggered by either Magpie permission webhooks or file-system monitoring of user-workspaces.
https://github.com/Ouranosinc/cowbird/blob/e2aa5337e32cd87efb5600f3fe62882d8d4d8b1f/cowbird/handlers/impl/filesystem.py#L226
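For readers unfamiliar with the hardlink approach, here is a minimal sketch of what "sharing" and "unsharing" a file through hardlinks can look like. It is not Cowbird's implementation: the function names are hypothetical, and mapping them to the permission_created and permission_deleted events is only an assumption for illustration. Hardlinks also require the source and the link to live on the same filesystem.

```python
import os
import tempfile
from pathlib import Path


def share_publicly(source_file, public_dir):
    """Expose source_file inside public_dir through a hardlink; return the link path."""
    source_file, public_dir = Path(source_file), Path(public_dir)
    public_dir.mkdir(parents=True, exist_ok=True)
    link = public_dir / source_file.name
    if not link.exists():
        os.link(source_file, link)  # same inode, second directory entry
    return link


def unshare(link):
    """Remove the hardlink only; the original file is left untouched."""
    Path(link).unlink(missing_ok=True)


# Demo in a temporary directory standing in for the WPS outputs and workspace roots.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp, "users", "alice", "job-1", "output.txt")
    src.parent.mkdir(parents=True)
    src.write_text("result")
    link = share_publicly(src, Path(tmp, "public", "wps_outputs"))
    same_inode = os.path.samefile(src, link)
    link_count = src.stat().st_nlink
    unshare(link)
    source_survives = src.exists()

print(same_inode, link_count, source_survives)
```

Because both names point at the same inode, no data is duplicated, and removing the link when a permission is revoked does not affect the original output.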
Currently, users cannot themselves set permissions for their user-workspace files unless they have Magpie admin privileges.
Magpie could employ user-context requests (such as when a user edits their own profile at /magpie/ui/users/current) to allow sharing their own files. However, it is tricky to display a partial resource hierarchy without leaking resources of other users (this is why admin-only API/UI are used for now). There is a concept of "owner" (in the DB) for Magpie resources, but it is not currently employed to check access to them. Non-trivial adjustments (new UI pages, new API endpoints) to support user-owned permissions editing would have to be made in Magpie.
Relates to https://github.com/Ouranosinc/Magpie/issues/170
Discussion continues here:
https://github.com/bird-house/birdhouse-deploy/issues/425#issuecomment-1964525883
Summary
The jupyterlab server fails to spawn when cowbird settings are enabled that mount the public/wps_outputs directory.
Details
A new jupyterlab container will try to mount to the /notebook_dir/public/wps_outputs directory in the jupyterlab container. Docker complains that it cannot mount to that location.
Possibly because it is a read-only bind-mount and the mount location is a nested directory that does not exist on the container (i.e. it needs to create /notebook_dir/public before it creates /notebook_dir/public/wps_outputs and it may be creating /notebook_dir/public as read-only as well).
Traceback (in jupyterhub container):
docker version: Docker version 24.0.2, build cb74dfc
Note that this problem goes away if we set the PUBLIC_WORKSPACE_WPS_OUTPUTS_SUBDIR variable to a non-nested directory:
To Reproduce
Steps to reproduce the behavior:
docker logs -f jupyterhub
Environment
Concerned Organizations