bird-house / birdhouse-deploy

Scripts and configurations to deploy the various birds and servers required for a full-fledged production platform
https://birdhouse-deploy.readthedocs.io/en/latest/
Apache License 2.0

:bulb: [Feature] Protect JupyterHub #334

Open fmigneault opened 1 year ago

fmigneault commented 1 year ago

Description

Ensure JupyterHub is accessed behind Magpie/Twitcher authentication/authorization. Currently, it uses Magpie/Twitcher after the fact to log in the user, but the initial request and subsequent ones are not protected by default. Therefore, any user can still reach the entrypoints (though they cannot log in).

References

mishaschwartz commented 1 year ago

If we put all jupyterhub routes behind magpie/twitcher then we also have the opportunity to allow users who have already signed in with magpie to not have to sign in again with jupyterhub.

The advantage of this is that users will have a single place to sign in. However it will change the user experience in the following ways:

Another option is to keep the jupyterhub sign in page as the default but also have the jupyterhub sign in trigger the magpie sign in. This would avoid the situation where a user is signed in with jupyterhub but is not signed in with magpie which could mean that they are not authorized to access certain routes even though they have signed in from the user's perspective.

My vote would be to go with option 2 and have the jupyterhub sign in trigger the magpie sign in. This changes the least number of things from a user's perspective and keeps the jupyterhub sign in page as the "default" sign in location.

huard commented 1 year ago

I suspect that using the Magpie sign-in page could have long-term advantages, as for example displaying the current permission profile for data, services, other daccs nodes, etc.

@tlogan2000 Other thoughts regarding this ?

tlogan2000 commented 1 year ago

I suppose my quick thought would be to agree with @huard but certain changes need to be made to the magpie page in my opinion ... Currently the magpie sign-in is very much aligned to an admin user (options for user, service perms etc.), visible even if inaccessible after login: see https://pavics.ouranos.ca/magpie/. The basic user would have to see something like 'Please sign in here' then a 'take me to the JupyterLab' type navigation. Possibly an account info / change password option.

mishaschwartz commented 1 year ago

@huard @tlogan2000 That all makes sense to me. Just to clarify what we're proposing:

huard commented 1 year ago

Is it possible to rig this so that if I go to pavics.ouranos.ca/jupyter, once the user signs-in in Magpie, it goes directly back to Jupyter (ie you're not stuck in magpie) ?

Also, could we "re-brand" the magpie sign-in page ? I'm concerned if people see Magpie and its logo, they'll think they left the DACCS node.

mishaschwartz commented 1 year ago

@huard

Is it possible to rig this so that if I go to pavics.ouranos.ca/jupyter, once the user signs-in in Magpie, it goes directly back to Jupyter (ie you're not stuck in magpie) ?

Yes that should be possible

Also, could we "re-brand" the magpie sign-in page ? I'm concerned if people see Magpie and its logo, they'll think they left the DACCS node.

Another option is to leave magpie alone and create a separate "DACCS branded" sign in page that makes a call to the magpie api (see: https://pavics-magpie.readthedocs.io/en/latest/api.html#tag/Session%2Fpaths%2F~1signin%2Fpost). That way we don't have to force magpie to be something it isn't.

huard commented 1 year ago

I like the API call idea. Good for me. @tlvu Thoughts ?

fmigneault commented 1 year ago

I also agree with option 2 (keep the jupyterhub sign in page as the default but also have the jupyterhub sign in trigger the magpie sign in). I think it is much easier to use JupyterHub as the entrypoint, because even if we log in on Magpie first, we would not have the JupyterHub session ID and session cookie...

When JupyterHub does the login, it performs some Magpie login using this: https://github.com/bird-house/birdhouse-deploy/blob/22fc7c01a0279b2c98a9059b874dee5922bcac21/birdhouse/config/jupyterhub/jupyterhub_config.py.template#L17 Can't it return the Magpie Cookie at the same time if not already set? Can't it do a pre-check that a Cookie matching Magpie's definition is already present and valid? It should be sufficient to send a request to https://pavics.ouranos.ca/magpie/session with the detected cookies to validate if the user is already logged in in Magpie, and that login did not expire, to skip the login on the JupyterHub side.
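To make the pre-check idea concrete, here is a minimal sketch (not the actual authenticator code). It forwards the browser's `Cookie` header to Magpie's session endpoint and reads back who is logged in. The response fields `authenticated` and `user.user_name` are assumptions about the JSON body and should be checked against the Magpie API documentation; the function names are hypothetical.

```python
import json
import urllib.request
from typing import Optional


def username_from_session(body: str) -> Optional[str]:
    """Return the logged-in user name from a session response body, or None.

    Assumes (unverified) a JSON body shaped like
    {"authenticated": true, "user": {"user_name": "..."}}.
    """
    try:
        data = json.loads(body)
    except ValueError:
        return None
    if not isinstance(data, dict) or not data.get("authenticated"):
        return None
    user = data.get("user")
    if not isinstance(user, dict):
        return None
    return user.get("user_name") or None


def fetch_session_username(
    magpie_url: str, cookie_header: str, timeout: float = 5.0
) -> Optional[str]:
    """Forward the browser's Cookie header to Magpie and report who is logged in."""
    request = urllib.request.Request(
        magpie_url.rstrip("/") + "/session",
        headers={"Cookie": cookie_header, "Accept": "application/json"},
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return username_from_session(response.read().decode("utf-8"))
    except OSError:
        # Network failure or HTTP error: fall back to the normal login form.
        return None
```

If `fetch_session_username` returns a name, the JupyterHub side could skip its own login form; if it returns `None`, nothing changes from today's behaviour.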

Where is even that jupyterhub_magpie_authenticator.MagpieAuthenticator implementation?

certain changes need to be made to the magpie page

Note that modifying Magpie to have access to JupyterHub or other useful links is not that straightforward. Magpie is not used exclusively in birdhouse, and JupyterHub makes no sense in other platforms. Therefore, it would need some kind of HTML templating to dynamically add extra content to be displayed through platform-specific overrides. The logic and utilities for such templating are already in Magpie (for the notification/registration emails), but would have to be added to that UI page. Also, the JupyterHub Session ID/Cookie would still be missing, so the user would still have to log in again on the Jupyter side anyway, unless this template basically reimplements what jupyterhub_magpie_authenticator.MagpieAuthenticator does.

Is it possible to rig this so that if I go to pavics.ouranos.ca/jupyter, once the user signs-in in Magpie, it goes directly back to Jupyter (ie you're not stuck in magpie) ?

That would require some callback URL to be specified. Magpie can already do something like that when handling "external provider" logins to return to the signin page after resolving the external login, but I'm not certain it will work out of the box for internal logins (pretty sure it won't).

mishaschwartz commented 1 year ago

@fmigneault

Note that modifying Magpie to have access to JupyterHub or other useful links is not that straightforward. Magpie is not used exclusively in birdhouse, and JupyterHub makes no sense in other platforms.

I agree, which is why I suggested:

Another option is to leave magpie alone and create a separate "DACCS branded" sign in page that makes a call to the magpie api (see: https://pavics-magpie.readthedocs.io/en/latest/api.html#tag/Session%2Fpaths%2F~1signin%2Fpost). That way we don't have to force magpie to be something it isn't.

how do you feel about this suggestion?

Also... The jupyterhub_magpie_authenticator.MagpieAuthenticator can easily be modified to automatically log in users through jupyter if they're already logged in through magpie. If you're interested I've been experimenting with this on the jupyter-behind-twitcher branch, if you have a look at this file it'll give you an idea of a possible solution:

https://github.com/bird-house/birdhouse-deploy/blob/jupyter-behind-twitcher/birdhouse/config/jupyterhub/config/magpie/authenticator/jupyterhub_magpie_authenticator.py

tlvu commented 1 year ago

When JupyterHub does the login, it performs some Magpie login using this:

https://github.com/bird-house/birdhouse-deploy/blob/22fc7c01a0279b2c98a9059b874dee5922bcac21/birdhouse/config/jupyterhub/jupyterhub_config.py.template#L17

Can't it return the Magpie Cookie at the same time if not already set? Can't it do a pre-check that a Cookie matching Magpie's definition is already present and valid? It should be sufficient to send a request to https://pavics.ouranos.ca/magpie/session with the detected cookies to validate if the user is already logged in in Magpie, and that login did not expire, to skip the login on the JupyterHub side.

Agreed.

Where is even that jupyterhub_magpie_authenticator.MagpieAuthenticator implementation?

It is here https://github.com/Ouranosinc/jupyterhub/blob/master/jupyterhub_magpie_authenticator/jupyterhub_magpie_authenticator.py, from David Caron. Is he still at CRIM? I forgot.

I think we could combine the idea from Misha of using the Magpie API with the idea of making the Jupyterhub Magpie authenticator also detect and set Magpie cookies, possibly through that same Magpie API.

I agree keeping the Jupyterhub login page is a nicer experience as the Magpie login page is mostly for admin users. As for creating a brand new DACCS login page, then this DACCS login page will have to deal with both Jupyterhub and Magpie sessions cookies. Not sure if this is easier than making the existing Jupyterhub Magpie authenticator play nicer with Magpie sessions.

But I still have a fundamental question that is not very clear to me. Is the current way of protecting against illegal JupyterHub logins not enough? What additional protection does putting it behind Twitcher, and routing all traffic through Twitcher, offer?

Just to be clear, I see the advantage of better integrating the Jupyterhub session with the Magpie session to offer a single sign-on experience. I think we can achieve this without having to route all Jupyterhub traffic behind Twitcher. As such, probably a better title for this issue would be "Allow single sign-on between Jupyterhub and Magpie" instead of "Protect JupyterHub".

mishaschwartz commented 1 year ago

@tlvu

But I still have a fundamental question that is not very clear to me. Is the current way of protecting against illegal JupyterHub logins not enough? What additional protection does putting it behind Twitcher, and routing all traffic through Twitcher, offer?

By putting it behind magpie/twitcher it would allow us to use magpie to specify permissions on a more fine-grained level (we can allow access to jupyterhub for specific groups of users for example).

Right now, the MagpieAuthenticator simply checks if a user exists in magpie in order to allow them access. This would give us more flexibility.
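As a small illustration of what "more fine-grained" could look like, the sketch below decides access from a user's Magpie group names instead of mere existence. It is a standalone helper under assumptions, not existing MagpieAuthenticator code: the function name, the group name `jupyterhub-users`, and the rule that an empty allow-list keeps today's behaviour are all hypothetical.

```python
from typing import Iterable


def is_allowed(user_groups: Iterable[str], allowed_groups: Iterable[str]) -> bool:
    """Return True if the user belongs to at least one allowed group.

    An empty allow-list means "no restriction configured", which mirrors the
    current behaviour where any existing Magpie user is let in (an assumption
    made for this sketch).
    """
    allowed = set(allowed_groups)
    if not allowed:
        return True
    return not allowed.isdisjoint(user_groups)


# Hypothetical usage: only members of "jupyterhub-users" may start a server.
print(is_allowed(["users", "jupyterhub-users"], ["jupyterhub-users"]))
print(is_allowed(["anonymous"], ["jupyterhub-users"]))
```

A check like this could plausibly live in an override of JupyterHub's `Authenticator.check_allowed`, with the group names coming from Magpie.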

mishaschwartz commented 1 year ago

@tlvu

As for creating a brand new DACCS login page, then this DACCS login page will have to deal with both Jupyterhub and Magpie sessions cookies

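Not necessarily, we could imagine a workflow like this:

* user goes to custom login page and signs in

* a POST request gets sent to magpie to log in the user and sets the magpie cookies (if successful)

* the user then goes to the jupyterhub page

* jupyterhub checks if the user is already logged in through magpie and automatically sets the jupyterhub cookies (see my comment here for details)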

Essentially this means that magpie becomes the "source of truth" for whether a user is logged in or not and other components (custom login page, jupyterhub) just have to interact with magpie, not with each other.
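For the login-page step of such a workflow, the only thing the custom page needs to do is send one POST to Magpie's signin endpoint. A real static page would do this from the browser; the sketch below uses Python purely to show the shape of the request. The JSON field names `user_name` and `password` are assumptions to verify against the Magpie API documentation linked earlier, and `build_signin_request` is a hypothetical helper.

```python
import json
import urllib.request


def build_signin_request(
    magpie_url: str, user_name: str, password: str
) -> urllib.request.Request:
    """Build (but do not send) a JSON signin request for Magpie."""
    # Field names are assumed, not confirmed in this thread.
    payload = json.dumps({"user_name": user_name, "password": password}).encode("utf-8")
    return urllib.request.Request(
        magpie_url.rstrip("/") + "/signin",
        data=payload,
        headers={"Content-Type": "application/json", "Accept": "application/json"},
        method="POST",
    )


# Hypothetical usage; sending it would be urllib.request.urlopen(request).
request = build_signin_request("https://example.org/magpie", "alice", "secret")
print(request.get_method(), request.full_url)
```

On success, the `Set-Cookie` headers of the response carry the Magpie session, which is what makes Magpie the single place that knows whether the user is logged in.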

tlvu commented 1 year ago

@tlvu

But I still have a fundamental question that is not very clear to me. Is the current way of protecting against illegal JupyterHub logins not enough? What additional protection does putting it behind Twitcher, and routing all traffic through Twitcher, offer?

By putting it behind magpie/twitcher it would allow us to use magpie to specify permissions on a more fine-grained level (we can allow access to jupyterhub for specific groups of users for example).

Right now, the MagpieAuthenticator simply checks if a user exists in magpie in order to allow them access. This would give us more flexibility.

I see, this is now much clearer. We are basically missing both single sign-on and fine grained permissions, like for other WPS services.

If the Magpie API can also provide this group membership information, can the MagpieAuthenticator use this for finer grained permission?

Not necessarily, we could imagine a workflow like this:

* user goes to custom login page and signs in

* a POST request gets sent to magpie to log in the user and sets the magpie cookies (if successful)

* the user then goes to the jupyterhub page

* jupyterhub checks if the user is already logged in through magpie and automatically sets the jupyterhub cookies (see my comment here for details)

Here the user is logged into Magpie, but do they have permission to access JupyterHub? Looks like the same problem as with MagpieAuthenticator?

Essentially this means that magpie becomes the "source of truth" for whether a user is logged in or not and other components (custom login page, jupyterhub) just have to interact with magpie, not with each other.

Currently this is already the case I think. Thredds and all WPS are behind Twitcher/Magpie. Jupyterhub login uses Magpie users.

If I summarize, we want

I have a feeling all changes can be done at the MagpieAuthenticator level but if a separate DACCS login page is easier then I have no objections. Is this DACCS login page a static page or another app to be deployed?

fmigneault commented 1 year ago

@mishaschwartz

how do you feel about this suggestion?

I'm not sure if this only moves the problem to another service to keep in sync. The user could still log in with Magpie or JupyterHub. Long seems to have also identified this issue.

@tlvu

David Caron. Is he still at CRIM?

No. Gone for quite a while now.

Since even the custom DACCS login approach would require that the JupyterHub handler checks if the user is already logged in through Magpie and automatically sets the JupyterHub cookies, I think it is just easier to keep JupyterHub as the main login location and leave it up to the handler to sync items as needed.

fmigneault commented 1 year ago

@mishaschwartz https://github.com/bird-house/birdhouse-deploy/compare/master...jupyter-behind-twitcher looks promising. If it can combine the authenticate method from https://github.com/Ouranosinc/jupyterhub/blob/master/jupyterhub_magpie_authenticator/jupyterhub_magpie_authenticator.py and a definition of https://jupyterhub.readthedocs.io/en/stable/reference/api/auth.html#jupyterhub.auth.Authenticator.check_allowed, that should cover most cases.

Another interesting alternative: https://jupyterhub.readthedocs.io/en/stable/reference/api/auth.html#jupyterhub.auth.Authenticator.login_service

fmigneault commented 1 year ago

A few other places where things can maybe be overridden to inject extra cookies:

https://github.com/jupyterhub/jupyterhub/blob/0e4deec714a30729d11e7d0b1ce359f65faaf6af/jupyterhub/handlers/base.py#L826-L831

https://github.com/jupyterhub/jupyterhub/blob/0e4deec714a30729d11e7d0b1ce359f65faaf6af/jupyterhub/handlers/base.py#L563-L644

mishaschwartz commented 1 year ago

I'm not sure if this only moves the problem to another service to keep in sync. The user could still log in with Magpie or JupyterHub.

We wouldn't have to keep another service in sync. We're still only ever logging in through magpie. We're just adding a static page that we can customize so that we don't have to modify any of the magpie code. Think of it as simply replacing the look of the magpie login page (without actually changing the magpie code).

If it can combine the authenticate method from https://github.com/Ouranosinc/jupyterhub/blob/master/jupyterhub_magpie_authenticator/jupyterhub_magpie_authenticator.py and a definition of https://jupyterhub.readthedocs.io/en/stable/reference/api/auth.html#jupyterhub.auth.Authenticator.check_allowed, that should cover most cases.

That's a great idea to combine those

I think it is just easier to keep JupyterHub as the main login location and leave it up to the handler to sync items as needed.

Yes it is easier @fmigneault, but @huard's point:

I suspect that using the Magpie sign-in page could have long-term advantages, as for example displaying the current permission profile for data, services, other daccs nodes, etc.

does have a lot of advantages for DACCS specifically (even if the advantages to CRIM's use of birdhouse-deploy are not as clear)

mishaschwartz commented 1 year ago

I have an idea for a compromise that I hope will make everyone happy. Please let me know what you think:

  1. create one PR that puts all jupyterhub routes behind twitcher and changes the MagpieAuthenticator so that it sets the magpie cookies as well when you log in through jupyterhub
  2. create another PR that creates a separate optional component that implements the updated MagpieAuthenticator (described in the jupyter-behind-twitcher branch) as well as creating a customizable static login page (as described above).

Then, you can choose to have jupyterhub as your main login page or you can choose to enable this optional component to have a customizable login page.

I would prioritize step 1 in order to resolve this issue and then we can work on step 2 at a later date.
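To illustrate the cookie part of step 1, here is a hedged sketch of how an authenticator could copy the cookies Magpie returns at signin onto the JupyterHub login response. It assumes the login handler exposes Tornado's `set_cookie(name, value)` method; the helper names and the cookie name `auth_tkt` in the usage are illustrative only.

```python
from http.cookies import SimpleCookie
from typing import Dict, Iterable


def cookies_from_headers(set_cookie_headers: Iterable[str]) -> Dict[str, str]:
    """Extract name/value pairs from raw Set-Cookie header values."""
    cookies: Dict[str, str] = {}
    for header in set_cookie_headers:
        parsed = SimpleCookie()
        parsed.load(header)
        for name, morsel in parsed.items():
            cookies[name] = morsel.value
    return cookies


def forward_cookies(handler, set_cookie_headers: Iterable[str]) -> None:
    """Re-issue Magpie's cookies to the browser through the login handler.

    Simplification: cookie attributes (Path, Secure, expiry) are dropped here;
    a real implementation would likely need to preserve them.
    """
    for name, value in cookies_from_headers(set_cookie_headers).items():
        handler.set_cookie(name, value)


class StubHandler:
    """Stand-in for the Tornado request handler, for demonstration only."""

    def __init__(self) -> None:
        self.sent: Dict[str, str] = {}

    def set_cookie(self, name: str, value: str) -> None:
        self.sent[name] = value


stub = StubHandler()
forward_cookies(stub, ["auth_tkt=abc123; Path=/; HttpOnly"])
print(stub.sent)
```

With something like this in the `authenticate` path, a user who logs in through JupyterHub would leave with both the JupyterHub and the Magpie cookies set.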

tlvu commented 1 year ago

I have an idea for a compromise that I hope will make everyone happy. Please let me know what you think:

1. create one PR that puts all jupyterhub routes behind twitcher and changes the MagpieAuthenticator so that it sets the magpie cookies as well when you log in through jupyterhub

Good first step in achieving single sign-on between JupyterHub and Magpie.

I guess that with what you propose, a user already logged into JupyterHub will not need to log in again for Magpie, but maybe not the other way around?

It's okay since this is the first step. We can implement the reverse scenario in subsequent steps.

2. create another PR that creates a separate optional component that implements the updated MagpieAuthenticator (described in the jupyter-behind-twitcher branch) as well as creating a customizable static login page (as described above).

Then, you can choose to have jupyterhub as your main login page or you can choose to enable this optional component to have a customizable login page.

I would prioritize step 1 in order to resolve this issue and then we can work on step 2 at a later date.

I like this flexibility to let the user decide whether JupyterHub or another login page is preferred.

tlvu commented 1 year ago

I have an idea for a compromise that I hope will make everyone happy. Please let me know what you think:

1. create one PR that puts all jupyterhub routes behind twitcher and changes the MagpieAuthenticator so that it sets the magpie cookies as well when you log in through jupyterhub

Just to be clear, all routes behind Twitcher means all data flows through Twitcher or simply the "verify" trick so data do not flow through Twitcher to avoid performance penalty?

mishaschwartz commented 1 year ago

Just to be clear, all routes behind Twitcher means all data flows through Twitcher

Not this one

or simply the "verify" trick so data do not flow through Twitcher to avoid performance penalty?

Yes this one
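For readers unfamiliar with the "verify" approach: the proxy sends only a small authorization sub-request to Twitcher and, if that succeeds, forwards the real traffic straight to JupyterHub, so the data itself never passes through Twitcher. The sketch below models just the decision rule, assuming the proxy behaves like nginx's `auth_request` module (2xx allows, 401/403 denies, anything else is an error). The function name is hypothetical and this is not the actual proxy configuration.

```python
def proxy_decision(verify_status: int) -> str:
    """Map the status of the verify sub-request to what the proxy does next."""
    if 200 <= verify_status < 300:
        # Authorized: the original request goes directly to JupyterHub.
        return "forward"
    if verify_status in (401, 403):
        # Not logged in or not permitted: reject without contacting JupyterHub.
        return "deny"
    # Any other status is treated as a failure of the verification itself.
    return "error"


for status in (200, 401, 403, 500):
    print(status, proxy_decision(status))
```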