cylc / cylc-uiserver

A Jupyter Server extension that serves the cylc-ui web application for monitoring and controlling Cylc workflows.
https://cylc.org
GNU General Public License v3.0

Review the need of jupyterhub as a requirement for spawned UI server process #45

Closed kinow closed 4 years ago

kinow commented 5 years ago

When users install cylc-uiserver, there is a jupyterhub dependency that is used to run the jupyterhub command line application. This is called the Hub by some of us.

The Hub then spawns what we call UI Servers. These UI Servers are accessed behind a proxy (configurable-http-proxy, a NodeJS app, by default). The Hub has REST methods ready to read cookie values and to act as an OAuth2 server.

The easiest way to authenticate the UI Server was to reuse the JupyterHub authentication classes.

This implies having jupyterhub as a dependency not only in the Hub, but also on each spawned UI Server - well noted by @MartinRyan.

We need to review this, and assess whether it would be possible to authenticate against the Hub without bringing jupyterhub as a dependency in the spawned processes (which brings other transitive dependencies with it).

kinow commented 5 years ago

JupyterHub uses cookies in its web interface that have some extra levels of security (e.g. good practices for cookies, plus some encryption that the server validates).

It is quite tricky to get it all working, but the jupyterhub/jupyterhub GitHub project contains a class, HubOAuthCallbackHandler, that once imported can be used as a base class for Tornado handlers, along with Tornado decorators (e.g. tornado.web.authenticated), to magically handle authentication.

Once you extend that class, and annotate the methods that need to be protected, the Tornado framework will take care of verifying whether there is a user logged in or not, and if not, the parent HubOAuthCallbackHandler will take care to read the cookie and interact with the REST service to authenticate the user.
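The pattern described above can be sketched with the standard library alone, so it runs without Tornado or JupyterHub installed. Everything here is a hypothetical stand-in: `BaseAuthHandler` plays the role of the JupyterHub-provided base class, `authenticated` plays the role of `tornado.web.authenticated`, and `lookup_user` replaces the REST call to the Hub. It illustrates the division of labour, not the real APIs.

```python
import functools

# Hypothetical stand-in for the Hub's REST lookup: cookie value -> user name.
FAKE_HUB_SESSIONS = {"cookie-abc": "kinow"}


def lookup_user(cookie_value):
    """Pretend to ask the Hub who owns this cookie (None if nobody does)."""
    return FAKE_HUB_SESSIONS.get(cookie_value)


def authenticated(method):
    """Stand-in for tornado.web.authenticated.

    The real decorator typically redirects to a login URL; this sketch
    simply returns a (status, body) tuple for anonymous requests.
    """
    @functools.wraps(method)
    def wrapper(self, *args, **kwargs):
        if self.current_user is None:
            return (403, "login required")
        return method(self, *args, **kwargs)
    return wrapper


class BaseAuthHandler:
    """Stand-in for the base class provided by the jupyterhub package."""

    def __init__(self, cookies):
        self.cookies = cookies

    @property
    def current_user(self):
        # The parent class, not the subclass, reads the cookie and asks the Hub.
        return lookup_user(self.cookies.get("hub-session"))


class WorkflowsHandler(BaseAuthHandler):
    """Application handler: it only marks which methods need protecting."""

    @authenticated
    def get(self):
        return (200, f"workflows for {self.current_user}")
```

The application handler contains no authentication logic of its own, which is why importing the base class was the easiest route and also why the dependency ends up in every spawned process.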

We used this approach in our PoC to validate that the Vue.js app could authenticate against the Hub, hence the current dependency.

References:

kinow commented 5 years ago

Why are we using JupyterHub instead of Notebook?

Jupyter Notebook can be used with or without JupyterHub. See the comment in one of our old tickets for some notes made during the initial investigation.

But from what I remember, the spawner process (jupyterhub-singleuser) takes care of initializing the Notebook without authentication. Instead, the authentication is done at the Hub + Spawner.

What we did in Cylc UI Server was to do the authentication at the Hub only. The Spawner is completely unaware of any authentication that needs to be performed after it has spawned, and the UI Server (Tornado app serving vue.js) is simply delegating everything to the hub.

I have never done a complete analysis of how authentication works from point to point between Hub <- Proxy -> Notebook, but there is some code in jupyterhub related to spawning the NotebookApp (the class used by jupyterhub to initialize the Notebook application) that indicates that it disables parts of the built-in auth from the notebooks:

So in summary, when we use the Notebook behind JupyterHub, the Hub is still taking care of the authentication. It is just that the boundary of where it happens is quite fuzzy, with the Hub initializing the notebook with some hints about auth.

kinow commented 5 years ago

@hjoliver I realized that this is a recurring topic, and one that we seemed to agree would have to be improved, but not necessarily now... so I created this ticket to hold all conversation about it.

Plus, my memory is as good as a goldfish's, so documented here what I remember about the history of the issue, how we got here, and where the auth is happening.

I think it would be fair to say that this was the simplest (though I would accept laziest here) approach we could find to get everything working together. And I take full responsibility for the dependency in the spawned UI Server tornado app 😬

kinow commented 5 years ago

Was reading the issues in JupyterHub, and found this one again, about using JupyterHub as a frontend for multiple apps (I believe Pangeo is doing that).

The maintainer of JupyterHub, when replying to a question about auth, says

If you happen to be writing a tornado server, jupyterhub provides classes that implement authentication with the Hub, so you mainly need to just import and mix-in the HubOAuthenticated handler class. Otherwise, you'll have to implement OAuth with the JupyterHub URLs.

So I think this reinforces that it is doable, as long as we write OAuth in our Tornado server. I think we also need to access the cookie information, which is also handled via jupyterhub (or maybe it is available in an environment variable...)
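As a rough idea of what "implementing OAuth with the JupyterHub URLs" might start with, here is a minimal sketch that builds the authorization redirect without importing jupyterhub. The environment variable names (`JUPYTERHUB_BASE_URL`, `JUPYTERHUB_CLIENT_ID`, `JUPYTERHUB_OAUTH_CALLBACK_URL`) and the `/hub/api/oauth2/authorize` path are taken from my recollection of the JupyterHub documentation and should be treated as assumptions to verify against the deployed version. `build_authorize_url` is a hypothetical helper, and the token exchange and cookie handling are not shown.

```python
from urllib.parse import urlencode


def build_authorize_url(env, state):
    """Build the URL a browser would be redirected to in order to log in.

    `env` is any mapping with the variables the Hub is assumed to pass to
    spawned processes (for example os.environ). `state` is an opaque,
    unguessable value the server checks again in the OAuth callback.
    """
    hub_base = env.get("JUPYTERHUB_BASE_URL", "/").rstrip("/")
    params = {
        "client_id": env["JUPYTERHUB_CLIENT_ID"],
        "redirect_uri": env["JUPYTERHUB_OAUTH_CALLBACK_URL"],
        "response_type": "code",  # authorization-code flow
        "state": state,
    }
    return f"{hub_base}/hub/api/oauth2/authorize?{urlencode(params)}"
```

In a real server this would be called with something like `build_authorize_url(os.environ, secrets.token_urlsafe(16))`, and the callback handler would then exchange the returned code for a token, which is the part the jupyterhub classes otherwise do for us.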

kinow commented 4 years ago

Found out about this project to wrap webapps as replacements for singleuser-app.

Which is what we are doing with Cylc UI Server. Haven't read the code, but checked their requirements.txt, and jupyterhub is a dependency there too. Might be because of OAuth as in our case, which would probably suggest that it's easier (or better) to rely on re-using the same client/logic as JupyterHub for auth.

Or maybe they had another reason… 🤷‍♂️ interesting project for comparison with Cylc UIS. We might have something to learn from their approach.

hjoliver commented 4 years ago

@kinow - can you remind me, is there really any problem with having JupyterHub as a dependency on the UIS side? It's just another package that gets imported like any other, and it doesn't even sound entirely unreasonable to me given that cylc-uiserver gets launched by J-Hub. But maybe I've forgotten the point of this old-ish issue.

kinow commented 4 years ago

It was an issue raised by Martin/BoM. I believe their main concern was the transitive deps.

So to have an OAuth client, we import jupyterhub and all of its dependencies.
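To put a number on that concern, a small sketch like the following could list what a package drags in. It walks the declared requirements of installed distributions via `importlib.metadata`; `transitive_deps` and its helpers are hypothetical names, optional extras are skipped with a simple substring check, and environment markers are otherwise ignored, so the result is an approximation.

```python
import re
from importlib import metadata


def _name(requirement):
    """Extract the normalized project name from a requirement string."""
    raw = re.match(r"\s*([A-Za-z0-9][A-Za-z0-9._-]*)", requirement).group(1)
    return re.sub(r"[-_.]+", "-", raw).lower()


def installed_requires(name):
    """Default lookup: declared requirements of an installed distribution."""
    try:
        return metadata.requires(name) or []
    except metadata.PackageNotFoundError:
        return []


def transitive_deps(root, requires=installed_requires):
    """Return the set of packages pulled in by `root`, without extras."""
    root_name = _name(root)
    seen = set()
    stack = [root_name]
    while stack:
        current = stack.pop()
        for req in requires(current):
            if "extra ==" in req:  # optional extras are not installed by default
                continue
            dep = _name(req)
            if dep not in seen:
                seen.add(dep)
                stack.append(dep)
    seen.discard(root_name)
    return seen
```

Running `sorted(transitive_deps("jupyterhub"))` in an environment where jupyterhub is installed would show the packages each UI Server host inherits.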

hjoliver commented 4 years ago

Yes, but why is that a problem? It's just another code library. I guess I'd suggest we close this issue and not bother trying to remove the dependency unless it causes a real problem.

hjoliver commented 4 years ago

(I'm wondering if this issue was originally part of the J-hub related investigations that Martin did, that we decided was unnecessary - we just wanted to use J-Hub out of the box as a generic application launcher, as we have in fact ended up doing).

kinow commented 4 years ago

From what I remember, it was based on the concern of having to install jupyterhub and its dependencies on each HPC node with UIS, not only on the one node with jupyterhub.

In the NeSI presentation last week, they showed JupyterHub in the DMZ. It is isolated from other nodes. They would probably find it strange to have to install jupyterhub in the internal network (even if not running, any user could in theory run jupyterhub).

I'm fine with closing this one if you prefer. I think I created it to track the discussion that originated either in Riot or in another issue.

hjoliver commented 4 years ago

Yeah I can imagine that would be a concern at first glance - but we wouldn't be relying on the system-installed J-Hub on cylc-uiserver hosts. It would just be installed as another setup.py dependency for us. That wouldn't be dangerous (we wouldn't actually be running a privileged hub there) and besides admins couldn't really stop us doing that anyway could they, without reading the code to see what it all was?

hjoliver commented 4 years ago

OK let's leave this open for the moment, but I suggest we close it if @oliver-sanders and @dpmatthews agree with me that there's no good reason to bust a gut to remove this dependency. It is not "JupyterHub" after all, it is just a code library that cylc-uiserver uses a small part of, which happens to be part of the JupyterHub codebase.

oliver-sanders commented 4 years ago

I don't think the uiserver is going to "grow into" the jupyterhub environment as things stand at the moment so it'll be just as hard to remove this dependency down the line as it is now.

So I'm happy to wait for this to become a problem before we try to solve it.

One potential caveat to that would be if we decided to borrow widgets from JupyterLab (text editors, YAML viewers, etc.). This would add a JupyterLab dependency which may well bring in a hub dependency?

dpmatthews commented 4 years ago

I'm happy to close this issue and re-open if it becomes a problem

hjoliver commented 4 years ago

OK, closing for now.

kinow commented 3 years ago

Note that this is not an issue anymore as we are moving to Jupyter Server, so no more jupyterhub requirement 🎉 #230