dask / dask-labextension

JupyterLab extension for Dask
BSD 3-Clause "New" or "Revised" License

Dashboard doesn't show when scheduler does not listen on localhost #233

Open etejedor opened 2 years ago

etejedor commented 2 years ago

I am running a JupyterHub deployment that uses JupyterLab and the Dask lab extension to create HTCondorClusters.

The JupyterHub user sessions run inside Docker containers, and so does the Dask scheduler. The Dask scheduler listens on an address of the private Docker network (not on the loopback interface), so that messages from the workers get forwarded from the host address to the Docker address.

I can create an HTCondorCluster normally via the Dask lab extension, but the dashboard buttons do not show up (see image below).

image

If I look at the requests made by the browser, I see 403 errors when fetching, e.g., dask/dashboard/a44f0140-dd9e-4f9b-a5c3-acd32682eb70/individual-plots.json?1652878129159. The error message is quite descriptive: "Host '172.17.0.2' is not whitelisted. See https://jupyter-server-proxy.readthedocs.io/en/latest/arbitrary-ports-hosts.html for info.", where 172.17.0.2 is the private Docker address.

So it seems that the jupyter server proxy extension does not allow proxying to 172.17.0.2. The page linked from the error message recommends using the host_allowlist option, which in this case would mean adding 172.17.0.2 to the allowlist.
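For reference, here is a minimal sketch of what that recommendation could look like in a Jupyter config file. The simplest form is a plain list of hosts; the variant below uses a callable instead, which jupyter-server-proxy also accepts. The function name and the 172.17.0.0/16 subnet are assumptions chosen for illustration (the subnet is a guess based on the 172.17.0.2 address above), not values taken from any real deployment.

```python
import ipaddress

# Assumption: the private Docker network is 172.17.0.0/16.
DOCKER_NETWORK = ipaddress.ip_network("172.17.0.0/16")


def docker_host_allowed(handler, host):
    """Return True if `host` is loopback or inside the assumed Docker network."""
    if host in ("localhost", "127.0.0.1"):
        return True
    try:
        return ipaddress.ip_address(host) in DOCKER_NETWORK
    except ValueError:
        # `host` is a hostname rather than an IP literal; reject it.
        return False


# In the Jupyter config file, where `c` is provided by Jupyter:
# c.ServerProxy.host_allowlist = docker_host_allowed
# or, as a static list:
# c.ServerProxy.host_allowlist = ["localhost", "127.0.0.1", "172.17.0.2"]
```

The `handler` argument is unused here; it is only part of the callable's signature.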

However, if I include the c.ServerProxy.host_allowlist option in the Jupyter notebook config file, it has no effect. The reason is that DaskDashboardHandler does not load that option and does not pass it to the constructor of its ProxyHandler superclass. Therefore, the 172.17.0.2 address is not whitelisted.

Shouldn't the DaskDashboardHandler class define a constructor that loads the host_allowlist option and passes it on to ProxyHandler?
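To make the idea concrete, here is a toy sketch of the pattern being proposed. Both classes are hypothetical stand-ins written for illustration: they are not the real ProxyHandler or DaskDashboardHandler, and the plain dict used as "config" is an assumption that only mimics a loaded Jupyter configuration.

```python
class BaseProxyHandler:
    """Hypothetical stand-in for the proxy superclass; not the real ProxyHandler."""

    def __init__(self, host_allowlist=None):
        # Fall back to loopback-only when no allowlist is supplied.
        self.host_allowlist = host_allowlist or ["localhost", "127.0.0.1"]

    def host_allowed(self, host):
        return host in self.host_allowlist


class DashboardHandlerSketch(BaseProxyHandler):
    """Hypothetical subclass that forwards the configured allowlist."""

    def __init__(self, config):
        # `config` is a plain dict standing in for the loaded Jupyter config.
        allowlist = config.get("ServerProxy", {}).get("host_allowlist")
        super().__init__(host_allowlist=allowlist)


handler = DashboardHandlerSketch(
    {"ServerProxy": {"host_allowlist": ["localhost", "172.17.0.2"]}}
)
```

With the option forwarded, the private Docker address passes the check; without it, the subclass behaves like the loopback-only default. Whether the real handler needs exactly this change is an open question.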

Another thing I tried was setting dashboard_address (in scheduler_options, when building the HTCondorCluster) to localhost:port, so that the dashboard, unlike the scheduler, listens on the loopback interface. That doesn't work either, because the Dask lab extension still assumes the dashboard listens on the same address as the scheduler: this call is resolved by this code, i.e. the dashboard link is constructed from the scheduler's address. This looks like a bug, since the scheduler and dashboard addresses can differ, as I explained above.
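The attempt described above amounts to something like the following sketch. The helper names, the port 8787, and the resource values (cores, memory, disk) are placeholders assumed for illustration; only dashboard_address inside scheduler_options comes from the description above.

```python
def dashboard_on_loopback(port=8787):
    """Scheduler options that bind only the dashboard to the loopback interface."""
    return {"dashboard_address": f"localhost:{port}"}


def make_cluster():
    """Build an HTCondorCluster whose dashboard listens on localhost."""
    # Imported lazily so the helper above can be used without dask-jobqueue.
    from dask_jobqueue import HTCondorCluster

    return HTCondorCluster(
        cores=1,          # placeholder resource values
        memory="2 GB",
        disk="1 GB",
        scheduler_options=dashboard_on_loopback(),
    )
```

Calling make_cluster() requires dask-jobqueue and a working HTCondor setup, so it is not invoked here.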

Any feedback would be appreciated! Thanks!

ian-r-rose commented 2 years ago

Hi @etejedor, apologies for the slow reply.

I know that at least a few people have gotten host_allowlist working for private, non-localhost addresses (cf. https://github.com/dask/dask-labextension/issues/194). But even then, it's not very ergonomic, and I think it involves hitting the /proxy/<ip-address> endpoint rather than the one constructed by the extension.

It sounds like you have a good understanding of what would help make this work a little better for your use-case. I am not 100% sure that a constructor passing host_allowlist is what's necessary, but then, my memory of traitlets is a little hazy :) We would welcome PRs if you have the time!