consideRatio opened 5 years ago
I just learned about https://jupyter-server-proxy.readthedocs.io/en/latest/index.html (the successor to nbserverproxy, as I understand it).
There is a reference in jupyter-server-proxy's docs:
I could not tell whether this reference means that dask-labextension utilizes such functionality in general, or that dask-labextension depends specifically on jupyter-server-proxy, or on nbserverproxy.
/cc: @yuvipanda @ryanlovett
@ian-r-rose helped me install the extension properly in #40 (:heart:), and I have now tested the merged PRs #31 and #34 (at least I think so; I have version 0.3.1 of the PyPI package installed). I was able to press "new" to create local clusters, connect to them, and so on, all while acting on a remote Jupyter server provided to me by a zero-to-jupyterhub-k8s deployment made available at https://jupyter.allting.com/user/erik.sundell.
But I still run into an issue when I try to connect to a scheduler that I have deployed on the same Kubernetes cluster. The pods can reach it through their network and a Kubernetes Service, but my browser cannot. It seems like the browser is making requests directly to the provided URL rather than delegating them to the actual Jupyter notebook server as a proxy. This is what I've found so far.
A field is populated with dask/dashboard/7708ccdb-ce0b-4f74-b2db-970a07e94b24
Is there some way I can reference the remote cluster?
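For context, here is a minimal sketch of how such a connection looks from inside a pod (the Service name follows the one mentioned later in this thread; 8786 as the scheduler comm port is an assumption, it is just the distributed default, while the dashboard the extension wants to embed is on 8787):

```python
from dask.distributed import Client

# Works from the notebook pod, which shares the cluster-internal network;
# the browser cannot reach this address directly.
client = Client("tcp://dask-scheduler.dask:8786")
print(client.dashboard_link)  # the dashboard URL the scheduler advertises
```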
I looked into the scheduler logs...
('INFO', 'distributed.scheduler - INFO - Starting worker compute stream, tcp://10.0.2.6:44307'),
('INFO', 'distributed.scheduler - INFO - Register tcp://10.0.2.5:32781'),
('INFO', 'distributed.scheduler - INFO - Starting worker compute stream, tcp://10.0.2.4:35733'),
('INFO', 'distributed.scheduler - INFO - Starting worker compute stream, tcp://10.0.2.5:32781'),
('INFO', 'distributed.scheduler - INFO - Receive client connection: Client-97858002-1098-11e9-80f3-0a580a00013d'),
('INFO', 'distributed.scheduler - INFO - Receive client connection: Client-01d7bab4-1099-11e9-80f3-0a580a00013d'),
('INFO', 'distributed.scheduler - INFO - Receive client connection: Client-e4c69902-109d-11e9-80f3-0a580a00013d'),
('INFO', 'distributed.scheduler - INFO - Receive client connection: Client-f0702c94-109d-11e9-80f3-0a580a00013d')
I learned that I can find the "client id" like this as well:
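(The original snippet did not survive here; a minimal sketch of one way to read it, assuming a connected distributed Client:)

```python
from dask.distributed import Client

# Connect to the remote scheduler (or use Client() for a local cluster).
client = Client("tcp://dask-scheduler.dask:8786")
print(client.id)  # e.g. 'Client-f0702c94-109d-11e9-80f3-0a580a00013d'
```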
But using that id, the way I was able to use the id of the cluster created locally in the same pod, failed with the same "does not appear to be a valid bla bla" error...
I tested requesting a response from the locally created dask cluster by accessing: https://jupyter.allting.com/user/erik.sundell/dask/dashboard/7708ccdb-ce0b-4f74-b2db-970a07e94b24
It worked fine and I got:
{"Individual Task Stream": "/individual-task-stream", "Individual Progress": "/individual-progress", "Individual Graph": "/individual-graph", "Individual Profile": "/individual-profile", "Individual Profile Server": "/individual-profile-server", "Individual Nbytes": "/individual-nbytes", "Individual Nprocessing": "/individual-nprocessing", "Individual Workers": "/individual-workers"}
Accessing the remote cluster client's id failed though:
https://jupyter.allting.com/user/erik.sundell/dask/dashboard/f0702c94-109d-11e9-80f3-0a580a00013d
But I learned that the local dask cluster's client.id was different from the one I used to connect with anyhow. That is probably because a client.id identifies one among possibly many clients of a cluster, and the connection string I see for the cluster must refer to something other than the client.id of a client connecting to it at some later point...
I'd like to see this issue resolved too, because I typically start JupyterLab on a remote server. It'd be really nice to access the dashboard through JupyterLab and not have to set up dashboard access separately (which I currently do through port-forwarding).
Hi @consideRatio (and @stsievert), sorry about the extremely late response. As far as I can tell, you are unable to access your remote dashboard due to CORS and/or mixed content errors. These security policies are there for a good reason, especially in an application like JupyterLab: if untrusted content gets access to the page it could trigger arbitrary code execution.
Our approach to this has been to have the cluster manager start the clusters on the server side and enforce the dashboard URLs (rather than allowing the user to put any URL in the iframes). That being said, there are many different types of deployments, so we are still working out good examples for configuration in these different contexts.
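To illustrate the server-side factory idea, here is a sketch based on the labextension configuration schema from the README: it writes a LocalCluster factory into Dask's config directory, which the server extension reads when the Jupyter server starts. The file path and values are just an example:

```python
import os
import yaml

# The cluster manager builds clusters from this factory, so the dashboard
# URLs it serves are ones it controls rather than arbitrary user input.
config = {
    "labextension": {
        "factory": {
            "module": "dask.distributed",
            "class": "LocalCluster",
            "args": [],
            "kwargs": {},
        }
    }
}
path = os.path.expanduser("~/.config/dask/labextension.yaml")
os.makedirs(os.path.dirname(path), exist_ok=True)
with open(path, "w") as f:
    yaml.safe_dump(config, f)
```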
So to fix your current issue, I see a couple of options:
I think my issue is related. I can successfully start a LocalCluster on a remote server using the extension, but the visualisation tabs are blank when opened. I'm running JupyterHub behind nginx.
I don't mean to hijack this issue, so if this is unrelated I can make a separate post. If not, I'd happily accept any configuration suggestions to make this work.
Similar issue: as far as I can tell it's not possible to use the extension to connect to a cluster that's started from within a notebook or terminal. From my reading of the nb proxy docs, that is by design, am I right? The proxy system can only proxy processes that it starts itself?
@mangecoeur that is correct.
I'm putting together a tutorial right now - adapted from dask/dask-tutorial and focused more on the kind of use-case discussed here (for my purposes, running single-machine Dask via Gigantum on Digital Ocean). I will digest the contents of this issue there, but it seems this isn't quite documented anywhere. Is there a place that it should be documented?
Currently, the only dask-labextension docs are the readme in this repository. If you are willing to write a troubleshooting section, I think that would be helpful.
Is there any solution for this? I would like to run the Dask JupyterLab extension on my Azure VM.
@georaf I think the steps outlined in https://github.com/dask/dask-labextension/issues/41#issuecomment-491104410 should work. Are you getting errors? The last suggestion was that it might be helpful to add some docs around this, so if that sounds interesting to you, a pull request would be welcome.
This is a little Gigantum-centric, but the approach should be pretty clear. You can just launch the related Project in Gigantum Hub and look at the Environment tab to see how we set it up (links are in the post):
https://blog.gigantum.com/scaling-on-the-cheap-with-dask-gigantum-and-digitalocean
I somehow missed these comments for a LONG time - sorry about that. It's not exactly troubleshooting - more showing an approach that works. If this seems helpful, I'm happy to digest the relevant bits into a README section. I'm also happy to simply link to the blog post.
A simple way to access the scheduler is to use the arbitrary host/port access feature of Jupyter ServerProxy. Just enter proxy/daskscheduler:8787 (without a forward slash at the beginning, i.e. a relative URL) instead of http://daskscheduler:8787 as the URL address. This allows the Dask JupyterLab extension to access the scheduler behind the reverse proxy.
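A related sketch: Dask can be told to advertise such proxied links itself via the distributed.dashboard.link setting. The template is expanded with environment variables; JUPYTERHUB_SERVICE_PREFIX is set by JupyterHub, and the daskscheduler host is assumed to be permitted by the proxy's host allowlist:

```python
import dask

# Under JupyterHub this expands to e.g.
# /user/<name>/proxy/daskscheduler:8787/status -- a relative URL that is
# routed through jupyter-server-proxy rather than fetched directly.
dask.config.set({
    "distributed.dashboard.link":
        "{JUPYTERHUB_SERVICE_PREFIX}proxy/daskscheduler:8787/status"
})
```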
I hoped that #34 and #31 would accomplish something allowing me to use this extension by entering a scheduler URL that is accessible from the Jupyter server but not from the browser client.
For example, in a terminal in the Jupyter server made available through a JupyterHub, I can wget http://dask-scheduler.dask:8787, but from the browser I can't access that URL. It is only available on the local network of the Jupyter server.
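(A sketch of the same check from Python instead of wget; it should succeed on the Jupyter server's network and fail from the browser's:)

```python
import requests

# Succeeds from the Jupyter server, which shares the cluster-internal
# network with the Service; the browser has no route to this host.
r = requests.get("http://dask-scheduler.dask:8787/status")
print(r.status_code)  # expect 200 when run server-side
```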
Question: Was this supposed to work after #31 and #34, or were those PRs not meant to accomplish what I hoped, due to my lack of a proper understanding?