E-CAM / jobqueue_features

This library provides some useful decorators for dask_jobqueue. It also expands its scope to include MPI workloads, including extending configuration options for such workloads and heterogeneous resources.

Dask widgets not working in tutorial #104

Closed: dwhswenson closed this issue 3 years ago

dwhswenson commented 3 years ago

I see nothing happening in the Dask widgets. Also, if I open the Dask tab on the left (see screenshot), none of the buttons work (I seem to remember those are supposed to be orange, and clicking one should toggle the associated window -- the fact that they are grey suggests to me a communication error.) I also thought that those Dask widgets were supposed to have custom behavior on right-click (the way the notebook does) that included a "refresh frame" menu option. I only get the default Chrome menu.

Is it possible that #98 broke this? Or has something just gone wonky in my configuration? (Tried a restart; that didn't fix it.)

[Screenshot]

Steps to reproduce:

  1. From the jobqueue-features/tutorial/ directory:
    stop_slurm && clean_slurm
    source jupyter.sh
    start_tutorial
  2. Get a coffee while everything downloads.
  3. Click on the link, run the notebooks.

Tried on macOS 10.15.7; Google Chrome 88.0.4324.146 and Safari 14.0.

AdamWlodarczyk commented 3 years ago

@dwhswenson Did you click the magnifier icon (just to the right of the search field with the "Dask dashboard url" placeholder)? It discovers the clusters, and after that the buttons turn orange (active).

dwhswenson commented 3 years ago

🤦 No, I hadn't done that. (I kept using the refresh clusters thing instead.) However, I'm still having problems. The widgets never load: on Chrome they eventually give the Chrome timeout error (sad document with "[ip address] took too long to respond"); Safari doesn't seem to time out, but the tabs just stay in the white background. The cluster also never shows in the Clusters list in the Dask tab. It does connect enough to turn the buttons orange, though.

I can create a cluster with the "new" button in the Dask tab, and then right-click to inject the code into the notebook to connect a Client. That seems to have the same problem, as does just running LocalCluster(), so I expect the issue is something with the Docker setup. I again did a clean_slurm before trying this out.

After clicking to connect to the cluster, the process for the IPython kernel starts using CPU heavily, eventually (in Safari) increasing to 100% (in the container).

AdamWlodarczyk commented 3 years ago

I checked it again. At first I didn't see those issues (Ubuntu 20.04.02, Firefox 85.0 and Chromium 88.0), but then I tried different scenarios. For example, if I open a new notebook and create a cluster (e.g. through the distributed.Client class), there are indeed some problems, like below: [screenshot]

Then I created a cluster from the Dask lab widget, and here's the interesting part. First, I create a cluster, and nothing happens after searching for the dashboard URL (the magnifier icon): [screenshot]

But if you look closely, you can spot that the cluster's dashboard URL is not on 127.0.0.1:PORT but on some other address (here http://172.19.0.4:8787), and if I enter that address and hit Enter, the widgets connect and work properly: [screenshot]

And if I run the first notebook (about decorators), the created cluster shows the correct dashboard URL (here http://172.19.0.4:40209/), so it works properly in my Docker setup: [screenshot]

@dwhswenson I wonder what the cause of your problems is. I don't believe it's Safari or that :green_apple:. I'll spend some time digging, and maybe I'll find the cause.

About the widgets menu: I checked the dask-examples binder with Dask lab, and there is no custom widget menu in it. It has the same menu (possibly depending on the browser), like below: [screenshot]

In Chromium that menu looked a little different but also had the reload frame option. So I think this is the expected behavior.

ocaisa commented 3 years ago

Right now, we only pass through the default Dask port to the host (since that is the only one we can know in advance). In your browser, that is the only Dask scheduler port that will work using localhost (the host port it is mapped to is reported in the output of start_tutorial). All the other ports that get used (which happens when you have multiple clusters running at once) will only be visible on the internal Docker network (172.19.0.* in the case of @AdamWlodarczyk). Your localhost should be able to see that network, though, which is why the magnifying glass search still works. Clicking on the URL that the client prints will also work, and you can paste that URL into the search bar of the Dask widget so that it finds the cluster.
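To make the forwarding rule concrete, here is a minimal Python sketch, assuming the container-to-host port mapping is known (for instance from the start_tutorial output). The function name browser_url and the example mapping are hypothetical and not part of jobqueue_features or dask-labextension:

```python
from urllib.parse import urlsplit, urlunsplit


def browser_url(container_url, published):
    """Translate a dashboard URL seen inside the container into one the host browser can open.

    `published` maps container ports to the host ports Docker forwarded them to
    (hypothetical input, e.g. read off the start_tutorial output). Returns None
    when the port is not forwarded, i.e. the dashboard is only reachable on the
    internal Docker network.
    """
    parts = urlsplit(container_url)
    host_port = published.get(parts.port)
    if host_port is None:
        return None  # e.g. a second cluster whose dashboard landed on another port
    # Keep scheme, path, query and fragment; only swap host:port for the forwarded localhost address.
    return urlunsplit((parts.scheme, f"127.0.0.1:{host_port}", parts.path, parts.query, parts.fragment))


# Hypothetical mapping: only the default dashboard port 8787 is published, to host port 8787.
published = {8787: 8787}
print(browser_url("http://172.19.0.4:8787/status", published))  # → http://127.0.0.1:8787/status
print(browser_url("http://172.19.0.4:40209/", published))  # → None
```

A URL that maps to None is the case described above: the browser can only reach it if the host can route to the Docker network directly.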

dwhswenson commented 3 years ago

I installed dask-labextension in a local environment. It worked there. I got something that I thought made it work in the container, but it turns out it was actually connecting to the cluster I'd left running in my local JupyterLab! (Joys of JavaScript, I guess? Not entirely sure how it got out of the container.)

I tried redirecting the stderr from Jupyter to a file. When I reproduced the error, I didn't see any relevant messages. I took a look at the console in Chrome. There were a couple of errors coming from dask-labextension on startup:

[Screenshot of the browser console errors]

I don't get these when I run locally. I'm not sure whether this helps debug the issue. No other errors occurred while running. The URL from the 404 error returns a file with the following contents:

{"message": "Schema not found: /opt/anaconda/share/jupyter/lab/schemas/dask-labextension/plugin.json", "reason": null}

Given the 404, is it possible that I messed something up when I refactored the install in #102? Have you used a fresh install?

> About the widgets menu – I checked the dask-examples binder with dask lab and there is no custom widget's menu in it. It has same menu (maybe depending on browser) like below:

Sorry, yes, I think it is the browser's menu. When it's just the Dask logo background, there's no frame to reload, so I didn't see that option until I connected to the cluster.

MilBia commented 3 years ago

I have the same error log in the browser when Jupyter starts, and everything works, so it shouldn't be that. As far as I know, #102 shouldn't cause your problems. Maybe the source is Docker (or docker-compose). Please check your versions. It works with both of my configurations: Docker 19.03.6 with docker-compose 1.17.1, and Docker 19.03.11 with docker-compose 1.25.5.

dwhswenson commented 3 years ago

These are what currently come with Docker Desktop for Mac. Anyone else on Docker 20.x? @ocaisa @AdamWlodarczyk ? I can try to downgrade, but if our setup doesn't work with recent Docker releases, we'll definitely need to fix it.

AdamWlodarczyk commented 3 years ago

@dwhswenson I tested it from scratch (i.e. I removed the related images and containers) and it worked with:

If this is not working on Docker 20.x then this is critical...

MilBia commented 3 years ago

I updated to the newest versions available on my distribution.

Everything still works for me. I'm out of ideas for now. I suspected it was related to a communication error I once encountered in Docker, but I don't think that's the case here.

@dwhswenson Can you dump the output of start_tutorial to a file and share it with us?

dwhswenson commented 3 years ago

Downgraded to:

Still having the same problems. (Those are still a little more recent than yours -- I wasn't paying close enough attention when setting up -- but they're close.)

A little more detail, related to @ocaisa's comment above. Here's what doesn't work:

So for some reason, I cannot access the container's network from my browser.

Did some digging before posting. I guess it's supposed to be a feature, not a bug? So @AdamWlodarczyk, maybe it is that 🍏!

I still don't understand why it can't connect to the dashboard when I manually enter the port-forwarded version, though.

ocaisa commented 3 years ago

Looks like adding the -P option when running the login container might fix this; I'm not sure what the impact would be on non-macOS systems, though.

ocaisa commented 3 years ago

And that doesn't work with docker-compose :(

https://forums.docker.com/t/how-can-i-expose-all-ports-in-docker-compose-yml-instead-of-one-port/81493

MilBia commented 3 years ago

I think we can't do much more on the Docker side here. I suspect the solution is mapping container port 8787 to host port 8787 and connecting dask-labextension to localhost:8787. Please say if I'm wrong. So for now, for Mac, we can change

ports:
- "8888"
- "8787"

to:

ports:
- "8888"
- "8787:8787"

in docker-compose.yml. And maybe we'll be able to solve this on the dask-labextension side, if that's possible.
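Why the change matters: in docker-compose's short port syntax, a bare "8787" publishes container port 8787 on a host port that Docker picks at startup, whereas "8787:8787" pins it to host port 8787, so the lab extension can always be pointed at 127.0.0.1:8787. A small parser sketch illustrating this (parse_port_spec is a hypothetical helper, and only the simple forms are handled):

```python
def parse_port_spec(spec):
    """Split a docker-compose short-syntax port entry into (host_port, container_port).

    host_port is None when Docker chooses an ephemeral host port at startup.
    Only "CONTAINER", "HOST:CONTAINER" and "IP:HOST:CONTAINER" are handled;
    port ranges and IPv6 host addresses are out of scope for this sketch.
    """
    spec = spec.split("/", 1)[0]  # drop an optional "/tcp" or "/udp" suffix
    parts = spec.split(":")
    if len(parts) == 1:
        return None, int(parts[0])
    if len(parts) == 2:
        host, container = parts
    elif len(parts) == 3:
        _ip, host, container = parts
    else:
        raise ValueError(f"unsupported port spec: {spec!r}")
    # "IP::CONTAINER" leaves the host part empty, which also means an ephemeral port.
    return (int(host) if host else None), int(container)


# The entries from the original and the proposed docker-compose.yml.
for spec in ["8888", "8787", "8787:8787"]:
    host, container = parse_port_spec(spec)
    where = f"host port {host}" if host is not None else "a random host port"
    print(f"{spec!r}: container port {container} -> {where}")
```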

dwhswenson commented 3 years ago

(oops, I wrote this up and left it in preview mode -- just came back to this thread and saw it hadn't posted)

So here's the full problem:

I think the best solution here is for people who can't use the widgets to just use the link to the dashboard that the start_tutorial script outputs. We'll just have to warn them. I'm actually a little surprised we didn't hear anything at the last tutorial.

However, @ocaisa: you will need to be sure to close each cluster before opening a new one. Either that, or explicitly assign ports for each cluster and forward those ports. Right now, the first notebook has 2 clusters, and the first one doesn't get closed.

EDIT: @MilBia's suggestion should work, right? We still need to update the tutorial to close the clusters, but if you do that and change the discovered URL to 127.0.0.1 instead of the 172.* address, then (1) the Python running in the container should see a dashboard at 127.0.0.1:8787 and (2) the browser running locally should see that port forwarded to its 127.0.0.1:8787.
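Why an unclosed cluster defeats the single forwarded port: a second dashboard cannot bind a port that is still held, so distributed falls back to another port (it warns that 8787 is already in use), and that port is not published to the host. The stdlib-only sketch below reproduces the mechanism with plain sockets; port_is_free and pick_dashboard_port are hypothetical helpers, not Dask's code. For a real Dask cluster object, calling cluster.close() (or using the cluster in a with block) releases the port before the next cluster starts.

```python
import socket


def port_is_free(port, host="127.0.0.1"):
    """Return True if nothing else is currently bound to host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            s.bind((host, port))
        except OSError:
            return False
    return True


def pick_dashboard_port(preferred=8787, host="127.0.0.1"):
    """Use the preferred port if free, otherwise fall back to an OS-assigned one.

    This mimics, in spirit, a second cluster starting while an earlier one still
    holds 8787: the new dashboard lands on another port that docker-compose has
    not forwarded to the host.
    """
    if port_is_free(preferred, host):
        return preferred
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind((host, 0))  # port 0 asks the OS for any free port
        return s.getsockname()[1]


# Simulate a first cluster that was never closed by holding a port open.
with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as first_cluster:
    first_cluster.bind(("127.0.0.1", 0))
    first_cluster.listen()
    taken = first_cluster.getsockname()[1]
    print("second cluster would get port", pick_dashboard_port(taken), "instead of", taken)
```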

ocaisa commented 3 years ago

Actually I am doing that now in https://github.com/E-CAM/jobqueue_features_workshop_materials/pull/5

dwhswenson commented 3 years ago

Thanks @MilBia !

[Screenshot]

Note: The magnifying glass will give the internal IP address. You have to manually set it to 127.0.0.1:8787.

PR coming!