dask / dask-jobqueue

Deploy Dask on job schedulers like PBS, SLURM, and SGE
https://jobqueue.dask.org
BSD 3-Clause "New" or "Revised" License

Unable to view the status dashboard #71

Closed ericmjl closed 6 years ago

ericmjl commented 6 years ago

In today's experiments with Dask + dask-jobqueue, I found that I could not load the Bokeh dashboard that @mrocklin keeps showing me :smile:, which has left me wondering how far along my simple, embarrassingly parallel task of "loading 900+ MATLAB .mat matrices into memory" actually is.

The URL provided by the client is: http://172.16.23.102:8787/status.

I'm able to ping the IP address in there:

$ ping 172.16.23.102

However, I'm unable to access the page in my browser; I get a timeout error.

In terms of network settings, I'm on my work VPN.

Is there something that's blocking access that I'm missing?

mrocklin commented 6 years ago

Are you, by any chance, setting your network interface when using dask-jobqueue? If so can you try not doing that and see if there is an effect?

ericmjl commented 6 years ago

Originally, I did the following:

cluster = SGECluster(queue='default.q', walltime="1500000", processes=10, memory='20GB')

This gave the client address:

http://172.16.23.102:8787/status

After setting the interface to eth0:

cluster = SGECluster(queue='default.q', walltime="1500000", processes=10, memory='20GB', interface='eth0')

the client address is:

http://172.16.23.102:43906/status

This was not accessible either (it also timed out).

I also tried setting the cluster interface to ib0. The client address then became:

http://10.145.71.204:8787/status

This also timed out.

Setting the interface to eth1 or ib1 also gave errors (likely because these interfaces are not 'enabled' somehow).
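
To see which interface names actually exist on a given machine before passing one as `interface=`, a minimal standard-library sketch like the one below may help. The helper name `interface_names` is hypothetical, and the list only shows names known to the operating system on the machine where it runs; it does not say whether an interface is up or routable.

```python
import socket


def interface_names():
    """Return the network interface names known to the operating system."""
    # socket.if_nameindex() yields (index, name) pairs; keep only the names.
    return [name for _index, name in socket.if_nameindex()]
```

Calling `interface_names()` on the node that runs the scheduler should show whether names such as eth1 or ib1 are present there at all.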

mrocklin commented 6 years ago

Is there something that's blocking access that I'm missing?

Maybe. It might be worth checking with your IT staff.

You might consider seeing if machines on the SGE cluster can see that address, perhaps by using requests to download that page, both from the client machine, and possibly from one of the workers:

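import requests
addr = "http://172.16.23.102:8787/status"  # the dashboard URL reported by the client
requests.get(addr, timeout=10)  # does this work from the client machine?

client.run(requests.get, addr)  # does this work from the workers?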

ericmjl commented 6 years ago

@mrocklin thanks for the help! I tried the following:

import requests
addr = "http://cluster.server.ip.addr:port/status"  # with appropriate modifications.
requests.get(addr)  # times out

In the absence of any information from IT, my current hypothesis is that there is something blocking the opening of ports from node to node. I'll continue to keep tabs on this issue.
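
One way to probe that hypothesis without involving HTTP is to try a plain TCP connection to the dashboard port. A successful ping only shows that the host answers ICMP, not that the port accepts connections. The sketch below uses only the standard library; the helper name `port_is_open` and the 3-second default timeout are assumptions for illustration.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) succeeds within `timeout` seconds."""
    try:
        # create_connection raises OSError on refusal, unreachable hosts, and timeouts.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, `port_is_open("172.16.23.102", 8787)` could be run from the client machine and, via `client.run`, from the workers. If the host answers ping but this returns False, a firewall or filtering rule on that port is a plausible cause, though not the only one.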

lesteve commented 6 years ago

The way I have gotten a similar setup to work in the past is to use ssh tunneling:

ssh -fN your-login@scheduler-machine -L 33023:localhost:33023

Then I can just open http://localhost:33023 on my local machine and see the status page. Slightly cumbersome but I am afraid I don't know of a more convenient way of doing it. This approach (which is not dask-specific at all) is mentioned in the distributed doc.
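
If the dashboard port changes between sessions (as with the 8787 and 43906 addresses reported earlier in this thread), the tunnel command can be derived from the dashboard URL. This is only a sketch: the function name `tunnel_command` is hypothetical, and it assumes, as in the command above, that you ssh into the machine running the scheduler so the forward target is `localhost`.

```python
import shlex
from urllib.parse import urlsplit


def tunnel_command(dashboard_url, login_host, local_port=None):
    """Build an `ssh -fN ... -L` command that forwards the dashboard port."""
    remote_port = urlsplit(dashboard_url).port
    if remote_port is None:
        raise ValueError("dashboard URL must include an explicit port")
    if local_port is None:
        # Reuse the remote port number locally unless told otherwise.
        local_port = remote_port
    argv = ["ssh", "-fN", login_host, "-L", f"{local_port}:localhost:{remote_port}"]
    # Quote each argument so the result can be pasted into a shell safely.
    return " ".join(shlex.quote(arg) for arg in argv)
```

With the URL from the original report, `tunnel_command("http://172.16.23.102:8787/status", "your-login@scheduler-machine")` yields `ssh -fN your-login@scheduler-machine -L 8787:localhost:8787`, after which the dashboard would be at http://localhost:8787/status.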

ericmjl commented 6 years ago

@lesteve thank you for the excellent tip! I have added this to my TextExpander snippets :smile:

jhamman commented 6 years ago

@ericmjl - would you be interested in adding a note in the dask-jobqueue documentation identifying this issue? I suspect other users will run into the port-forwarding issue.

jhamman commented 6 years ago

You may also be interested in this PR: https://github.com/pangeo-data/pangeo/pull/317

jakirkham commented 6 years ago

We also do the same thing as @lesteve has suggested.

At some point we may consider revitalizing the JupyterLab extension for Dask (especially as JupyterLab has become more stable), which would make adding the extra ssh tunnel unnecessary. AIUI JupyterLab developers are eager to have users that can give feedback on this sort of thing. So getting help shouldn't be too hard if someone has cycles/interest to pursue this.

ref: https://github.com/dask/dask-labextension

mrocklin commented 6 years ago

I don't necessarily encourage people to build off of the old extension. The right thing to do here is probably to get input from JLab people first.

jakirkham commented 6 years ago

Issue ( https://github.com/dask/dask-labextension/issues/15 ) had some discussion about how this might be done. Not sure if that advice is still current.

ericmjl commented 6 years ago

@jhamman happy to do so. Give me a few hours (busy till later) and I'll be happy to PR in.