Are you, by any chance, setting your network interface when using dask-jobqueue? If so, can you try not doing that and see if there is an effect?
On Fri, Jun 15, 2018 at 5:11 PM, Eric Ma notifications@github.com wrote:
In today's experiments with Dask + dask-jobqueue, I found that I could not load the Bokeh dashboard that @mrocklin keeps showing me 😄, which has left me wondering what exactly the progress is on my simple, embarrassingly parallel task of "loading ~900+ MATLAB .mat matrices into memory".
The URL provided by the client is: http://172.16.23.102:8787/status.
I'm able to ping the IP address in there:
$ ping 172.16.23.102
However, I'm unable to access the page in my browser; it times out.
In terms of network settings, I'm on my work VPN.
Is there something that's blocking access that I'm missing?
Originally, I did the following:
from dask_jobqueue import SGECluster

cluster = SGECluster(queue='default.q', walltime="1500000", processes=10, memory='20GB')
This gave the client address:
http://172.16.23.102:8787/status
After setting the interface to eth0:
cluster = SGECluster(queue='default.q', walltime="1500000", processes=10, memory='20GB', interface='eth0')
the client address is:
http://172.16.23.102:43906/status
This was not accessible (it also timed out).
I also tried setting the cluster interface to ib0. The client address then became:
http://10.145.71.204:8787/status
This also timed out.
Setting the interface to eth1 or ib1 also gave errors (likely because those interfaces are not 'enabled' somehow).
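As a sanity check, one way to see which interface names actually exist on a node is something like the sketch below (an assumption on my part: psutil is available, which it should be since distributed depends on it):
import psutil

# Print each network interface and its addresses on this node; only names
# that show up here (e.g. eth0, ib0) make sense to pass as interface=.
for name, addrs in psutil.net_if_addrs().items():
    print(name, [a.address for a in addrs])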
Is there something that's blocking access that I'm missing?
Maybe. It might be worth checking with your IT staff.
You might consider seeing if machines on the SGE cluster can see that address, perhaps by using requests to download that page, both from the client machine and possibly from one of the workers:
import requests
requests.get(addr) # does this work?
client.run(requests.get, addr) # does this work?
@mrocklin thanks for the help! I tried the following:
import requests
addr = "http://cluster.server.ip.addr:port/status" # with appropriate modifications.
requests.get(addr) # times out
In the absence of any information from IT, my current hypothesis is that there is something blocking the opening of ports from node to node. I'll continue to keep tabs on this issue.
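One rough way to test that hypothesis (a sketch only, assuming client is the dask.distributed Client for this cluster, and reusing the scheduler IP and dashboard port from above; can_connect is just a hypothetical helper):
import socket

def can_connect(host="172.16.23.102", port=8787, timeout=5):
    # Try to open a plain TCP connection to the dashboard port;
    # True means the port is reachable from wherever this runs.
    try:
        socket.create_connection((host, port), timeout=timeout).close()
        return True
    except OSError:
        return False

can_connect()            # from the client machine
client.run(can_connect)  # from each worker node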
The way I have got a similar setup to work in the past is to do ssh tunneling:
ssh -fN your-login@scheduler-machine -L 33023:localhost:33023
Then I can just open http://localhost:33023 on my local machine and see the status page. Slightly cumbersome, but I'm afraid I don't know of a more convenient way of doing it. This approach (which is not dask-specific at all) is mentioned in the distributed documentation.
@lesteve thank you for the excellent tip! I have added this to my TextExpander snippets :smile:
@ericmjl - would you be interested in adding a note in the dask-jobqueue documentation identifying this issue? I suspect other users will run into the port-forwarding issue.
You may also be interested in this PR: https://github.com/pangeo-data/pangeo/pull/317
We also do the same thing as @lesteve has suggested.
At some point we may consider revitalizing the JupyterLab extension for Dask (especially as JupyterLab has become more stable), which would make adding the extra ssh tunnel unnecessary. AIUI JupyterLab developers are eager to have users that can give feedback on this sort of thing. So getting help shouldn't be too hard if someone has cycles/interest to pursue this.
I don't necessarily encourage people to build off of the old extension. The right thing to do here is probably to get input from JLab people first.
ref: https://github.com/dask/dask-labextension
Issue (https://github.com/dask/dask-labextension/issues/15) had some discussion about how this might be done. Not sure if that advice is still current.
@jhamman happy to do so. Give me a few hours (busy till later) and I'll be happy to PR in.