dask / dask-yarn

Deploy dask on YARN clusters
http://yarn.dask.org
BSD 3-Clause "New" or "Revised" License

Issue accessing AWS EMR Dask cluster Bokeh dashboard via Chrome #80

Closed johnwallx closed 5 years ago

johnwallx commented 5 years ago

I am trying to access the Dask Bokeh dashboard on an AWS EMR cluster via Chrome, but nothing is shown when I click on the linked dashboard.

I set up the cluster using this workflow. When I click on the linked dashboard, I see nothing unless I click on "Info", where I can only find information on the workers.

I have tried using ! pip install tornado==5 from within Jupyter Notebook/Hub with no resolution.

What am I missing to see the entire Dask Bokeh dashboard?

Note: This is a requested cross-post from StackOverflow.

manugarri commented 5 years ago

Same here. FWIW, here are the error logs from the browser:

[bokeh] setting log level to: 'info'
bokeh.min.js?v=1bfbafacfa847bc6589a4af73a904fef:31 WebSocket connection to 'ws://localhost:8889/proxy/32881/graph/ws?bokeh-protocol-version=1.0&bokeh-session-id=1b7NCuASbA2O02Kobpq3PVSg93xYEGFnGon7ZOofO3Wh' failed: Error during WebSocket handshake: Unexpected response code: 200
t.connect @ bokeh.min.js?v=1bfbafacfa847bc6589a4af73a904fef:31
(anonymous) @ bokeh.min.js?v=1bfbafacfa847bc6589a4af73a904fef:31
i.pull_session @ bokeh.min.js?v=1bfbafacfa847bc6589a4af73a904fef:31
(anonymous) @ bokeh.min.js?v=1bfbafacfa847bc6589a4af73a904fef:31
i.add_document_from_session @ bokeh.min.js?v=1bfbafacfa847bc6589a4af73a904fef:31
f @ bokeh.min.js?v=1bfbafacfa847bc6589a4af73a904fef:31
(anonymous) @ bokeh.min.js?v=1bfbafacfa847bc6589a4af73a904fef:31
bokeh.min.js?v=1bfbafacfa847bc6589a4af73a904fef:31 [bokeh] Failed to connect to Bokeh server Error: Could not open websocket
(anonymous) @ bokeh.min.js?v=1bfbafacfa847bc6589a4af73a904fef:31
Promise.then (async)
(anonymous) @ bokeh.min.js?v=1bfbafacfa847bc6589a4af73a904fef:31
i.pull_session @ bokeh.min.js?v=1bfbafacfa847bc6589a4af73a904fef:31
(anonymous) @ bokeh.min.js?v=1bfbafacfa847bc6589a4af73a904fef:31
i.add_document_from_session @ bokeh.min.js?v=1bfbafacfa847bc6589a4af73a904fef:31
f @ bokeh.min.js?v=1bfbafacfa847bc6589a4af73a904fef:31
(anonymous) @ bokeh.min.js?v=1bfbafacfa847bc6589a4af73a904fef:31
requestAnimationFrame (async)
i.defer @ bokeh.min.js?v=1bfbafacfa847bc6589a4af73a904fef:31
i.embed_items @ bokeh.min.js?v=1bfbafacfa847bc6589a4af73a904fef:31
embed_document @ graph:96
(anonymous) @ graph:100
(anonymous) @ graph:115
i.safely @ bokeh.min.js?v=1bfbafacfa847bc6589a4af73a904fef:33
fn @ graph:90
bokeh.min.js?v=1bfbafacfa847bc6589a4af73a904fef:31 [bokeh] Lost websocket 0 connection, 1006 ()
bokeh.min.js?v=1bfbafacfa847bc6589a4af73a904fef:31 Uncaught (in promise) Error: Could not open websocket
    at t._on_error (bokeh.min.js?v=1bfbafacfa847bc6589a4af73a904fef:31)
    at WebSocket.t.socket.onerror (bokeh.min.js?v=1bfbafacfa847bc6589a4af73a904fef:31)
t._on_error @ bokeh.min.js?v=1bfbafacfa847bc6589a4af73a904fef:31
t.socket.onerror @ bokeh.min.js?v=1bfbafacfa847bc6589a4af73a904fef:31
Promise.then (async)
(anonymous) @ bokeh.min.js?v=1bfbafacfa847bc6589a4af73a904fef:31
i.pull_session @ bokeh.min.js?v=1bfbafacfa847bc6589a4af73a904fef:31
(anonymous) @ bokeh.min.js?v=1bfbafacfa847bc6589a4af73a904fef:31
i.add_document_from_session @ bokeh.min.js?v=1bfbafacfa847bc6589a4af73a904fef:31
f @ bokeh.min.js?v=1bfbafacfa847bc6589a4af73a904fef:31
(anonymous) @ bokeh.min.js?v=1bfbafacfa847bc6589a4af73a904fef:31
requestAnimationFrame (async)
i.defer @ bokeh.min.js?v=1bfbafacfa847bc6589a4af73a904fef:31
i.embed_items @ bokeh.min.js?v=1bfbafacfa847bc6589a4af73a904fef:31
embed_document @ graph:96
(anonymous) @ graph:100
(anonymous) @ graph:115
i.safely @ bokeh.min.js?v=1bfbafacfa847bc6589a4af73a904fef:33
fn @ graph:90
bokeh.min.js?v=1bfbafacfa847bc6589a4af73a904fef:31 [bokeh] Websocket connection 0 disconnected, will not attempt to reconnect

quasiben commented 5 years ago

Can you see the YARN UI?

I wonder if websockets are being blocked?

matthieubulte commented 5 years ago

> I am trying to access the Dask Bokeh dashboard on an AWS EMR cluster via Chrome, but nothing is shown when I click on the linked dashboard.

Yes, that's normal. YARN blocks all websocket traffic, and Bokeh currently relies on websockets to update the dashboard's views.

I also use Dask on YARN/EMR, and my solution is simply to ssh into the scheduler with port forwarding on the dashboard port, 8787.
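
For reference, a minimal sketch of such a tunnel, written as a small Python helper that assembles the ssh command. It mirrors the -L form posted later in this thread. The key path, the host name, and the helper name forward_command are placeholders for illustration, not something a participant posted.

```python
from typing import List


def forward_command(key_path: str, host: str, port: int = 8787, user: str = "hadoop") -> List[str]:
    """Build an ssh argv that forwards local `port` to the same port on `host`."""
    return [
        "ssh",
        "-i", key_path,                 # private key for the cluster
        "-L", f"{port}:{host}:{port}",  # local port -> host:port, resolved on the remote side
        f"{user}@{host}",
    ]


if __name__ == "__main__":
    # Hypothetical values; substitute your own key file and master node DNS name.
    print(" ".join(forward_command("~/mykeypair.pem", "<public-dns-name>")))
```

With the tunnel open, the dashboard should be reachable on the same port of localhost, assuming the scheduler's dashboard really listens on 8787 on that host.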

manugarri commented 5 years ago

@matthieubulte would you care to share the ssh command?

jcrist commented 5 years ago

> Yarn does block any websocket and Bokeh currently relies on websockets for updating the dashboard's views.

While this is true, the EMR bootstrap script sets up jupyter-server-proxy, which should proxy the dashboard (both HTTP and websockets). We are explicitly not using the YARN proxy. From the console logs above, you are clearly reaching the server, but websockets are not being proxied properly. I suspect this is a bug in jupyter-server-proxy. I know it has issues with tornado 6, but you say you are using tornado 5, so I'm not sure what's going on.

manugarri commented 5 years ago

@jcrist I am using tornado 6.0.3 (conda build py36h7b6447c_0).

AFAIK the only mention of a proxy in the bootstrap script is when we write the YAML file:

distributed:
  dashboard:
    link: "/proxy/{port}/status"

Is it possible this doesn't work when deploying a YarnCluster with deploy_mode="local"?

jcrist commented 5 years ago

The configuration is the only thing needed, and it only works with deploy-mode local. Try downgrading to tornado 5, I suspect that will fix it.
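
To make the role of the link setting concrete: its value is a template in which {port} stands for the dashboard's port. Below is a sketch of that substitution using plain str.format. distributed does its own formatting internally, so this only illustrates the resulting path. The port 32881 comes from the browser log earlier in this thread, and dashboard_path is a hypothetical helper name.

```python
LINK_TEMPLATE = "/proxy/{port}/status"  # value of distributed.dashboard.link quoted in this thread


def dashboard_path(port: int, template: str = LINK_TEMPLATE) -> str:
    """Fill the dashboard port into the link template."""
    return template.format(port=port)


if __name__ == "__main__":
    # The path is served by jupyter-server-proxy under the notebook server's own address.
    print(dashboard_path(32881))
```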

jcrist commented 5 years ago

@johnwallx I just noticed you said:

> I have tried using ! pip install tornado==5 from within Jupyter Notebook/Hub with no resolution.

This won't update the version of tornado used for jupyter-server-proxy, as the server proxy is already running. You'll need to update the bootstrap script you're using to pin tornado to version 5. As I said above, I suspect that this is a tornado 6 issue, see https://github.com/jupyterhub/jupyter-server-proxy/issues/109.
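
A quick way to see which tornado an environment would import is sketched below. The helper names are made up for illustration, and importlib.metadata needs Python 3.8 or newer; on older interpreters, `import tornado; print(tornado.version)` gives the same information. Note the limitation described above: this reports what is installed now, not what an already-running server process loaded at startup.

```python
from importlib import metadata


def major_version(version: str) -> int:
    """Return the leading integer of a version string such as '6.0.3'."""
    return int(version.split(".")[0])


def proxy_compatible(version: str) -> bool:
    """True when the tornado major version is below 6, per the advice in this thread."""
    return major_version(version) < 6


if __name__ == "__main__":
    try:
        installed = metadata.version("tornado")
    except metadata.PackageNotFoundError:
        print("tornado is not installed in this environment")
    else:
        hint = "ok" if proxy_compatible(installed) else "pin tornado to 5.x in the bootstrap script"
        print(f"tornado {installed}: {hint}")
```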

bschreck commented 5 years ago

I'm seeing the same error, as well as not being able to connect to the python kernel via Jupyter in the browser (verified it works in ipython via ssh).

Chrome console shows this looping over and over again:

default.js:64 WebSocket connection to 'ws://IP_ADDR:9444/api/kernels/e9aef8c8-b390-4299-8818-2f0b13367a87/channels?session_id=6f3fb9f5-bc46-45a5-aaa4-f572af26503f' failed: Error in connection establishment: net::ERR_CONNECTION_TIMED_OUT
DefaultKernel._createSocket @ default.js:64
default.js:144 Connection lost, reconnecting in 4 seconds.
DefaultKernel._onWSClose @ default.js:144
error (async)
DefaultKernel._createSocket @ default.js:70
setTimeout (async)
DefaultKernel._onWSClose @ default.js:145
error (async)
DefaultKernel._createSocket @ default.js:70
setTimeout (async)
DefaultKernel._onWSClose @ default.js:145
error (async)
DefaultKernel._createSocket @ default.js:70
DefaultKernel @ default.js:191
clone @ default.js:319
connectTo @ default.js:1348
connectTo @ default.js:1182
connectTo @ kernel.js:94
setupKernel @ default.js:289
DefaultSession @ default.js:52
clone @ default.js:163
connectTo @ default.js:457
connectTo @ default.js:399
connectTo @ session.js:115
connectTo @ manager.js:209
(anonymous) @ index.js:138
invokeSlot @ index.js:475
emit @ index.js:433
push.qUp9.Signal.emit @ index.js:106
_onStarted @ manager.js:320
startNew @ manager.js:167
async function (async)
startNew @ manager.js:166
_startSession @ clientsession.js:414
_changeKernel @ clientsession.js:366
_startIfNecessary @ clientsession.js:348
initialize @ clientsession.js:321
async function (async)
initialize @ clientsession.js:307
(anonymous) @ context.js:389
Promise.then (async)
_populate @ context.js:379
(anonymous) @ context.js:502
Promise.then (async)
_revert @ context.js:472
initialize @ context.js:188
(anonymous) @ manager.js:434
Promise.then (async)
_createOrOpenDocument @ manager.js:434
open @ manager.js:264
openOrReveal @ manager.js:288
(anonymous) @ index.js:227
Promise.then (async)
execute @ index.js:227
push.5TpB.CommandRegistry.execute @ index.js:351
(anonymous) @ index.js:450
Promise.then (async)
createNew @ index.js:449
execute @ index.js:475
push.5TpB.CommandRegistry.execute @ index.js:351
onclick @ index.js:189
callCallback @ react-dom.development.js:149
invokeGuardedCallbackDev @ react-dom.development.js:199
invokeGuardedCallback @ react-dom.development.js:256
invokeGuardedCallbackAndCatchFirstError @ react-dom.development.js:270
executeDispatch @ react-dom.development.js:561
executeDispatchesInOrder @ react-dom.development.js:583
executeDispatchesAndRelease @ react-dom.development.js:680
executeDispatchesAndReleaseTopLevel @ react-dom.development.js:688
forEachAccumulated @ react-dom.development.js:662
runEventsInBatch @ react-dom.development.js:816
runExtractedEventsInBatch @ react-dom.development.js:824
handleTopLevel @ react-dom.development.js:4826
batchedUpdates$1 @ react-dom.development.js:20439
batchedUpdates @ react-dom.development.js:2151
dispatchEvent @ react-dom.development.js:4905
(anonymous) @ react-dom.development.js:20490
unstable_runWithPriority @ scheduler.development.js:255
interactiveUpdates$1 @ react-dom.development.js:20489
interactiveUpdates @ react-dom.development.js:2170
dispatchInteractiveEvent @ react-dom.development.js:4882
default.js:56 Starting WebSocket: ws://IP_ADDR:9444/api/kernels/e9aef8c8-b390-4299-8818-2f0b13367a87

I modified the bootstrap script to install old versions like this:

conda install \
    -c defaults \
    -c conda-forge \
    -y \
    -q \
    python=$PYTHON_VERSION \
    dask-yarn==0.7.0 \
    distributed==2.2.0 \
    tornado==5.1.1 \
    jupyter-server-proxy \
    pyarrow \
    s3fs \
    nomkl \
    conda-pack \
    pandas \
    matplotlib \
    seaborn \
    $EXTRA_CONDA_PACKAGES

This is on a fresh cluster, so unless there is some other tornado that jupyter-server-proxy is using, I have tornado 5.1.1, confirmed in IPython over ssh.

Dask cluster is able to start up and execute fine in IPython. Jupyter kernel process is running in the background.

I tried both jupyter notebook and lab.

bschreck commented 5 years ago

My EMR proxy is clearly working over HTTP (using the ssh -ND 8157 option recommended by Amazon), since I can access Jupyter Notebook/Lab as well as the Hadoop resource manager UI.

jcrist commented 5 years ago

> not being able to connect to the python kernel via Jupyter in the browser (verified it works in ipython via ssh).

When you say you can't connect to the jupyter kernel in the browser, what do you mean? Can you access the notebook server? If you click "new kernel" does that fail?

Are you sure you're ssh-forwarding the correct address/port for the server? When I wrote these docs, the following steps worked perfectly:

https://yarn.dask.org/en/latest/aws-emr.html#connect-to-the-emr-cluster

In particular, have you tried port-forwarding the notebook server, which should be running on port 8888?

$ ssh -i ~/mykeypair.pem -L 8888:<public-dns-name>:8888 hadoop@<public-dns-name>

The websocket proxying issue with jupyter-server-proxy on tornado 6 should not prevent the normal notebook server from working (jupyter-server-proxy isn't hit at all when you are just accessing the notebook server). I suspect you've run into a separate, unrelated issue, likely with your configuration.

It's not easy for me to set up an EMR cluster to reproduce this, but I can if needed. More detailed information about what you're doing when it errors, what the error is, how you're accessing the server (ssh tunnel, open port(s), etc.), and any relevant screenshots would certainly be helpful here.

bschreck commented 5 years ago

First of all, thanks for being so responsive and bearing with me.

I will try to explicitly use port forwarding instead of the proxy that AWS recommends, although I believe they accomplish the same thing, right?

It seems like that wouldn’t be the issue, as I explicitly run Jupyter on port 9444, and I’m able to connect to the REST interface just fine on that port. Is it possible that the proxy (using FoxyProxy per the EMR instructions, and ssh -i pemfile.pem -ND 8157 hadoop@ip) only works with HTTP, not websockets?

Clicking on new kernel fails. I only ever see “waiting for kernel”, and the websocket errors in the chrome console.

Will reply again when I test with -L and no proxy.

bschreck commented 5 years ago

Interesting. Enabling explicit port forwarding via -L worked, but the dynamic port forwarding suggested by AWS via -D did not. I'm not sure why, but I'll just use -L in the future. Thanks for the help!

jcrist commented 5 years ago

Hmmm, some proxies don't properly handle websockets, and it sounds like the option you used was one of those. If there is something we could improve in our documentation, please feel free to submit a PR.

jcrist commented 5 years ago

I'm going to close this issue. The answer right now for any dashboard + jupyter-server-proxy issue is that you need tornado 5, not tornado 6. The bootstrap script example has been updated to show this.

If anyone continues to have issues, feel free to open a new issue.