dask / dask

Parallel computing with task scheduling
https://dask.org
BSD 3-Clause "New" or "Revised" License

Proxying Dask (Bokeh) Web Interface on AWS SageMaker #5432

Open davidtwomey opened 4 years ago

davidtwomey commented 4 years ago

Hi, I am using Amazon SageMaker Instances to run self-contained JupyterLab environments. I can run dask no problem, but it would be great to view the bokeh dashboard in my browser. I have so far looked into two approaches (see below) but would welcome any solutions/ideas anyone has:

1. Port Forwarding: (as suggested here)

Unfortunately, to the best of my knowledge, AWS SageMaker Notebooks do not allow SSH access, and hence port forwarding is not an option.

NOTE: Google AI Notebook Instances do allow SSH access, however, so port forwarding is a suitable solution on GCP: <VM-IP-ADDRESS>:8787/status

2. Using a jupyter proxy extension google-server-proxy

This is close to a working solution, as I am able to view the dashboard, but I am getting a websocket error when I attempt to access the dashboard using the proxy:

bokeh.protocol.exceptions.ProtocolError: No bokeh-protocol-version specified

image

The exception raised from bokeh:

image
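To make the proxy approach concrete, here is a minimal sketch of how the proxied dashboard address and its websocket endpoint could be derived from the notebook URL. It assumes the `/proxy/<port>/` path convention used by jupyter-server-proxy and the `/status/ws` websocket path that appears in the tracebacks later in this thread. The helper name and the notebook hostname are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def proxied_dashboard_urls(notebook_url: str, port: int = 8787) -> dict:
    """Return the proxied dashboard page URL and its websocket URL.

    Assumes the Jupyter server exposes local ports under /proxy/<port>/.
    """
    parts = urlsplit(notebook_url)
    base_path = parts.path.rstrip("/")
    page_path = f"{base_path}/proxy/{port}/status"
    # Browsers open the websocket with wss:// when the page is served over https.
    ws_scheme = "wss" if parts.scheme == "https" else "ws"
    return {
        "page": urlunsplit((parts.scheme, parts.netloc, page_path, "", "")),
        "websocket": urlunsplit((ws_scheme, parts.netloc, page_path + "/ws", "", "")),
    }


# Hypothetical notebook hostname, for illustration only.
urls = proxied_dashboard_urls("https://my-notebook.notebook.eu-west-1.sagemaker.aws")
print(urls["page"])
print(urls["websocket"])
```

The page request is plain HTTP and loads fine in the report above. The second URL is the one that has to survive as a websocket upgrade through every proxy in between.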

Steps to reproduce

Additional Comment

I am a big advocate of dask and very much appreciate the ongoing work everyone is doing!

I understand this may not be the appropriate repo for this issue/feature-request and may not appear of high priority. However, I feel SageMaker Notebooks/GCP AI Notebook environments will be an increasing use case for lots of ML researchers and developers and, consequently, a well supported solution to this would be a useful addition to the docs.

Thanks,

David

mrocklin commented 4 years ago

Thanks for laying all of this out @davidtwomey .

I understand this may not be the appropriate repo for this issue/feature-request and may not appear of high priority

My first question is actually whether or not the AWS Sagemaker folks can help with this problem.

@wleepang do you have any contacts that would be useful here?

wleepang commented 4 years ago

@mrocklin Sorry, just seeing this. I can ask around if this is still an issue.

jennakwon06 commented 4 years ago

This is still an issue for me. I'm hitting the exact same error (No bokeh-protocol specified).

I've tried with my SM notebook instance being in a VPC / no VPC. Neither works.

Direct Internet is enabled on the SM notebook instance.

nima-akram commented 4 years ago

I am also hitting this exact same issue.

jacobtomlinson commented 4 years ago

I'm reproducing this now while investigating dask/dask-labextension#87.

It does seem that Sagemaker is dropping the websocket connection:

image

Pinging @wleepang again to see if this is something you could help with?

jacobtomlinson commented 4 years ago

Proxying the dashboard out to the internet with serveo shows the dashboard is working correctly.

image

Therefore the websocket must be being dropped somewhere, either by the nbserverproxy or whatever proxy Sagemaker uses to expose Jupyter.
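One way to narrow down where the websocket is dropped is to look at the raw upgrade handshake at each hop. The sketch below builds a websocket upgrade request and checks whether a raw response is a valid `101 Switching Protocols` reply, following the RFC 6455 handshake rules. The function names are hypothetical, and sending the bytes over a socket to a particular hop is left out.

```python
import base64
import hashlib

# Fixed GUID defined by RFC 6455 for computing Sec-WebSocket-Accept.
_WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def build_upgrade_request(host, path, key, subprotocols=()):
    """Build the raw HTTP request a websocket client sends to start a connection."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        "Upgrade: websocket",
        "Connection: Upgrade",
        f"Sec-WebSocket-Key: {key}",
        "Sec-WebSocket-Version: 13",
    ]
    if subprotocols:
        lines.append("Sec-WebSocket-Protocol: " + ", ".join(subprotocols))
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


def expected_accept(key):
    """Compute the Sec-WebSocket-Accept value the server must echo for this key."""
    digest = hashlib.sha1((key + _WS_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


def upgrade_succeeded(raw_response, key):
    """Return True only if the response is a valid 101 upgrade for the given key."""
    head = raw_response.split(b"\r\n\r\n", 1)[0].decode("iso-8859-1")
    status_line, *header_lines = head.split("\r\n")
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    status_parts = status_line.split(" ", 2)
    return (
        len(status_parts) >= 2
        and status_parts[1] == "101"
        and headers.get("upgrade", "").lower() == "websocket"
        and headers.get("sec-websocket-accept") == expected_accept(key)
    )
```

If a hop answers the upgrade request with an ordinary `200` or `4xx` response, or omits the `Upgrade` header, `upgrade_succeeded` returns False, which would point at that hop as the place where the connection is lost.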

wleepang commented 4 years ago

I've managed to get this far with Sagemaker:

image

The errors don't seem to be specific to either dask or bokeh:

tornado.application - ERROR - Uncaught exception in /status/ws
Traceback (most recent call last):
  File "/home/ec2-user/anaconda3/envs/python3/lib/python3.6/site-packages/tornado/websocket.py", line 498, in _run_callback
    result = callback(*args, **kwargs)
  File "/home/ec2-user/anaconda3/envs/python3/lib/python3.6/site-packages/bokeh/server/views/ws.py", line 121, in open
    if self.selected_subprotocol != 'bokeh':
AttributeError: 'WSHandler' object has no attribute 'selected_subprotocol'

Updating Tornado to 6.0.2 and restarting the kernel yields the following:

tornado.application - ERROR - Uncaught exception GET /status/ws (127.0.0.1)
HTTPServerRequest(protocol='http', host='10.0.111.225:8443', method='GET', uri='/status/ws', version='HTTP/1.1', remote_ip='127.0.0.1')
Traceback (most recent call last):
  File "/home/ec2-user/anaconda3/envs/python3/lib/python3.6/site-packages/tornado/websocket.py", line 956, in _accept_connection
    open_result = handler.open(*handler.open_args, **handler.open_kwargs)
  File "/home/ec2-user/anaconda3/envs/python3/lib/python3.6/site-packages/bokeh/server/views/ws.py", line 123, in open
    raise ProtocolError("Subprotocol header is not 'bokeh'")
bokeh.protocol.exceptions.ProtocolError: Subprotocol header is not 'bokeh'

Dask: 2.16.0
Bokeh: 2.0.2

This was with a Sagemaker Notebook instance in its own VPC with a public and private subnet. The notebook instance was launched in the private subnet with a security group that allows all traffic self-ingress.

Note, it does not work if you use the "Default" VPC for the Notebook instance's network configuration.
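The second traceback shows the server rejecting the connection because the selected websocket subprotocol is not 'bokeh'. The sketch below illustrates how that check would fail if an intermediate proxy dropped the `Sec-WebSocket-Protocol` request header. This is a simplified stand-in written for illustration, not Bokeh's actual implementation: the exception class and function names are local and hypothetical, and whether the SageMaker proxy really strips this header is an assumption that this thread does not confirm.

```python
class ProtocolError(Exception):
    """Local stand-in for bokeh.protocol.exceptions.ProtocolError."""


def offered_subprotocols(headers):
    """Return the subprotocols listed in the Sec-WebSocket-Protocol header, if any."""
    raw = ""
    for name, value in headers.items():
        if name.lower() == "sec-websocket-protocol":
            raw = value
            break
    return [item.strip() for item in raw.split(",") if item.strip()]


def check_subprotocol(headers):
    """Mimic the server-side check: the first offered subprotocol must be 'bokeh'."""
    offered = offered_subprotocols(headers)
    selected = offered[0] if offered else None
    if selected != "bokeh":
        raise ProtocolError("Subprotocol header is not 'bokeh'")
    return selected


# Request headers as the browser sends them (the token value is a placeholder).
direct = {"Upgrade": "websocket", "Sec-WebSocket-Protocol": "bokeh, <token>"}
# The same request after a hypothetical proxy has dropped the subprotocol header.
stripped = {"Upgrade": "websocket"}

print(check_subprotocol(direct))
try:
    check_subprotocol(stripped)
except ProtocolError as exc:
    print(f"rejected: {exc}")
```

Comparing the headers received by the dashboard server with and without the proxy in the path would confirm or rule out this explanation.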

jacobtomlinson commented 4 years ago

Thanks for looking at this @wleepang!

Note, it does not work if you use the "Default" VPC for the Notebook instance's network configuration.

I suspect many of our users are going to be using the Default VPC. Do you know why this doesn't work?

wleepang commented 4 years ago

@jacobtomlinson -

That was from my initial testing. I've since got it to the state above with one of my Default VPCs by doing the following:

Again, launch the notebook instance into the Private subnet. One extra detail I didn't mention previously, I disabled internet access via SageMaker in the notebook networking config. This makes it so that access to the internet is provided via the VPC.
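For anyone who prefers to script this configuration, the sketch below shows request parameters that would correspond to the setup described above, using the boto3 SageMaker `create_notebook_instance` call. All identifiers (instance name, role ARN, subnet and security group IDs) are hypothetical placeholders, the instance type is an arbitrary choice, and the helper names are made up for this example.

```python
def private_notebook_params(name, role_arn, subnet_id, security_group_ids,
                            instance_type="ml.t3.medium"):
    """Build create_notebook_instance parameters for a notebook in a private subnet."""
    return {
        "NotebookInstanceName": name,
        "InstanceType": instance_type,
        "RoleArn": role_arn,
        # Private subnet of the VPC; internet access is then routed via the VPC.
        "SubnetId": subnet_id,
        "SecurityGroupIds": list(security_group_ids),
        # Turn off the internet access that SageMaker provides directly.
        "DirectInternetAccess": "Disabled",
    }


def create_notebook(params):
    """Send the request. Requires boto3 and AWS credentials, so it is not run here."""
    import boto3  # third-party dependency, imported lazily

    return boto3.client("sagemaker").create_notebook_instance(**params)


# Hypothetical identifiers, for illustration only.
params = private_notebook_params(
    name="dask-dashboard-test",
    role_arn="arn:aws:iam::123456789012:role/ExampleSageMakerRole",
    subnet_id="subnet-0123456789abcdef0",
    security_group_ids=["sg-0123456789abcdef0"],
)
print(params["DirectInternetAccess"])
```

The security group passed here would be the one that allows all traffic self-ingress, as described earlier in the thread.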

ghost commented 4 years ago

Is there any update on this issue? I am having the same problems.

config dir: /home/ec2-user/.jupyter
jupyterlab_git enabled

bokeh==2.0.1
dask==2.19.0
dask-labextension==2.0.2
jupyter-server-proxy==1.5.0
jupyterlab==2.1.4
jupyterlab-server==1.1.0

image

mrocklin commented 4 years ago

Is there any update on this issue?

As you can see above the last update was 28 days ago. There are no other side channels for conversation here other than github.

I'm not sure how much the Dask maintainers can do to help here. I think that this probably requires some engagement from AWS Sagemaker folks. Perhaps someone here has a support contract that they can use to engage AWS on this problem?

ghost commented 4 years ago

@mrocklin I will communicate this to the AWS reps at our company and comment back if I hear anything. Thanks!

ghost commented 4 years ago

I have contacted our AWS support team and they are investigating this issue with a fix to come, hopefully. The issue on AWS is 7137875451 for reference. Will update as I hear more.

mrocklin commented 4 years ago

@blink1073 any chance you all can help here?

blink1073 commented 4 years ago

Looking through the thread I think @jewelltp's ticket is the best bet. I'm not plugged in to the internal platform aspects, focusing on the JupyterLab 3.0 release.

ghost commented 4 years ago

Just an update - the AWS engineer has identified the issue and seems to agree that this needs to be fixed. Looks like we will have a resolution somewhat soon. @mrocklin @blink1073 any other AWS issues to report as I have their attention?

Response from AWS:

"Hi Tyler,

This is to update you that I am still waiting to hear from Service team and will get back to you as soon as I have further information.

As notified earlier I have replicated the issue in my account and now comparing the functionality of "TensorBoard" which works fine using the same proxy mechanism."

davidtwomey commented 4 years ago

Posting my solution (inspired by: https://modelpredict.com/sagemaker-ssh-setup/)

NOTES

  • tested on Windows only
  • requires signup to 3rd-party https://ngrok.com/. (Also possible via a bastion host as explained in the link above)

If you hit any problems or errors, let me know and I'll update! Regards,

David

Steps

0. Register for a free account at https://ngrok.com/ and get your authentication token

1. (On Sagemaker) Setup Ngrok

# Download latest ngrok
curl https://bin.equinox.io/c/4VmDzA7iaHb/ngrok-stable-linux-amd64.zip > ngrok.zip
# Unzip
unzip ngrok.zip
# Add your ngrok authentication key (found here -> https://dashboard.ngrok.com/get-started/setup)
./ngrok authtoken <ADD-YOUR-AUTH-KEY-HERE>

2. (On local machine) Find and copy your local machine's public ssh key.

On Windows this is found in ~/.ssh/id_rsa.pub. Copy the entire string. It should look something like:

ssh-rsa AAAAB3NzaC1yc2....
.....
....= david@MY-LAPTOP-NAME

3. (On Sagemaker) Add this SSH key to ~/.ssh/authorized_keys. Use your favourite terminal editor for this (e.g. vim/nano)

4. (On Sagemaker) Run the ngrok TCP service

./ngrok tcp 22

If successful, you should see something like this:

image

Grab the host (red) and port (green).

5. (On Sagemaker) Create a notebook, start your dask client and grab the dashboard port (default=8787)

image

6. (On local machine) Connect via SSH Tunnel

Use the host and port obtained in step 4 and the dashboard port from step 5:

ssh -p <PORT> ec2-user@<HOST> -L 8787:localhost:<DASHBOARD-PORT>
# e.g. ssh -p 13171 ec2-user@0.tcp.ngrok.io -L 8787:localhost:8787

7. Access the dask dashboard, now available on your localhost (e.g. http://localhost:8787/status)

image
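Steps 5 and 6 can also be scripted. The sketch below is a minimal example that reads the dashboard port from a dashboard link (such as the string a dask `Client` reports) and assembles the matching `ssh` tunnel command. The helper names are hypothetical, the `ec2-user` login and default port 8787 come from the steps above, and the ngrok host and port must be taken from your own ngrok output.

```python
import shlex
from urllib.parse import urlsplit


def dashboard_port(dashboard_link, default=8787):
    """Extract the port from a dashboard link such as http://127.0.0.1:8787/status."""
    port = urlsplit(dashboard_link).port
    return port if port is not None else default


def ssh_tunnel_command(ngrok_host, ngrok_port, remote_dashboard_port=8787,
                       local_port=8787, user="ec2-user"):
    """Build the ssh command that forwards a local port to the remote dashboard."""
    return [
        "ssh",
        "-p", str(ngrok_port),
        f"{user}@{ngrok_host}",
        # -L local_port:host:remote_port, where host is resolved on the remote side.
        "-L", f"{local_port}:localhost:{remote_dashboard_port}",
    ]


# Example values matching the ones used in the steps above.
remote_port = dashboard_port("http://127.0.0.1:8787/status")
command = ssh_tunnel_command("0.tcp.ngrok.io", 13171, remote_dashboard_port=remote_port)
print(shlex.join(command))
```

Returning the command as a list keeps it safe to pass to `subprocess.run` without shell quoting issues. `shlex.join` is only used to print a copy-pasteable form.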

jacobtomlinson commented 4 years ago

Thanks @davidtwomey for sharing this workaround.

Would be great to not have to do this though. Hopefully AWS will get back to us soon with a full solution.

ghost commented 4 years ago

Here is the latest response from AWS. Looks like we will just have to be patient and wait for the permanent fix.

"Hi Tyler,

I have heard back from Service team and they have confirmed that Dask dashboard is not supported natively in SageMaker Notebook Instance at the moment. Service team took my findings into account while trying to make it work within existing SageMaker architecture but were not able to do so.

After my discussion with Service team, they have created and recorded it as a New Feature Request to support Dask dashboard natively. Please note that I am not able to provide implementation timeline for a new feature request. I would like to recommend keeping an eye on the AWS Blogs [1] and "What's New with AWS" page [2].

Service team has also noted down the GitHub issue and will provide an update when the native support for Dask dashboard is available.

I have been following the Github issue and the workaround provided by David (based on Ngrok and ssh tunneling) seems to be the only solution at the moment.

Thanks for your continued patience as we worked through this issue.

Please contact us if you need further help in this regard."

mrocklin commented 4 years ago

Thank you for handling the cross-project communication Tyler

jennakwon06 commented 3 years ago

Hello!

Would like to get a status on this issue / if there's a fix for this! Thank you!

jennakwon06 commented 3 years ago

It'd also be great if I can find out who on AWS is in charge of this? I'm at Amazon and can look into the progress of the service team's fix.

jacobtomlinson commented 3 years ago

Thanks for nudging this @jennakwon06. We haven't heard back from AWS for a while and the issue appears to still persist.

There is an outstanding ticket 7137875451 if that's something you're able to look at.

gballardin commented 3 years ago

I also work at Amazon. I found the internal ticket, which unfortunately is not really moving forward from what I can tell. I'll corral a bunch of +1s internally to get momentum and prioritize this feature request.

rabernat commented 1 year ago

Just found this issue. It's quite disappointing that Sagemaker does not have full support for Dask (including dashboard) because they make a great combination.

riley-brady commented 1 year ago

@gballardin, has there been any movement on this front? Agreeing with @rabernat that it would be incredible to leverage SageMaker resources with diagnostics from the Dask dashboard.

aluhamaa commented 8 months ago

Also interested in getting the dashboard working in Sagemaker.