dask / dask-labextension

JupyterLab extension for Dask
BSD 3-Clause "New" or "Revised" License

Help connecting to existing KubeCluster using the built-in Discovery Mechanism #255

Open jerrygb opened 1 year ago

jerrygb commented 1 year ago

Describe the issue:

I am able to create clusters, connect using dask clients and perform Dask operations without issues using KubeCluster Operator on a Notebook. I am also able to connect to the status dashboard using port-forwarding to the scheduler.

However, I am not able to connect to these clusters when using the lab extension. When I move to an active notebook and click search in the Dask lab extension, it does pick up a remote cluster address. The dashboard URLs picked up by the extension code look like:

http://internal-scheduler.namespace:8787/

But, I think the extension is not able to connect to it. I do not see any logs pertaining to this action.

Do these dashboards need to be externally reachable (that is, are these connections made from the browser or from a backend service)? Since I was not sure about this, I tried setting up an AWS NLB and connecting to the NLB address with the Client, as shown in the second snippet below.

Minimal Complete Verifiable Example:

All of the following code snippets work fine from the notebook.

# Create a cluster
from dask_kubernetes.operator import make_cluster_spec, make_worker_spec
from dask_kubernetes.operator import KubeCluster
from dask.distributed import Client

profile_name = namespace_name  # namespace_name is defined earlier in the notebook

custom_spec = make_cluster_spec(
    name=profile_name,
    image='ghcr.io/dask/dask:latest',
    resources={"requests": {"memory": "512Mi"}, "limits": {"cpu": "4", "memory": "8Gi"}},
)

custom_spec['spec']['scheduler']['spec']['serviceAccount'] = 'default-editor'
custom_spec['spec']['worker']['spec']['serviceAccount'] = 'default-editor'

custom_worker_spec = make_worker_spec(
    image='ghcr.io/dask/dask:latest',
    n_workers=6,
    resources={"requests": {"memory": "512Mi"}, "limits": {"memory": "12Gi"}},
)
custom_worker_spec['spec']['serviceAccount'] = 'default-editor'

cluster = KubeCluster(custom_cluster_spec=custom_spec, n_workers=0)
cluster.add_worker_group(name='highmem', custom_spec=custom_worker_spec)

As mentioned, let's assume I have an AWS NLB-type LoadBalancer/Ingress service. The Dask Client is then able to successfully interact with ports 8786 and 8787 on the scheduler in order to manage the workers and jobs externally.

# Connecting to the external endpoint works fine
import dask
from dask.distributed import Client

dask.config.set({'scheduler-address': 'tcp://nlb-address.region.elb.amazonaws.com:8786'})
client = Client()
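One thing worth checking is whether the scheduler is reachable from the machine running the Jupyter server (where the extension's backend check runs) rather than only from the browser. Below is a minimal stdlib sketch, a hypothetical helper not part of dask, for probing TCP reachability of the scheduler ports:

```python
import socket

def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Run from the Jupyter server's environment, e.g.:
# port_open("internal-scheduler.namespace", 8787)
```

If this returns False from the Jupyter server but True from inside a notebook pod, the issue is network reachability rather than the extension itself.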

Anything else we need to know?:

Another thing I noticed is that dask-labextension relies on the testDaskDashboard function (defined in https://github.com/dask/dask-labextension/blob/main/src/dashboard.tsx#L588) to pick up the URL info.

In the console, I can see,

Found the dashboard link at 'http://internal-scheduler.namespace:8787/status'

However, the subsequent dashboard-check request to the backend is dropping one / from the protocol.

See the GET request below,

GET https://website/notebook/internal/test-dask-1/dask/dashboard-check/http%3A%2Finternal-scheduler.namespace%3A8787%2F?1673363416491

To be a bit more verbose,

http%3A%2Finternal-scheduler.namespace%3A8787%2F?1673363416491 decodes to http:/internal-scheduler.namespace:8787/?1673363416491 (note the single slash after http:).
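The missing slash can be reproduced with the standard library: a correctly percent-encoded URL contains two %2F after the scheme, while the request above contains only one. A minimal sketch:

```python
from urllib.parse import quote, unquote

url = "http://internal-scheduler.namespace:8787/"

# Correct encoding keeps both slashes of the protocol:
encoded = quote(url, safe="")
# -> 'http%3A%2F%2Finternal-scheduler.namespace%3A8787%2F'

# The request observed above has only one %2F after http%3A,
# so it decodes to a malformed scheme:
observed = "http%3A%2Finternal-scheduler.namespace%3A8787%2F"
decoded = unquote(observed)
# -> 'http:/internal-scheduler.namespace:8787/'
```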

I am not sure if this is expected or a bug.

Environment:

thedeg123 commented 9 months ago

I'm having this same issue. I can connect to the dashboard fine from the notebook, but not from the lab extension. Looking at the failed network request, I see ERR_NAME_NOT_RESOLVED for a GET request to my-dask-scheduler.namespace/cluster-map. Did you ever solve this issue?

jacobtomlinson commented 6 months ago

You will need to configure the dashboard address to use the Jupyter proxy; this varies between setups, so it's hard for us to set a sane default.

If you create the cluster with KubeCluster then the dashboard port will be proxied to the node where Jupyter is running. You likely need to set DASK_DISTRIBUTED__DASHBOARD__LINK="proxy/{host}:{port}/status" for this to work correctly, but this will vary depending on how you've configured your Jupyter environment.
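As a sketch of that suggestion (the exact proxy path depends on your Jupyter setup, e.g. whether jupyter-server-proxy serves under a base URL), the link template can be set via the environment before dask/distributed are first imported:

```python
import os

# Must be set before dask/distributed are first imported,
# since dask reads DASK_* environment variables at import time.
os.environ["DASK_DISTRIBUTED__DASHBOARD__LINK"] = "proxy/{host}:{port}/status"

# Equivalent in-process form (requires `import dask`), shown as a comment
# to keep this sketch stdlib-only:
# dask.config.set({"distributed.dashboard.link": "proxy/{host}:{port}/status"})
```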