coiled / feedback

A place to provide Coiled feedback

Dashboard unresponsive #77

Closed davidbrochart closed 3 years ago

davidbrochart commented 4 years ago

After installing and logging in, the link to the dashboard times out:

import coiled
cluster = coiled.Cluster(n_workers=10)

from dask.distributed import Client
client = Client(cluster)
print('Dashboard:', client.dashboard_link)
# https://cloud.coiled.io/dashboard/1096/status

jrbourbeau commented 4 years ago

Thanks for raising an issue @davidbrochart! I tried the same thing and client.dashboard_link took me to the Dask dashboard as expected. Did the dashboard link work initially and then time out, or did it never work to begin with?

Also, by default Coiled clusters automatically shut down after 20 minutes of inactivity. Is there a chance that 20 minutes elapsed before you clicked on the dashboard link?

davidbrochart commented 4 years ago

Did the dashboard link work initially and then time out, or did it never work to begin with?

It never worked.

Is there a chance that 20 minutes elapsed before you clicked on the dashboard link?

No, I just tried again. I'm happy to try and debug, is there anything you would like me to do?

jrbourbeau commented 4 years ago

Hrm, I checked the logs for the cluster and it looks like the cluster spun up normally and then closed down after 20 minutes of being idle:

distributed.http.proxy - INFO - To route to workers diagnostics web server please install jupyter-server-proxy: python -m pip install jupyter-server-proxy
distributed.scheduler - INFO - Clear task state
distributed.scheduler - INFO -   Scheduler at:    tls://10.2.13.107:8786
distributed.scheduler - INFO -   dashboard at:                     :8787
...
distributed.scheduler - INFO - Receive client connection: Client-20ec901e-026d-11eb-8b95-81058b3ddfc7
distributed.core - INFO - Starting established connection
distributed.scheduler - INFO - Register worker <Worker 'tls://10.2.12.6:33349', name: davidbrochart-1096-worker-6-b670ce, memory: 0, processing: 0>
distributed.scheduler - INFO - Starting worker compute stream, tls://10.2.12.6:33349
distributed.core - INFO - Starting established connection
distributed.scheduler - INFO - Register worker <Worker 'tls://10.2.12.51:34927', name: davidbrochart-1096-worker-1-52bfea, memory: 0, processing: 0>
distributed.scheduler - INFO - Starting worker compute stream, tls://10.2.12.51:34927
...
distributed.core - INFO - Starting established connection
distributed.scheduler - INFO - Scheduler closing after being idle for 1200.00 s
distributed.scheduler - INFO - Scheduler closing...
distributed.scheduler - INFO - Scheduler closing all comms

Were you able to run computations on the cluster? Perhaps the cluster is functioning normally, but there's something wrong with our dashboard link.

I'm happy to try and debug, is there anything you would like me to do?

Thanks! I appreciate that. If this is happening consistently for you, then a couple of things come to mind. After starting up a cluster, are you able to successfully access the cluster dashboard from https://cloud.coiled.io/clusters? There's an eye icon next to the status of each running cluster which you can click to view the dashboard (screenshot below). It'd also be great if you could check the cluster logs (also indicated in the screenshot) to see if "distributed.scheduler - INFO - Scheduler closing after being idle for 1200.00 s" shows up in the scheduler's logs.
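If the scheduler log is long, the check for the idle-shutdown message can be scripted once the log text has been copied out of the web UI. The following is a minimal sketch, assuming the log is available as a plain string and that the message keeps the wording shown in the log above; `idle_shutdown_seconds` is a hypothetical helper, not part of Coiled or distributed.

```python
import re
from typing import Optional

# Matches the idle-shutdown line quoted in this thread, e.g.
# "Scheduler closing after being idle for 1200.00 s".
IDLE_PATTERN = re.compile(r"Scheduler closing after being idle for (\d+(?:\.\d+)?) s")


def idle_shutdown_seconds(log_text: str) -> Optional[float]:
    """Return the idle duration in seconds if the log reports an idle shutdown, else None."""
    match = IDLE_PATTERN.search(log_text)
    return float(match.group(1)) if match else None


# Sample text taken from the scheduler log lines quoted above.
sample_log = """\
distributed.core - INFO - Starting established connection
distributed.scheduler - INFO - Scheduler closing after being idle for 1200.00 s
distributed.scheduler - INFO - Scheduler closing...
"""

seconds = idle_shutdown_seconds(sample_log)
if seconds is None:
    print("No idle shutdown found in the log")
else:
    print(f"Scheduler shut down after {seconds / 60:.0f} minutes idle")
```

A result of `None` means the cluster did not close for inactivity, which points at the dashboard link rather than the cluster itself.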

[Screenshot: Screen Shot 2020-09-29 at 12 39 18 PM]

davidbrochart commented 4 years ago

Were you able to run computations on the cluster?

Yes, I ran the example computation in the "Getting Started" section and it worked.

are you able to successfully access the cluster dashboard from https://cloud.coiled.io/clusters?

Yes! The dashboard works through e.g. https://cloud.coiled.io/clusters/1123/dashboard, but not through https://cloud.coiled.io/dashboard/1123/status.

Here is the log; there is no idle message in it:

distributed.http.proxy - INFO - To route to workers diagnostics web server please install jupyter-server-proxy: python -m pip install jupyter-server-proxy
distributed.scheduler - INFO - Clear task state
distributed.scheduler - INFO -   Scheduler at:     tls://10.2.12.39:8786
distributed.scheduler - INFO -   dashboard at:                     :8787
distributed.preloading - INFO - Run preload setup function: https://cloud.coiled.io/preloads/insights.py
distributed.preloading - INFO - Run preload setup function: https://cloud.coiled.io/preloads/aws-credentials.py
distributed.scheduler - INFO - Receive client connection: Client-79459be0-028c-11eb-aa88-81058b3ddfc7
distributed.core - INFO - Starting established connection
distributed.scheduler - INFO - Register worker <Worker 'tls://10.2.11.236:39707', name: davidbrochart-1123-worker-1-452699, memory: 0, processing: 0>
distributed.scheduler - INFO - Starting worker compute stream, tls://10.2.11.236:39707
distributed.core - INFO - Starting established connection
distributed.scheduler - INFO - Register worker <Worker 'tls://10.2.12.131:43053', name: davidbrochart-1123-worker-9-eb0674, memory: 0, processing: 0>
distributed.scheduler - INFO - Starting worker compute stream, tls://10.2.12.131:43053
distributed.core - INFO - Starting established connection
distributed.scheduler - INFO - Register worker <Worker 'tls://10.2.13.137:33075', name: davidbrochart-1123-worker-3-f3220a, memory: 0, processing: 0>
distributed.scheduler - INFO - Starting worker compute stream, tls://10.2.13.137:33075
distributed.core - INFO - Starting established connection
distributed.scheduler - INFO - Register worker <Worker 'tls://10.2.13.57:46031', name: davidbrochart-1123-worker-4-242bd2, memory: 0, processing: 0>
distributed.scheduler - INFO - Starting worker compute stream, tls://10.2.13.57:46031
distributed.core - INFO - Starting established connection
distributed.scheduler - INFO - Register worker <Worker 'tls://10.2.12.139:34447', name: davidbrochart-1123-worker-10-94b03d, memory: 0, processing: 0>
distributed.scheduler - INFO - Starting worker compute stream, tls://10.2.12.139:34447
distributed.core - INFO - Starting established connection
distributed.scheduler - INFO - Register worker <Worker 'tls://10.2.12.20:38407', name: davidbrochart-1123-worker-2-586027, memory: 0, processing: 0>
distributed.scheduler - INFO - Starting worker compute stream, tls://10.2.12.20:38407
distributed.core - INFO - Starting established connection
distributed.scheduler - INFO - Register worker <Worker 'tls://10.2.11.101:45743', name: davidbrochart-1123-worker-6-1a62c8, memory: 0, processing: 0>
distributed.scheduler - INFO - Starting worker compute stream, tls://10.2.11.101:45743
distributed.core - INFO - Starting established connection
distributed.scheduler - INFO - Register worker <Worker 'tls://10.2.13.90:36705', name: davidbrochart-1123-worker-7-2f0453, memory: 0, processing: 0>
distributed.scheduler - INFO - Starting worker compute stream, tls://10.2.13.90:36705
distributed.core - INFO - Starting established connection
distributed.scheduler - INFO - Register worker <Worker 'tls://10.2.11.228:35039', name: davidbrochart-1123-worker-8-b9be15, memory: 0, processing: 0>
distributed.scheduler - INFO - Starting worker compute stream, tls://10.2.11.228:35039
distributed.core - INFO - Starting established connection
distributed.scheduler - INFO - Register worker <Worker 'tls://10.2.12.111:36073', name: davidbrochart-1123-worker-5-593d19, memory: 0, processing: 0>
distributed.scheduler - INFO - Starting worker compute stream, tls://10.2.12.111:36073
distributed.core - INFO - Starting established connection

jrbourbeau commented 4 years ago

The dashboard works through e.g. https://cloud.coiled.io/clusters/1123/dashboard, but not through https://cloud.coiled.io/dashboard/1123/status.

Hmm that's interesting and good to know. We'll look into this more tomorrow to see where things are going wrong (I suspect this is on our end).

Glad to hear you can access the dashboard through cloud.coiled.io :)
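For anyone hitting the same timeout, one possible stopgap is to rewrite the link printed by client.dashboard_link into the form that was reported to work. This is only a sketch: it assumes the two URL patterns quoted in this thread remain valid, and `working_dashboard_url` is a hypothetical helper, not part of the Coiled API.

```python
import re

# The link form reported to time out in this thread:
#   https://cloud.coiled.io/dashboard/<cluster id>/status
_STATUS_LINK = re.compile(r"^https://cloud\.coiled\.io/dashboard/(\d+)/status/?$")


def working_dashboard_url(dashboard_link: str) -> str:
    """Map the timing-out link form to the one reported to work.

    Links that do not match the expected pattern are returned unchanged.
    """
    match = _STATUS_LINK.match(dashboard_link)
    if match is None:
        return dashboard_link
    cluster_id = match.group(1)
    return f"https://cloud.coiled.io/clusters/{cluster_id}/dashboard"


print(working_dashboard_url("https://cloud.coiled.io/dashboard/1123/status"))
```

With a live cluster, the argument would be `client.dashboard_link` from the snippet at the top of the issue.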

FabioRosado commented 3 years ago

Hello, I've been going through the open issues on this repository and closing some of them. Currently, when you connect a cluster to a Dask client you will get an AWS URL that points to the Dask dashboard, something like: http://ec2-3-12-155-51.us-east-2.compute.amazonaws.com:8787

I'm going to close this issue, but please feel free to re-open or create a new issue if you encounter any problems 😄