davidbrochart closed this issue 3 years ago.
Thanks for raising an issue @davidbrochart! I tried the same thing, and client.dashboard_link took me to the Dask dashboard as expected. Did the dashboard link work initially and then time out, or did it never work to begin with?
Also, by default Coiled clusters automatically shut down after 20 minutes of inactivity. Is there a chance that 20 minutes elapsed before you clicked on the dashboard link?
Did the dashboard link work initially and then time out, or did it never work to begin with?
It never worked.
Is there a chance that 20 minutes elapsed before you clicked on the dashboard link?
No, I just tried again. I'm happy to try and debug; is there anything you would like me to do?
Hrm, I checked the logs for the cluster, and it looks like the cluster spun up normally and then shut down after 20 minutes of being idle:
distributed.http.proxy - INFO - To route to workers diagnostics web server please install jupyter-server-proxy: python -m pip install jupyter-server-proxy
distributed.scheduler - INFO - Clear task state
distributed.scheduler - INFO - Scheduler at: tls://10.2.13.107:8786
distributed.scheduler - INFO - dashboard at: :8787
...
distributed.scheduler - INFO - Receive client connection: Client-20ec901e-026d-11eb-8b95-81058b3ddfc7
distributed.core - INFO - Starting established connection
distributed.scheduler - INFO - Register worker <Worker 'tls://10.2.12.6:33349', name: davidbrochart-1096-worker-6-b670ce, memory: 0, processing: 0>
distributed.scheduler - INFO - Starting worker compute stream, tls://10.2.12.6:33349
distributed.core - INFO - Starting established connection
distributed.scheduler - INFO - Register worker <Worker 'tls://10.2.12.51:34927', name: davidbrochart-1096-worker-1-52bfea, memory: 0, processing: 0>
distributed.scheduler - INFO - Starting worker compute stream, tls://10.2.12.51:34927
...
distributed.core - INFO - Starting established connection
distributed.scheduler - INFO - Scheduler closing after being idle for 1200.00 s
distributed.scheduler - INFO - Scheduler closing...
distributed.scheduler - INFO - Scheduler closing all comms
Were you able to run computations on the cluster? Perhaps the cluster is functioning normally, but there's something wrong with our dashboard link.
I'm happy to try and debug; is there anything you would like me to do?
Thanks! I appreciate that. If this is happening consistently for you, then a couple of things come to mind. After starting up a cluster, are you able to successfully access the cluster dashboard from https://cloud.coiled.io/clusters? There's an eye icon next to the status of each running cluster, which you can click to view the dashboard (screenshot below). It'd also be great if you could check the cluster logs (also indicated in the screenshot) to see whether the line "distributed.scheduler - INFO - Scheduler closing after being idle for 1200.00 s" shows up in the scheduler's logs.
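Checking for that line can also be scripted when the log is long. Below is a minimal Python sketch, assuming the scheduler log has been copied into a string; the helper name idle_shutdown_seconds is hypothetical, and the regular expression only matches the exact message quoted above. Note that 1200 s is 20 minutes, the default idle timeout mentioned earlier.

```python
import re
from typing import Optional

# Matches the idle-shutdown message quoted above, capturing the number of seconds.
IDLE_PATTERN = re.compile(r"Scheduler closing after being idle for ([\d.]+) s")


def idle_shutdown_seconds(log_text: str) -> Optional[float]:
    """Return the idle time reported in the log, or None if the scheduler
    never logged an idle shutdown."""
    match = IDLE_PATTERN.search(log_text)
    return float(match.group(1)) if match else None


if __name__ == "__main__":
    first_log = (
        "distributed.scheduler - INFO - Scheduler closing after being idle for 1200.00 s\n"
        "distributed.scheduler - INFO - Scheduler closing...\n"
    )
    seconds = idle_shutdown_seconds(first_log)
    print(seconds, "s =", seconds / 60, "min")  # 1200.0 s = 20.0 min
    print(idle_shutdown_seconds("distributed.core - INFO - Starting established connection"))  # None
```

A log without the message, like the second one posted in this thread, makes the helper return None.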
Were you able to run computations on the cluster?
Yes, I ran the example computation in the "Getting Started" section, and it worked.
are you able to successfully access the cluster dashboard from https://cloud.coiled.io/clusters?
Yes! The dashboard works through e.g. https://cloud.coiled.io/clusters/1123/dashboard, but not through https://cloud.coiled.io/dashboard/1123/status.
Here is the log, but there is no "idle" message in it:
distributed.http.proxy - INFO - To route to workers diagnostics web server please install jupyter-server-proxy: python -m pip install jupyter-server-proxy
distributed.scheduler - INFO - Clear task state
distributed.scheduler - INFO - Scheduler at: tls://10.2.12.39:8786
distributed.scheduler - INFO - dashboard at: :8787
distributed.preloading - INFO - Run preload setup function: https://cloud.coiled.io/preloads/insights.py
distributed.preloading - INFO - Run preload setup function: https://cloud.coiled.io/preloads/aws-credentials.py
distributed.scheduler - INFO - Receive client connection: Client-79459be0-028c-11eb-aa88-81058b3ddfc7
distributed.core - INFO - Starting established connection
distributed.scheduler - INFO - Register worker <Worker 'tls://10.2.11.236:39707', name: davidbrochart-1123-worker-1-452699, memory: 0, processing: 0>
distributed.scheduler - INFO - Starting worker compute stream, tls://10.2.11.236:39707
distributed.core - INFO - Starting established connection
distributed.scheduler - INFO - Register worker <Worker 'tls://10.2.12.131:43053', name: davidbrochart-1123-worker-9-eb0674, memory: 0, processing: 0>
distributed.scheduler - INFO - Starting worker compute stream, tls://10.2.12.131:43053
distributed.core - INFO - Starting established connection
distributed.scheduler - INFO - Register worker <Worker 'tls://10.2.13.137:33075', name: davidbrochart-1123-worker-3-f3220a, memory: 0, processing: 0>
distributed.scheduler - INFO - Starting worker compute stream, tls://10.2.13.137:33075
distributed.core - INFO - Starting established connection
distributed.scheduler - INFO - Register worker <Worker 'tls://10.2.13.57:46031', name: davidbrochart-1123-worker-4-242bd2, memory: 0, processing: 0>
distributed.scheduler - INFO - Starting worker compute stream, tls://10.2.13.57:46031
distributed.core - INFO - Starting established connection
distributed.scheduler - INFO - Register worker <Worker 'tls://10.2.12.139:34447', name: davidbrochart-1123-worker-10-94b03d, memory: 0, processing: 0>
distributed.scheduler - INFO - Starting worker compute stream, tls://10.2.12.139:34447
distributed.core - INFO - Starting established connection
distributed.scheduler - INFO - Register worker <Worker 'tls://10.2.12.20:38407', name: davidbrochart-1123-worker-2-586027, memory: 0, processing: 0>
distributed.scheduler - INFO - Starting worker compute stream, tls://10.2.12.20:38407
distributed.core - INFO - Starting established connection
distributed.scheduler - INFO - Register worker <Worker 'tls://10.2.11.101:45743', name: davidbrochart-1123-worker-6-1a62c8, memory: 0, processing: 0>
distributed.scheduler - INFO - Starting worker compute stream, tls://10.2.11.101:45743
distributed.core - INFO - Starting established connection
distributed.scheduler - INFO - Register worker <Worker 'tls://10.2.13.90:36705', name: davidbrochart-1123-worker-7-2f0453, memory: 0, processing: 0>
distributed.scheduler - INFO - Starting worker compute stream, tls://10.2.13.90:36705
distributed.core - INFO - Starting established connection
distributed.scheduler - INFO - Register worker <Worker 'tls://10.2.11.228:35039', name: davidbrochart-1123-worker-8-b9be15, memory: 0, processing: 0>
distributed.scheduler - INFO - Starting worker compute stream, tls://10.2.11.228:35039
distributed.core - INFO - Starting established connection
distributed.scheduler - INFO - Register worker <Worker 'tls://10.2.12.111:36073', name: davidbrochart-1123-worker-5-593d19, memory: 0, processing: 0>
distributed.scheduler - INFO - Starting worker compute stream, tls://10.2.12.111:36073
distributed.core - INFO - Starting established connection
The dashboard works through e.g. https://cloud.coiled.io/clusters/1123/dashboard, but not through https://cloud.coiled.io/dashboard/1123/status.
Hmm, that's interesting and good to know. We'll look into this more tomorrow to see where things are going wrong (I suspect this is on our end).
Glad to hear you can access the dashboard through cloud.coiled.io :)
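One way to narrow down where a link times out is to request both URL forms with an explicit timeout and compare the outcomes. Below is a minimal sketch using only the Python standard library; the helper name probe is hypothetical. The cloud.coiled.io pages may require a logged-in browser session, so a script can get a different answer (for example a login page or an error status) than the browser does.

```python
import urllib.error
import urllib.request


def probe(url: str, timeout: float = 10.0) -> str:
    """Request `url` once and summarize the outcome as a short string."""
    try:
        # urlopen follows redirects, so a redirect to a login page
        # reports that page's status rather than the original one.
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return f"HTTP {response.status}"
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status such as 404 or 502.
        return f"HTTP {exc.code}"
    except OSError as exc:
        # No usable answer: DNS failure, refused connection, or timeout.
        # (URLError and TimeoutError are both subclasses of OSError.)
        return f"failed: {exc}"
```

For example, calling probe() on https://cloud.coiled.io/clusters/1123/dashboard and on https://cloud.coiled.io/dashboard/1123/status would show whether the second form returns an HTTP error or never answers within the timeout.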
Hello, I've been going through the open issues on this repository and closing some of them. Currently, when you connect a cluster to a Dask client, you get an AWS URL that points you to the Dask dashboard, something like: http://ec2-3-12-155-51.us-east-2.compute.amazonaws.com:8787
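For reference, Dask builds client.dashboard_link by filling a URL template, the distributed.dashboard.link config option, whose default is "{scheme}://{host}:{port}/status"; 8787 is the default dashboard port, matching the "dashboard at: :8787" lines in the logs above. The Python below is a simplified stand-in, not Dask's own code: the function name dashboard_link is hypothetical, and Dask's real helper additionally chooses https when the dashboard is served over TLS and lets the template reference environment variables.

```python
# Default value of Dask's "distributed.dashboard.link" config option.
DEFAULT_DASHBOARD_LINK = "{scheme}://{host}:{port}/status"


def dashboard_link(host: str, port: int = 8787, scheme: str = "http",
                   template: str = DEFAULT_DASHBOARD_LINK) -> str:
    """Fill the link template with the dashboard's scheme, host, and port
    (a simplified, hypothetical stand-in for what Dask does)."""
    return template.format(scheme=scheme, host=host, port=port)


if __name__ == "__main__":
    host = "ec2-3-12-155-51.us-east-2.compute.amazonaws.com"  # example host from this thread
    print(dashboard_link(host))
    # http://ec2-3-12-155-51.us-east-2.compute.amazonaws.com:8787/status
```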
I'm going to close this issue, but please feel free to re-open or create a new issue if you encounter any problems 😄
After installing and logging in, the link to the dashboard times out: