Vesyrak opened this issue 10 months ago
Hi @Vesyrak
As to why things go wrong when running with debugpy, I don't know. I don't have experience with that project.
For getting logs when things don't work, this is typically done by the system hosting Dask, in this case Fargate. Dask just puts logs in stdout/stderr. You'll want to figure out what Fargate does with those. At Coiled (managed Dask service) we tend to route logs to cloudwatch and then use cloudwatch APIs to serve up those logs. Maybe you could do something similar?
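As a sketch of that suggestion: on Fargate the usual way to route a container's stdout/stderr to CloudWatch is the `awslogs` log driver in the task definition. The group, region, and prefix below are placeholders, not values from this issue:

```json
"logConfiguration": {
  "logDriver": "awslogs",
  "options": {
    "awslogs-group": "/ecs/dask-scheduler",
    "awslogs-region": "us-east-1",
    "awslogs-stream-prefix": "scheduler"
  }
}
```

With that in place, the scheduler's log lines show up as a CloudWatch log stream you can read with the CloudWatch Logs API or the console.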
@Vesyrak: Is there anything actionable for us to do here?
Describe the issue: When launching the scheduler on our AWS Fargate instance, everything works as intended. However, when launching the scheduler with `debugpy` to enable remote debugging, 90% of the time the dashboard does not start. This causes our cluster to fail, as we depend on the dashboard's healthcheck to monitor cluster health. Once in a while it does boot correctly, but success appears to be rare and random. We configured the Fargate instance correctly for remote debugging, and in the scenarios where the dashboard does boot, we can successfully debug the Dask scheduler. The scheduler logs show no errors and claim that the dashboard has started.

Is there any way to check its logs, or figure out the cause of this? We cannot reproduce this issue locally.
Environment: