Open BadAsstronaut opened 2 years ago
Ouch that does sound painful to debug, sorry you had a bad time!
You're right it looks like the scheduler gets a terminate
call when the ECSCluster._close()
calls super()._close()
I wonder if SpecCluster._close
should have a shutdown_scheduler
kwarg that we can set or something. Would you mind opening an issue on distributed and ping me there?
What happened: Calling
cluster.close
from a FargateCluster instance terminates the remote scheduler process.What you expected to happen: The FargateCluster initiate resources should clean up.
Minimal Complete Verifiable Example:
Anything else we need to know?: This was a bugaboo to debug. Some breadcrumbs:
dask.distributed Scheduler close distributed SpecCluster _close distributed SpecCluster _start
It looks to me like
scheduler_address
in CloudProvider is settingscheduler_comm
on the SpecCluster; thus whencluster.close()
gets called,<SpecClusterInstance>._close()
gets invoked and sends the terminate command via rpc.(EDIT: I tested with
cluster._close
as well and observed the same behavior.)Environment: