Open martinedgefocus opened 2 years ago
It would be great to support this! We already support spot in the ECSCluster
manager and it would be awesome to have it here too.
I would expect that when the VM is terminated it is shut down gracefully, so the worker process receives a SIGTERM, attempts to notify the scheduler, and pushes any data it holds in memory onto other workers.
Depending on how much memory the worker holds, the time allowed for shutdown may not be long enough, so we might want the workers to watch for the termination notification and shut down preemptively. #48 covers this for ECSCluster and would likely be reusable here too.
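To make the "watch for the termination notification" idea concrete, here is a rough sketch of a poller for the EC2 instance metadata endpoint `spot/instance-action`, which returns 404 until an interruption is scheduled. This is not the approach from #48, just an illustration. The function names (`fetch_instance_action`, `parse_instance_action`, `watch_for_termination`) and the `on_notice` callback are hypothetical, and it assumes the worker can reach the metadata service using an IMDSv2 token.

```python
import json
import time
import urllib.error
import urllib.request

METADATA = "http://169.254.169.254/latest"


def fetch_instance_action(timeout=1.0):
    """Return the raw spot instance-action body, or None if no notice is pending."""
    # IMDSv2: request a short-lived session token first.
    token_req = urllib.request.Request(
        f"{METADATA}/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": "60"},
    )
    try:
        with urllib.request.urlopen(token_req, timeout=timeout) as resp:
            token = resp.read().decode()
        req = urllib.request.Request(
            f"{METADATA}/meta-data/spot/instance-action",
            headers={"X-aws-ec2-metadata-token": token},
        )
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.read().decode()
    except urllib.error.HTTPError as err:
        if err.code == 404:  # no interruption scheduled
            return None
        raise


def parse_instance_action(body):
    """Turn the JSON notice into an (action, time) tuple, or None."""
    if body is None:
        return None
    data = json.loads(body)
    return data.get("action"), data.get("time")


def watch_for_termination(on_notice, fetch=fetch_instance_action,
                          interval=5.0, max_polls=None, sleep=time.sleep):
    """Poll until a notice appears, then call on_notice(action, time)."""
    polls = 0
    while max_polls is None or polls < max_polls:
        notice = parse_instance_action(fetch())
        if notice is not None:
            # Hypothetical hook: this is where a worker would start
            # its graceful shutdown and hand its data to other workers.
            on_notice(*notice)
            return notice
        polls += 1
        sleep(interval)
    return None
```

`fetch` and `sleep` are injectable so the loop can be exercised without a real instance. In a worker, `on_notice` would be whatever triggers the graceful shutdown described above.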
We already have something similar for the Azure preemptible instances. https://github.com/dask/dask-cloudprovider/blob/75fc8eca21b599d2cf7ec07a5441954c8deb660a/dask_cloudprovider/azure/utils.py#L17-L22
@martinedgefocus your patch looks like a great start on this. It should definitely be configurable and opt-in via the Dask config. Do you have any interest in raising a PR?
This works from a touch test. I will keep working with it here, but I'm interested in any feedback or collaboration. It seems critical to support, as this should be a 4-5x reduction in cost. Once I finally figured out what to change, it was fairly simple. My concerns are:
For reference: https://docs.aws.amazon.com/AWSEC2/latest/APIReference/API_SpotMarketOptions.html
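As an illustration of how the `SpotMarketOptions` structure linked above could be made opt-in, here is a minimal sketch that builds the `InstanceMarketOptions` argument accepted by the EC2 `RunInstances` call. The helper name `spot_market_options` and its `enabled` / `max_price` parameters are hypothetical. This is not the patch discussed in this thread, and how the resulting dict would be wired into the cluster manager or the Dask config is left open.

```python
def spot_market_options(enabled=False, max_price=None):
    """Build extra RunInstances kwargs for an opt-in spot request.

    Returns an empty dict when spot is disabled, so the result can
    always be splatted into the call.
    """
    if not enabled:
        return {}
    spot_options = {
        # One-time requests are not resubmitted after an interruption.
        "SpotInstanceType": "one-time",
        "InstanceInterruptionBehavior": "terminate",
    }
    if max_price is not None:
        # The API expects the maximum hourly price as a string.
        spot_options["MaxPrice"] = str(max_price)
    return {
        "InstanceMarketOptions": {
            "MarketType": "spot",
            "SpotOptions": spot_options,
        }
    }


# Usage sketch (assumes an existing boto3 EC2 client named `ec2`
# and a dict `vm_kwargs` holding the other RunInstances arguments):
#   ec2.run_instances(**vm_kwargs, **spot_market_options(enabled=True))
```

Omitting `MaxPrice` leaves the cap at the on-demand price, which is the API default.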