1. The endpoint becomes unavailable when an interruptible work is terminated by the cloud service provider. It becomes available again once `interval` has passed AND preemptible instances are available on the provider side.
   - Future work: when spot instances are unavailable from the cloud service, fall back to the proxy or to on-demand instances.
2. The app dies if any interruptible work is terminated by the cloud service provider while in the pending state. If a work is terminated while in the running state, the app keeps running and tries to spin up interruptible works as in 1. above.
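The recovery condition in limitation 1 (a full `interval` has passed AND the provider has capacity) can be illustrated with a small sketch. This is only one possible reading of the behavior, not the PR's implementation: `Work`, `IntervalReplacementSketch`, `can_provision`, and `step` are hypothetical names invented for the illustration.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Work:
    """Hypothetical stand-in for an interruptible work (not the PR's class)."""
    name: str
    terminated: bool = False


class IntervalReplacementSketch:
    """Assumed behavior: every `interval` seconds, try to replace terminated works."""

    def __init__(self, interval: float, can_provision: Callable[[], bool]):
        self.interval = interval
        # Hypothetical check for whether the provider has spot capacity right now.
        self.can_provision = can_provision
        self._last_check = 0.0

    def step(self, works: List[Work], now: float) -> List[Work]:
        # Nothing happens until a full interval has passed since the last check.
        if now - self._last_check < self.interval:
            return works
        self._last_check = now
        result = []
        for work in works:
            # A terminated work is replaced only if capacity is available;
            # otherwise it stays terminated until a later interval.
            if work.terminated and self.can_provision():
                result.append(Work(name=work.name + "-replacement"))
            else:
                result.append(work)
        return result
```

Under these assumptions, a work terminated just after a check stays down for up to one `interval`, and longer if the provider has no capacity when the next check runs, which matches the unavailability window described in limitation 1.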
Description of this PR
Introduces an interval replacement strategy to keep the endpoint available when running on spot instances.
To enable this feature, set `interruptible=True`. Or, to change the default replacement interval, pass `IntervalReplacement(...)` to `AutoScaler`.
For benchmark results, see here (internal-only at this time): https://www.notion.so/60aca667b72c4aa79e496f5b61c8182a
Known limitations