aws / containers-roadmap

This is the public roadmap for AWS container services (ECS, ECR, Fargate, and EKS).
https://aws.amazon.com/about-aws/whats-new/containers/

[service] [request]: Fargate Spot failover to Fargate #852

Open raags opened 4 years ago

raags commented 4 years ago

Tell us about your request What do you want us to build?

Right now the capacityProvider strategy allows spreading tasks across the Fargate and Fargate Spot capacity providers (with a weight for each). However, if Spot capacity is unavailable, the Spot tasks are simply not launched, while the Fargate count is maintained.

There should be a fallback option, so that if Spot capacity is unavailable, those tasks are launched on regular Fargate instead.

Which service(s) is this request for? This could be Fargate, ECS, EKS, ECR

ECS, Fargate

Tell us about the problem you're trying to solve. What are you trying to do, and why is it hard? What outcome are you trying to achieve, ultimately, and why is it hard/impossible to do right now? What is the impact of not having this problem solved? The more details you can provide, the better we'll be able to understand and solve the problem.

This is the requirement it should satisfy:

Launch 4 ECS tasks with launch type fargate spot, but if spot capacity is not available, fall back to normal fargate.

Are you currently working around this issue? How are you currently solving this problem?

Not able to solve it.

coultn commented 4 years ago

Question: are the tasks you are launching business critical functions for which you need high uptime?

raags commented 4 years ago

They are business-critical, but not time-critical, i.e., they can be down for a short duration, say 5 minutes.

coultn commented 4 years ago

Business-critical applications that can't tolerate extended task interruptions are not a good fit for running entirely on Fargate Spot.

raags commented 4 years ago

That is the reason for this feature request. If Fargate Spot is unavailable (i.e., the task gets interrupted, which is fine), I would like an option for it to fall back to regular Fargate. This is similar to EMR spot fleet, where if a spot request is not fulfilled within a set time, an on-demand instance is provisioned instead.

je-al commented 4 years ago

+1, we run business-critical workloads this way, leveraging Spotinst's Elastigroups for ECS/EC2, which fall back automatically. This not being available means we'd need to hack it on our own in order to use Fargate.

je-al commented 4 years ago

Just found #773 while looking for this issue; it seems this one would be a duplicate (?)

misterjoshua commented 3 years ago

My use case is low-cost web hosting. The tasks are fine being interrupted so long as, when Spot capacity runs out, there's backup capacity available from on-demand.

I have been using a workaround for now. I run two services: a primary with only Spot capacity and a fallback with only on-demand capacity. When the primary emits a task placement error, a Lambda function sets the desired count on the fallback service. When the primary emits a steady state event, a Lambda function sets the fallback desired count to zero.
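
For readers who want to try the same two-service workaround, here is a minimal sketch of what such a Lambda function might look like. It is not the commenter's actual code. It assumes an EventBridge rule that forwards only the primary service's "ECS Service Action" events, and the environment variable names (`CLUSTER`, `FALLBACK_SERVICE`, `FALLBACK_COUNT`) are hypothetical.

```python
import os

# Event names carried in detail.eventName of ECS service action events.
PLACEMENT_FAILURE = "SERVICE_TASK_PLACEMENT_FAILURE"
STEADY_STATE = "SERVICE_STEADY_STATE"


def fallback_desired_count(event_name, fallback_count):
    """Return the desired count for the fallback service, or None to do nothing."""
    if event_name == PLACEMENT_FAILURE:
        return fallback_count  # Spot could not place tasks: bring up on-demand
    if event_name == STEADY_STATE:
        return 0  # primary has recovered: scale the fallback back down
    return None


def handle_event(event, ecs_client, cluster, fallback_service, fallback_count):
    """Apply the decision above using an ECS client (boto3 or a test double)."""
    event_name = event.get("detail", {}).get("eventName")
    desired = fallback_desired_count(event_name, fallback_count)
    if desired is None:
        return None
    ecs_client.update_service(
        cluster=cluster, service=fallback_service, desiredCount=desired
    )
    return desired


def lambda_handler(event, context):
    # boto3 is imported lazily so the logic above can be exercised without AWS.
    import boto3

    return handle_event(
        event,
        boto3.client("ecs"),
        os.environ["CLUSTER"],            # hypothetical variable name
        os.environ["FALLBACK_SERVICE"],   # hypothetical variable name
        int(os.environ.get("FALLBACK_COUNT", "1")),
    )
```

Keeping the decision in a pure function makes it easy to check the two transitions locally before wiring the handler to real events.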

harishsambasivam commented 2 years ago

If I keep the ratio of Spot:OnDemand as 4:1, will it fall back to Fargate when Spot is not available at the time of deployment?

chadmyers commented 2 months ago

Use case: I have an ECS service (web app or microservice behind an ALB) that has application auto-scaling on. My desired task count is, say, 4.

I'd like to configure my service to use the Fargate and Fargate Spot capacity providers. I'd like to have a base count of 1 on Fargate (so I always have at least 1 task guaranteed to be running at any given time), with the Spot provider weighted so that all additional tasks are placed on Spot.
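
As an illustration, the strategy described above could be written as the `capacityProviderStrategy` list below, in the shape boto3's ECS `create_service` accepts. The `expected_split` helper is hypothetical and only approximates the intended distribution (base first, then the remainder by weight); the exact rounding ECS applies may differ, and the split says nothing about what happens when Spot capacity is unavailable.

```python
# One guaranteed on-demand task (base=1); every additional task goes to Spot.
CAPACITY_PROVIDER_STRATEGY = [
    {"capacityProvider": "FARGATE", "base": 1, "weight": 0},
    {"capacityProvider": "FARGATE_SPOT", "weight": 1},
]


def expected_split(desired_count, strategy):
    """Approximate how many tasks each provider is meant to receive.

    Hypothetical helper: satisfies each provider's base first, then divides
    the remaining tasks by weight using a largest-remainder rule.
    """
    counts = {}
    remaining = desired_count
    for item in strategy:
        base = min(item.get("base", 0), remaining)
        counts[item["capacityProvider"]] = base
        remaining -= base

    total_weight = sum(item.get("weight", 0) for item in strategy)
    if remaining == 0 or total_weight == 0:
        return counts

    allocated = 0
    fractions = []
    for item in strategy:
        exact = remaining * item.get("weight", 0) / total_weight
        whole = int(exact)
        counts[item["capacityProvider"]] += whole
        allocated += whole
        fractions.append((exact - whole, item["capacityProvider"]))

    # Hand out any leftover tasks to the largest fractional shares.
    for _, name in sorted(fractions, reverse=True)[: remaining - allocated]:
        counts[name] += 1
    return counts


print(expected_split(4, CAPACITY_PROVIDER_STRATEGY))
```

With a desired count of 4, this intended split is 1 task on `FARGATE` and 3 on `FARGATE_SPOT`.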

If the service is unable to launch tasks on Fargate Spot, it should launch them using the Fargate provider.

I would expect them to keep running on-demand at that point. I don't expect it to be clever enough to move them back to Spot if Spot capacity happens to become available again. However, every time the service needs to launch a new task, it should try Fargate Spot again and then fall back if necessary.
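
The requested behaviour can be pinned down with a small sketch. This is not an existing ECS feature or API: `launch_spot` and `launch_on_demand` are hypothetical callables that take a task count and return how many tasks actually started.

```python
from typing import Callable, Dict


def launch_with_fallback(
    count: int,
    launch_spot: Callable[[int], int],
    launch_on_demand: Callable[[int], int],
) -> Dict[str, int]:
    """Try Spot first for every launch; cover any shortfall with on-demand."""
    started_on_spot = launch_spot(count)
    # Guard against a launcher reporting more (or fewer than zero) tasks.
    started_on_spot = max(0, min(started_on_spot, count))

    shortfall = count - started_on_spot
    started_on_demand = launch_on_demand(shortfall) if shortfall else 0

    # Tasks that fell back stay on-demand; only new launches retry Spot.
    return {"FARGATE_SPOT": started_on_spot, "FARGATE": started_on_demand}
```

For example, if Spot can only supply one of four requested tasks, the remaining three would be started on-demand, and the next launch would try Spot again from scratch.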