[ECS] [Fargate]: can ecs fargate tasks scale to zero ?

dfang commented 4 years ago

Tell us about your request

a feature like this start with 0 tasks in the beginning, when requests come, run a task, more and more requests come, run more task... just like lambda, but running containers, just like GCP cloud run, it's simple and easier to use.

Which service(s) is this request for?

This could be Fargate, ECS

nazreen commented 4 years ago

@dfang I'm curious, why not just use Lambda? Lambda sounds more suitable for this, because it sounds like:

your task is of a smaller scope
you want to pay only for when work is being done (very much what Lambda is tailored for)

dfang commented 4 years ago

@nazreen https://cloud.google.com/run

eganjs commented 3 years ago

Possibly related to #763

CarlosDomingues commented 3 years ago

That would be a killer feature for ECS

mreferre commented 3 years ago

@CarlosDomingues @dfang can you share more about how you envision the scale to zero to be used? I guess what I am saying is ... how long do you expect the task to be "idle" and with which pattern (i.e. very spiky with no pattern Vs idle for 5 days a week and heavy hits 2 days of the week Vs ..... ?). Do you have a real life example of a pattern that could benefit from this today?

Also, most of the times, scale to zero often comes with a cold start penalty (see this for example). How long (realistically) would you be willing to accept as a cold start to have this scale to zero feature?

CarlosDomingues commented 3 years ago

@mreferre sure!

My company develops microservice-oriented data pipelines to ingest and aggregate information from a multitude of sources. We're in the financial industry and our analyses are consumed internally to improve investment decisions. We like doing things using microservices because:

(1) From a development point of view, we get many things for free. Most web frameworks have mature testing libraries, ORMs, data validation libs, etc...

(2) We can abstract data access through REST APIs, making life much easier for those who will consume them. Non-technical users can access data from stuff like Power BI or Microsoft Excel, that's huge for us.

That said, many pipelines don't need to be triggered or accessed frequently. In the end of the day, many of our APIs are actually doing batch jobs triggered by REST endpoints. I've thought about a few other options:

(1) Lambda - The 15 min timeout is a show stopper for us. Also, the developer experience is much, much worse and developer time is an expensive resource. Finally, we also would rather not be locked to something that is AWS exclusive.

(2) AWS Batch - We could have small containers that are used as control planes and issue jobs to AWS Batch as needed. But that's yet another component to manage, we're hitting many tens of different services right know. For many of them, scaling to zero like Knative / Google Cloud Run would be a lot simpler operationally.

Also, most of the times, scale to zero often comes with a cold start penalty (see this for example). How long (realistically) would you be willing to accept as a cold start to have this scale to zero feature?

For data pipelines that are triggered once in an hour, day, week... having a few minutes of cold start time would be totally ok.

For data access that's a bit worse, but fetching data is generally done once in the beginning of an analysis, so that also wouldn't be a show stopper.

I'm open to any thoughts or considerations!

mreferre commented 3 years ago

Thanks @CarlosDomingues this is great feedback.

dave-nm commented 3 years ago

For some insight into our use case, we started using Lambda, but the tooling is difficult for local development/debugging and having to pay the cost of start up time for each request can be problematic when loading settings/connecting to databases is non-trivial, so we moved to using services in Fargate/ECS. It's worked well, but for our development and testing environments, it's unfortunate that there's not a way to have the service scale to 0 when it's not being used (which is most of the time). Ideally, there would be a way to have the requests based scaling have it go to 0 running instances when there are no requests for a period of time.

nicksrandall commented 3 years ago

My use case is a system that processes events but those events only come in during regular business hours. It would be nice to be able to scale to 0 on nights and weekends and then scale back up to the appropriate amount on Monday morning. While at first glance it might seems like Lambda would be a great use-case for this, it does not work for us because our system requires events to be processed with extremely low latency (under 5ms).

coultn commented 3 years ago

@nicksrandall have you considered scheduled scaling with ECS? https://aws.amazon.com/blogs/containers/optimizing-amazon-elastic-container-service-for-cost-using-scheduled-scaling/

wimax-grapl commented 3 years ago

At least in the UI, it seems like a minimum number of tasks = 0 is totally acceptable:

It's worth noting that this behavior is currently broken in CDK with ecs_patterns.QueueProcessingFargateService: https://github.com/aws/aws-cdk/issues/14336

JonMarbach commented 3 years ago

@mreferre I'm interested in scale-to-zero as well. In my case, the pattern would be "bursty" - periods of hours or days (even months) idle, then several minutes of bursts of intense traffic. As far as warm-up time goes, a handful of seconds would be totally fine. We can anticipate (programmatically) when the service will be used and send a ping-like message before we start bursting.

The use case is something like a data visualization application where the user is "roaming" through a big dataset a chunk at a time. The service supplies chunks of data back to a client app as the user "roams", but the sources of the data can vary. We are considering having a service for each different kind of data, especially since some data may be in files, some may be in a database, some may be accessed via some other cloud service, so the particulars and implementation details of each data service could vary a lot. We could do this with Lambdas, but in some cases there is just too much overhead in the initial connection to the data, so by running services we can amortize the initial connection and stay as "hot"/low-latency as possible while the user is hitting the service.

We maybe could manually scale-to-zero using step functions, but setting a minimum-number-of-tasks to 0 and maybe an idle-timeout would be much simpler. Also, this pattern actually fits a number of other cases I'm aware of, so creating a step function to manage each service seems... undifferentiated.

mreferre commented 3 years ago

Hey @JonMarbach we have since released a new service (based on Fargate) that scale to "0.5" (as I like to say). Not sure if you have seen it but AWS App Runner is a fully managed service (built on Fargate) that allows you to scale in and out automatically based on workload patterns. The way it works is that you declare minimum provisioned capacity (min=1) and maximum capacity and the service scaled based on connections (configurable). If there are no connections the service scale down to min provisioned capacity and you only pay for memory. As instances become active due to load you start to pay for mem/CPU (as usual). Today (for now) there is no way to go below 1 provisioned instance and this allows us for super fast response times (no cold starts). HOWEVER, if you know the service may be idle for longer period of times, you want to optimize for costs, you know the load patterns and can sustain a cold start App Runner also supports pause/resume API calls. Would something like this fit your needs?

CarlosDomingues commented 3 years ago

@mreferre played with App Runner a bit and loved it. Not only because of the scaling feature, but also because it is very easy for developers to use. Most AWS services are seriously lacking in this department (incluing Fargate!).

It still needs some features (VPC + Security Groups) before I can use it for real products, but it's a huge step in a good direction. Congrats!

cvsudheer108 commented 3 years ago

Please note....All container instance processing is billed per second, rounded up to the next nearest second. But there is a one minute minimum charge for vCPU resources every time a provisioned container instance starts processing requests. This makes it more expensive than the ECS option, if no of invocations are more, though the actual usage hours might be mich lesser...

dbroadhurst commented 2 years ago

I've tried various frameworks to develop services locally but always move back to creating a docker-compose file to run my dev env locally. It's straightforward to move this local dev env to AWS ECS using CDK but like everyone else I would like my services to scale back to zero when not in use. I'm ok with cold starts if the trade-off is cost.

mreferre commented 2 years ago

@dbroadhurst have you considered App Runner (see above). If you considered and it's not a fit for you, what does it make it not a fit?

dbroadhurst commented 2 years ago

@mreferre App Runner looks interesting and I will spend some time seeing if it's a good fit. The main hurdle is how would I keep my database and App Runner in the same VPC or any other AWS service that requires apps and services to be in the same VPC?

mreferre commented 2 years ago

@dbroadhurst yep. This is issue #1 (I mean, literally: https://github.com/aws/apprunner-roadmap/issues/1). Stay tuned.

Thanks.

tomaszdudek7 commented 2 years ago

Additionally, App Runner does not (fully) scale to zero.

mreferre commented 2 years ago

Additionally, App Runner does not (fully) scale to zero.

Correct. App Runner (as of this writing) scales to 0 for CPU but keeps memory provisioned for at least 1 instance (configurable). Given the cost of memory is a fraction of the cost of CPU we believe the economics may work for a broad set of use cases. The advantage of not scaling to 0 is that there are no cold starts (something you would incur into in most scale-to-real-0 solutions). Would a scale to 0 configuration with cold starts (measured in "many seconds") be something that would be useful to you? What would the value be for "many seconds" after which it would become unusable?

mwcampbell commented 2 years ago

For an application that I work on, it would be useful if Fargate itself exposed the ability to, as @mreferre put it, scale to 0.5. That is, reserve the RAM, and charge for it, but don't charge for CPU time. In this application, I have a component that's not an HTTP service, so App Runner wouldn't work. It requires a separate task for each concurrent session. These sessions may run for an hour or more, so Lambda wouldn't work. I propose adding the concept of an ECS task pool, from which tasks are spawned. I would want the task pool to be configurable so that there's always one "warm" instance, where the image has been pulled and extracted into local storage, and the RAM and maybe the ENI are reserved, but CPU isn't yet being used or billed. Then spawning a task from the pool could take 5 seconds or less. And when I spawn a task, Fargate would spin up a new warm instance to take the place of the one that I just claimed. Would this be doable?

mreferre commented 2 years ago

@mwcampbell Thanks for the feedback. I assume the problem you are having right now (that you are trying to work around) is that starting a task take a long time (say 30-60-120 seconds depending on image size etc). I also assume that if we were able to reduce the time it takes to start the task this will mitigate the need for having this notion of a warm/cheap pool. This is something we are looking into (albeit 5 seconds is quite an aggressive goal and I would not want to set that expectation).

Also out of curiosity (and just as a learning exercise) you seem to be saying that your workload is some sort of batch (or at least asynchronous in nature). Is it so sensitive to task start time?

Thanks!

mwcampbell commented 2 years ago

@mreferre Yes, startup time is the issue. I understand that 5 seconds may be too aggressive; I think 10 seconds or less would be acceptable. My workload is interactive, so it's sensitive to startup time, but it requires an isolated, long-running session. Without giving away too much about the product, think of a desktop application running inside a container, and you'll get the right idea.

mwcampbell commented 2 years ago

To be clear, I'm talking about Linux containers here. I understand that shortening startup time is probably intractable on Windows.

mreferre commented 2 years ago

@mwcampbell cool. Thanks for the background. Yeah I assumed Linux. "Intractable" isn't the word I had in mind for Windows but it gives the idea. :)

Thanks.

mikestopcontinues commented 1 year ago

Fly.io supports instant boot on scaling up from 0. They solve it by (a) launching your (soft) maximum number of containers on deploy then (b) switch them them off, but not destroying them. This allows sleeping for only storage cost.

They also make scaling up relatively quick by storing the container images local to the cluster, so that there's no download time. This means that going from N to N+M only takes container boot time. For most people, that's near zero, especially compared to container download time.

jtatum commented 1 year ago

Scale to zero is also supported by Azure Container Apps. Aside from having to use the rest of Azure, it's a nice alternative to ECS/Fargate.

CarlosDomingues commented 1 year ago

An update, two years later. We never adopted AWS App Runner as:

It's way more expensive than Fargate Spot / EC2 Spot, which we leverage extensively as most of our internal apps can tolerate interruptions.
You pay for idle applications. Between dev and prod, we have 300+ unrelated APIs, Streamlit dashboards, internal tools, ML models... 10 USD monthly for each 2GB RAM app adds up quickly.
It has a 120 seconds timeout, which is a showstopper for many of our internal services.

We ended up using K8s + Karpenter + Knative, which made our AWS bill quite cheap given how many services we are running (and likely contributed to my job security), but I wish AWS supported the "I want to deploy non-critical (from an interruptions and startup delay pov) containers and scale them based on HTTP requests, including scaling to zero when there is no traffic" use case better. I think that's a very, very common use case for SMBs building internal stuff.

wesmontgomery commented 2 months ago

Can we at least be able to place Fargate tasks into a "provisioned" state like App Runner? The tech is already built, can we please have it?

aws / containers-roadmap

[ECS] [Fargate]: can ecs fargate tasks scale to zero ? #1017