aws / containers-roadmap

This is the public roadmap for AWS container services (ECS, ECR, Fargate, and EKS).
https://aws.amazon.com/about-aws/whats-new/containers/

AWS Fargate GPU Support: When is GPU support coming to Fargate? #88

Open mbnr85 opened 5 years ago

mbnr85 commented 5 years ago

Tell us about your request What do you want us to build?

Which service(s) is this request for? This could be Fargate, ECS, EKS, ECR

Tell us about the problem you're trying to solve. What are you trying to do, and why is it hard? What outcome are you trying to achieve, ultimately, and why is it hard/impossible to do right now? What is the impact of not having this problem solved? The more details you can provide, the better we'll be able to understand and solve the problem.

Are you currently working around this issue? How are you currently solving this problem?

Additional context Anything else we should know?

Attachments If you think you might have additional information that you'd like to include via an attachment, please do - we'll take a look. (Remember to remove any personally-identifiable information.)

prameshbajra commented 3 years ago

Please add GPU support for Fargate, we need it for real-time prediction, which runs way faster on GPU. Without GPU, customers will face high latency issues.

We had a similar use case. We are using SageMaker now and its Batch Transform feature is amazing: we run 1,000+ predictions in under 500 seconds. I'd suggest going with SageMaker, since waiting for this feature might take a long, long time.
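For anyone weighing the same trade-off: a minimal sketch of what a Batch Transform request looks like via boto3. Every name, S3 URI, and the instance type below is a placeholder (not from this thread); the point is that `TransformResources` is where the GPU instance type is chosen, and the job releases its instances when it finishes.

```python
import json

# Sketch of a SageMaker Batch Transform request. All names and S3 URIs
# are placeholders. The job provisions the instances listed under
# TransformResources, runs to completion, and tears them down, so you
# pay only for the job's duration.
transform_job = {
    "TransformJobName": "nightly-predictions",      # placeholder name
    "ModelName": "my-gpu-model",                    # placeholder model
    "TransformInput": {
        "DataSource": {
            "S3DataSource": {
                "S3DataType": "S3Prefix",
                "S3Uri": "s3://my-bucket/input/",   # placeholder bucket
            }
        },
        "ContentType": "application/json",
    },
    "TransformOutput": {"S3OutputPath": "s3://my-bucket/output/"},
    "TransformResources": {
        "InstanceType": "ml.g4dn.xlarge",           # a GPU instance type
        "InstanceCount": 1,
    },
}

# With AWS credentials configured, the actual call would be:
#   import boto3
#   boto3.client("sagemaker").create_transform_job(**transform_job)
print(json.dumps(transform_job["TransformResources"], indent=2))
```

The GPU choice lives entirely in that one `TransformResources` block, which is why swapping instance types between runs is cheap.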

genifycom commented 3 years ago

Thanks for the suggestion.

TimDommett commented 3 years ago

Please add GPU support for Fargate, we need it for real-time prediction, which runs way faster on GPU. Without GPU, customers will face high latency issues.

We had a similar use case. We are using SageMaker now and its Batch Transform feature is amazing: we run 1,000+ predictions in under 500 seconds. I'd suggest going with SageMaker, since waiting for this feature might take a long, long time.

Thanks for the suggestion @prameshbajra, we will definitely give this a try.

pkiv commented 3 years ago

+1, would appreciate GPU support on Fargate, even if it were only fractional GPU support (my Fargate tasks only need 10-20% of a T4).

PhilCliv commented 3 years ago

Need GPU support as well

lefnire commented 3 years ago

Y'all, it's not gonna come - don't hold your breath. There are lots of GPU-based options outside Fargate: SageMaker Batch Transform jobs; AWS Batch (maybe even fractional GPU sharing via containers on one instance/cluster); mixing and matching instance types with GPUs via Elastic GPUs / Elastic Inference; purpose-built accelerators like Trainium or Inferentia; and SageMaker multi-model endpoints, which can share GPU usage.

So: EC2, (ECS?), Batch, SageMaker (Batch Transform jobs), plus mixing/matching accelerators across that list via Elastic [Inference|GPU], Trainium, Inferentia, etc. All that's to say, Fargate GPU support might not come - because it's pretty much unnecessary (covered by other services).
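Of those, AWS Batch is probably the closest fit to "run a GPU container, then stop paying": each job declares its GPU count and Batch places it on a GPU instance in the compute environment. A rough sketch, where the queue and job-definition names are made up:

```python
# Sketch of a GPU job submission to AWS Batch. The job queue and job
# definition names are placeholders. The GPU requirement is declared
# per job, and Batch schedules it onto a GPU instance in the attached
# compute environment (whole GPUs only, no fractional sharing).
job_request = {
    "jobName": "gpu-inference-run",
    "jobQueue": "my-gpu-queue",          # placeholder queue
    "jobDefinition": "my-gpu-job-def",   # placeholder job definition
    "containerOverrides": {
        "resourceRequirements": [
            {"type": "GPU", "value": "1"},
        ],
    },
}

# With AWS credentials configured:
#   import boto3
#   boto3.client("batch").submit_job(**job_request)
gpu_count = int(job_request["containerOverrides"]["resourceRequirements"][0]["value"])
print(gpu_count)
```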

ghost commented 3 years ago

@lefnire none of the other services scale to zero after the job is run. That's the main point of Fargate. You can do this in GKE: set up a node group with GPU nodes that scales to zero once the job is finished. Maybe we'll just run this workload over there...

lefnire commented 3 years ago

@whillas that's an important feature to me too, so correct me if I'm wrong in my explanation of how they do scale to 0. Batch & SageMaker Batch Transform are fire-and-forget. They're not auto-scaling based on load; instead you manually submit a job - new container, run till completion, die. As opposed to SageMaker Endpoints, which is a managed auto-scaling environment that indeed can't scale to 0 (which I hate - honestly I think we should be requesting scale-to-0 SageMaker endpoints rather than Fargate GPUs).

The bummer here is cold-start loading of resources. E.g., loading a huggingface/transformers model into memory before running inference can take quite some time. I'm not sure if there's an equivalent in Batch / SM:BT to the way freshly-completed Lambda functions can keep variables loaded outside handler(). I do know that Batch, at least, keeps freshly-completed containers around for subsequent calls - just not sure about memory-loaded resources.

So my thought is: rather than thinking of an auto-scaling Fargate cluster, just submit manual one-offs via Batch / SM:BT. I actually didn't know that Fargate scales to 0... that does alter my opinion here. I'm crossing my fingers they listen, just not holding my breath.
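On the cold-start point: in Lambda, the usual trick for "keeping variables loaded outside handler()" is to cache the model at module scope, so warm invocations of the same container skip the load. A toy sketch (the load function is a stand-in for something genuinely expensive, like loading a transformers model):

```python
# Toy sketch of the Lambda warm-start pattern: module-level state
# survives between invocations of the same container, so an expensive
# model load is paid only on cold start. _load_model is a stand-in
# for a real (slow) model load.
_MODEL = None
LOAD_COUNT = 0  # instrumentation to show the load happens only once

def _load_model():
    global LOAD_COUNT
    LOAD_COUNT += 1
    return {"weights": "pretend-these-are-expensive"}

def handler(event, context):
    global _MODEL
    if _MODEL is None:  # cold start: load once, then reuse
        _MODEL = _load_model()
    return {"input": event, "model_loads": LOAD_COUNT}

# Two "invocations" against the same warm container:
first = handler({"x": 1}, None)
second = handler({"x": 2}, None)
print(first["model_loads"], second["model_loads"])  # prints: 1 1
```

Whether Batch reuses process memory the same way across jobs is exactly the open question in the comment above; this pattern is only known to apply to Lambda.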

ghost commented 3 years ago

@lefnire yeah, you are right about Batch Transform: it does scale up and down for standalone jobs, and it's suggested that you trigger these jobs via Lambda functions, which covers another requirement most people have, i.e. orchestration in production - which is why I was looking at this in the first place.

My gripes with it are:

Thanks for being a sounding board! (It's a pity AWS's forums are useless)

lefnire commented 2 years ago

Just saw Serverless NLP Inference on Amazon SageMaker with Transformer Models from Hugging Face. Seems SageMaker now has Serverless Inference. I'm with you on the S3 integration requirement; I too use an RDBMS. If this works, I could accept that requirement for training, but use DB data for inference (the important part) by loading it up in app code (server, Lambda, etc.) and sending it against inference per the article.

For me at least, this solves the issue for which I'd subscribed to this thread. I'll report back after fiddling with it, if anyone cares.
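For anyone else landing here: the wiring for Serverless Inference is an endpoint config whose variant carries a ServerlessConfig block instead of an instance type/count. A sketch with placeholder names (and note this sidesteps rather than answers the GPU question - Serverless Inference runs on CPU, as far as I can tell):

```python
# Sketch of a SageMaker Serverless Inference endpoint config. All
# names are placeholders. ServerlessConfig replaces the usual
# instance type/count; capacity is sized by memory and concurrency,
# and the endpoint scales to zero between requests.
endpoint_config = {
    "EndpointConfigName": "nlp-serverless-config",   # placeholder
    "ProductionVariants": [
        {
            "VariantName": "AllTraffic",
            "ModelName": "my-hf-transformer-model",  # placeholder
            "ServerlessConfig": {
                "MemorySizeInMB": 4096,
                "MaxConcurrency": 5,
            },
        }
    ],
}

# With AWS credentials configured:
#   import boto3
#   sm = boto3.client("sagemaker")
#   sm.create_endpoint_config(**endpoint_config)
#   sm.create_endpoint(EndpointName="nlp-serverless",
#                      EndpointConfigName="nlp-serverless-config")
print(endpoint_config["ProductionVariants"][0]["ServerlessConfig"])
```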

heukirne commented 2 years ago

@lefnire, ECS still has the advantage of Spot Instance pricing for running batch inference. SageMaker Batch Transform runs at on-demand and reserved prices, so Fargate GPU support would probably be cheaper.

But I agree that for realtime inference, SageMaker Serverless Inference is a good fit.

uumami commented 2 years ago

We want to run ML training of different models; sometimes a simple instance will suffice, other times we need GPUs and more memory.

appuCES commented 2 years ago

+1, I'm using Fargate to process videos, and I think accelerated hardware would help me cut down some processing time.

tltran-legion commented 2 years ago

We are building a Cloud Engine, and most of our workload is video rendering using NVIDIA drivers. Building an ECS cluster with EC2 instances can be done, but it's not simple. Having ECS run on Fargate with GPU support would be much more efficient operationally.

ngander-amfam commented 2 years ago

Maybe with this announcement, https://aws.amazon.com/about-aws/whats-new/2022/03/bottlerocket-support-gpu-ec2-instance-types-powered-by-nvidia/, this can finally be realized.

omieomye commented 2 years ago

Quick update: we've begun thinking about how to ship this, and are actively designing. I'll leave this in the current state until we have better line-of-sight internally, and then provide another update.

bigclap commented 2 years ago

@omieomye Hi! Any news?

omieomye commented 2 years ago

@bigclap I don't have a definitive date for when we'll release this, but we're actively implementing. A decision we're weighing is which GPU type(s) to support at launch. If the community has an opinion on needed GPU type(s), we'd love to hear it.

orgoro commented 2 years ago

@bigclap I don't have a definitive date for when we'll release this, but we're actively implementing. A decision we're weighing is which GPU type(s) to support at launch. If the community has an opinion on needed GPU type(s), we'd love to hear it.

Thank you for the update @omieomye, this feature will be a game changer for us. We use these instances for various use cases:

tadam98 commented 2 years ago

Strongly support!

ssathasivanACR commented 2 years ago

GPU on Fargate could be a game changer... Strongly support

tltran-legion commented 2 years ago

Thank you for the update @omieomye, we are actually using the g4dn actively on ca-central-1 and us-east-1.

tbobm commented 2 years ago

Thanks for the update @omieomye ! :bowing_man:

I'm mostly using g4dn instances :)

nexarChile commented 2 years ago

Thanks @omieomye for the good news!

At the moment we use g4dn.xlarge and t2.small instances in us-east-1 for processing video with GPU containers.

It will be very useful if we can use GPU containers with Fargate! Please let us know when we can start using it. :D

javabudd commented 2 years ago

g5.xlarge here!

samevision commented 2 years ago

I am really interested in this as well!

Could you give us any information when it will be available, @omieomye?

renanmb commented 2 years ago

Some of NVIDIA's documentation refers to this, so it's probably coming really soon.

aashitvyas commented 2 years ago

We are currently utilizing all the G instances in our stack and are interested in this feature as well.

TejaswiniiB commented 2 years ago

Hi @omieomye, any update regarding when GPU support on Fargate will be available?

RaiaN commented 2 years ago

Support of Windows instances with GPUs is very much needed for Fargate.

longerHost commented 2 years ago

GPU support would make Fargate a game-changing product. Thanks @omieomye for the update, can't wait to see its release.

deRooij commented 2 years ago

This would help us out a lot; any status updates you are willing to share? Our team mostly uses g4s, p3s, and g5s.

haugstve commented 2 years ago

I was considering Fargate for hosting my model. Now it looks like I will install Docker on EC2 and run it in a container that way. I hope this will be available soon.

dvasilen commented 2 years ago

@omieomye

If the community has an opinion on needed GPU type(s), we'd love to hear it.

We'd need A100 or at least V100.

ltrojan commented 2 years ago

+1

Any update on this issue? @omieomye ?

omieomye commented 2 years ago

Thanks for the feedback on GPU types! As is customary, I can't update on a specific month or date except to say that this is a top 5 capability enhancement for us and we're actively working on it.

granthamtaylor commented 2 years ago

Just wanted to add my +1 here.

GPU support for Fargate would be an absolute game changer for serverless ML training.

smovva commented 2 years ago

+1

tuanqle commented 2 years ago

+1

bieblebrox commented 2 years ago

+1 👍

ktg0210 commented 2 years ago

+1

hoduulmu commented 2 years ago

+1

mrifni commented 2 years ago

@omieomye in the roadmap it says "49 We're Working On It" - do you know when this would be available?

anna-geller commented 2 years ago

+1 Adding GPU support to Fargate would make serverless finally applicable to data science workloads. This is really needed and would be so extremely appreciated!

JunjieTang-D1 commented 2 years ago

+1 Important feature for Autonomous Driving Data Framework (ADDF) https://github.com/awslabs/autonomous-driving-data-framework

dgraeber commented 2 years ago

+1 Important for MLOps

heukirne commented 2 years ago

Spoiler: I heard this feature will be presented in AWS re:Invent 2022! :partying_face:

ebr commented 2 years ago

@heukirne where have you heard this? If on the AWS ML/AI innovation day twitch stream, that was only someone speculating in chat. There does not seem to be any concrete roadmap for this as far as I can tell. Unfortunately.

Gunnar-Stunnar commented 2 years ago

Is this what everyone is looking for? https://aws.amazon.com/blogs/containers/running-gpu-based-container-applications-with-amazon-ecs-anywhere/
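For context on why that blog post isn't quite this issue: ECS can already schedule GPUs, but only for the EC2 launch type (or EXTERNAL via ECS Anywhere), not FARGATE. A sketch of such a task definition, with all names and the image as placeholders:

```python
# Sketch of an ECS task definition with a GPU requirement. The family,
# container name, and image are placeholders. This works with the EC2
# launch type (and EXTERNAL via ECS Anywhere), but FARGATE cannot be
# listed here today - which is exactly what this issue asks for.
task_definition = {
    "family": "gpu-video-task",
    "requiresCompatibilities": ["EC2"],
    "containerDefinitions": [
        {
            "name": "renderer",
            "image": "my-registry/renderer:latest",  # placeholder image
            "memory": 8192,
            "resourceRequirements": [
                {"type": "GPU", "value": "1"},
            ],
        }
    ],
}

# With AWS credentials configured:
#   import boto3
#   boto3.client("ecs").register_task_definition(**task_definition)
print(task_definition["requiresCompatibilities"])
```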

appuCES commented 2 years ago

Is this what everyone is looking for? https://aws.amazon.com/blogs/containers/running-gpu-based-container-applications-with-amazon-ecs-anywhere/

I don’t think I’m looking for this.