AWS Fargate GPU Support: When is GPU support coming to fargate?

mbnr85 commented 5 years ago

Tell us about your request What do you want us to build?

Which service(s) is this request for? This could be Fargate, ECS, EKS, ECR

Tell us about the problem you're trying to solve. What are you trying to do, and why is it hard? What outcome are you trying to achieve, ultimately, and why is it hard/impossible to do right now? What is the impact of not having this problem solved? The more details you can provide, the better we'll be able to understand and solve the problem.

Are you currently working around this issue? How are you currently solving this problem?

Additional context Anything else we should know?

Attachments If you think you might have additional information that you'd like to include via an attachment, please do - we'll take a look. (Remember to remove any personally-identifiable information.)

pauncejones commented 5 years ago

Hi There, can you give us more details about your use case? Instance type, CUDA version, and more info about what you're trying to do - workload, etc.? Thanks.

mbnr85 commented 5 years ago

We would like to run object detection on Fargate.

Setup:
CUDA version: 9.0, 9.1 (both work)
Instance type: p2.xlarge
Algorithm: Object detection
Input: Frame
Output: Metadata, preferably JSON, with coordinates and confidence
TPS: 10 frames/sec

Does Fargate have some concept of reserved instance discounts in EC2 or Sustained usage discounts?

FernandoMiguel commented 5 years ago

Does Fargate have some concept of reserved instance discounts in EC2 or Sustained usage discounts?

No

mikaelhg commented 5 years ago

I have a similar use case. I'd like to run deep learning inference tasks on CUDA-capable GPUs on Fargate (edit: or Lambda), and pay per second of usage.

The specific use case is inference tasks which are run fairly seldom, but need to respond in seconds rather than minutes. In other words, waiting a few minutes for an EC2 instance to boot up just doesn't cut the mustard. But neither does the application need to be taking up a GPU 24/7 unproductively, just to run the inference job for a minute or two, twice a day.

Edit: By mid-2021, extremely easy quantization and optimization, along with better models, have removed my need for this use case - but I suppose the people giving the comment the thumbs up might still have something going on in this direction.

juve commented 5 years ago

I also have an inference use-case where we would like to be able to autoscale inference sqs workers in Fargate. We originally tried to use ECS, but found it too cumbersome to scale both the containers and the EC2 instances, so we are currently just using EC2 instances with an autoscaling group. We considered using Sagemaker, but that will require some engineering effort for us to adapt our architecture and models.

aysark commented 5 years ago

I'd be interested in this too and have similar use cases to those above.

gfodor commented 5 years ago

I have a use case for this too, where we want to spin up GPU resources to do live video streaming of a WebGL application but be able to relinquish those completely after the stream ends, with minimal start up time or over-metering. In our case, we would need the ability to run an X11 server with GPU hardware acceleration.

prameshbajra commented 5 years ago

@mbnr85 I too am trying to do object detection on fargate. Is this even possible (for now)? Have you found anything? What did you do in your case?

ngander-amfam commented 5 years ago

When training data science models, our workloads can take advantage of GPU compute. To start, those workloads will run in ECS, although eventually we’d likely migrate them to EKS. We’d like to be able to use Fargate to run GPU-accelerated workloads, but that is not currently supported. Does AWS have GPU compute on the Fargate roadmap, and if so, is there any timeline that can be shared?

tomfranken commented 5 years ago

Also interested for machine learning...

romanovzky commented 5 years ago

Interested for ML training and inference as well. The overhead to transfer to sagemaker is too high, we just train models on EC2 GPU boxes and then use CPU runtime for inference on Fargate instances. However, some models would benefit from GPU at inference time (namely those trained on CUDA specific implementations, which as of now we are not using for lack of inference infrastructure). The inference use case is sporadic, such that a full-time EC2 box is too pricey.

prameshbajra commented 5 years ago

@romanovzky We are both in the same boat, I guess. I too am in a similar situation.

ashirgao commented 5 years ago

I too am looking forward to this feature.

My use-case:

I need to run jobs that benefit from GPU acceleration (mostly model inference and some CPU bound tasks eg. embedding clustering, DB insertions etc.). Each job takes around 10-15 mins on a p2.xlarge. I receive 100-120 such jobs through the day (get 8-10 jobs in the span of 30 sec at max).

My requirement:

A server-less GPU container solution.

My current solution:

My GPU utilizing containers run as custom Sagemaker training jobs.

Advantages:

With my increased Sagemaker limit on p2.xlarge systems, I can have 20 jobs running in parallel. And 0 idle cost. So, sort of server-less GPU containers :)
Per-second billing.
My containers have minimal Sagemaker specific code and hence can be easily run on EC2, ECS or even my own desktop system.

Disadvantages:

Sagemaker actually spawns a new instance for my container. This results in longer wait times. (Usually 2x Fargate wait times.)
Need to add additional logic in my lambda function that triggers Fargate jobs and Sagemaker jobs separately.
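
As a rough illustration of the workload figures in this comment (100-120 jobs a day, 10-15 minutes each on a p2.xlarge, bursts of up to 10 jobs), here is a minimal Python sketch of the arithmetic. The function names are made up for this example, no prices are assumed, and startup time is ignored.

```python
import math

def gpu_hours_per_day(jobs_per_day: int, minutes_per_job: float) -> float:
    """Total GPU time consumed per day, in hours."""
    return jobs_per_day * minutes_per_job / 60.0

def burst_completion_minutes(burst_size: int, workers: int, minutes_per_job: float) -> float:
    """Minutes until the last job of a burst finishes, ignoring startup time.

    Jobs are processed in waves of `workers` jobs at a time.
    """
    waves = math.ceil(burst_size / workers)
    return waves * minutes_per_job

# Figures taken from the comment above.
low = gpu_hours_per_day(100, 10)    # lightest day: 100 jobs at 10 minutes each
high = gpu_hours_per_day(120, 15)   # heaviest day: 120 jobs at 15 minutes each

# A burst of 10 jobs at 15 minutes each, on one worker versus one worker per job.
serial = burst_completion_minutes(10, workers=1, minutes_per_job=15)
parallel = burst_completion_minutes(10, workers=10, minutes_per_job=15)

print(f"GPU-hours per day: {low:.1f} to {high:.1f}")
print(f"Burst of 10 jobs: {serial:.0f} min on 1 worker, {parallel:.0f} min on 10 workers")
```

On the heaviest day the jobs add up to about 30 GPU-hours, which is more than a single always-on instance can deliver in 24 hours, while a burst handled by one worker leaves the last job waiting well over two hours. That is the gap a per-job, pay-per-second GPU container would close.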

ctmckee commented 5 years ago

Also.... Some machine learning models require GPU support for predictions (they will not predict on CPU).

For example (an InternalError that can occur when attempting to get RefineNet predictions on CPU): InternalError: The CPU implementation of FusedBatchNorm only supports NHWC tensor format for now.

I too support GPU support with Fargate

Zirkonium88 commented 5 years ago

We would like to call several other containers from a Docker container (RStudio) for distributed deep/machine learning training using Fargate/AWS Batch. The results should be saved on S3 and written back to the RStudio Docker container. Unfortunately, Fargate shows no support for GPUs.

richarms commented 5 years ago

I would also like to launch GPU containers from Fargate. I have two use-cases: 1. spawning powerful deep learning Jupyterhub development environments for our machine-learning group's researchers that will effortlessly disappear when the individual Jupyterhub kernel is killed. 2. Infrequent, quickly-scaled, deep (i.e. the use of GPU is justified) inference tasks.

A thought: for 2., I hadn't thought of using the suggestion above of an auto-scaling EC2 group (that presumably then uses something like a scripted docker-machine command to provision the instance and launch a kernel container) to run the GPU containers, but this seems like a nasty, expensive (in time and currency) hack for what should be something a bit more elegant.

ClaasBrueggemann commented 4 years ago

Any news on this?

prameshbajra commented 4 years ago

@ClaasBrueggemann I don't think they will provide this anytime soon. AWS is heavily promoting SageMaker now and in many/most cases that's the way to go. :)

jl-DaDar commented 4 years ago

what about for 3d model rendering? we aren't needing this for machine learning.

goswamig commented 4 years ago

+1 for this support.

prameshbajra commented 4 years ago

what about for 3d model rendering? we aren't needing this for machine learning.

In that case getting a GPU instance like P2, G3 etc might help? Amazon won't be providing GPUs any time soon in fargate I believe.

srinivaspype commented 4 years ago

Any SLA for this? Currently, the Fargate implementation gives us general-purpose CPUs at 2.2-2.3 GHz and is not capable of running CPU/GPU-critical applications.

adiii717 commented 4 years ago

Fargate does not support GPU yet, but we can hopefully expect it in the near future.

In Closing Fargate helped us solve a lot of problems related to real-time processing, including the reduction of operational overhead, for this dynamic environment. We expect it to continue to grow and mature as a service. Some features we would like to see in the near future include GPU support for our GPU-based AI Engines and the ability to cache container images that are larger for quicker “warm” launch times. https://aws.amazon.com/blogs/architecture/building-real-time-ai-with-aws-fargate/

depthwise commented 4 years ago

FWIW, it'd be great to run a typical deep learning experiment queue on something like this. Upload code+configs to S3. Lambda picks up, stuffs it into a container, training runs to completion and saves back to S3. Super simple, very scalable.
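
A minimal Python sketch of the Lambda step in that pipeline, assuming the function is triggered by an S3 "object created" event and starts an ECS task. The cluster, task definition, container name, and the `CONFIG_S3_URI` variable are hypothetical placeholders, not anything from this thread. As discussed here, Fargate has no GPU support, so on Fargate such a task would train on CPU only.

```python
import urllib.parse

# Hypothetical names, chosen only for this illustration.
CLUSTER = "experiments"
TASK_DEFINITION = "trainer"
CONTAINER_NAME = "trainer"

def build_run_task_request(event: dict) -> dict:
    """Turn an S3 event notification into keyword arguments for ECS run_task."""
    record = event["Records"][0]["s3"]
    bucket = record["bucket"]["name"]
    # Object keys arrive URL-encoded in S3 event notifications.
    key = urllib.parse.unquote_plus(record["object"]["key"])
    return {
        "cluster": CLUSTER,
        "taskDefinition": TASK_DEFINITION,
        "launchType": "FARGATE",
        "overrides": {
            "containerOverrides": [
                {
                    "name": CONTAINER_NAME,
                    # The container reads this variable to find its code and configs.
                    "environment": [
                        {"name": "CONFIG_S3_URI", "value": f"s3://{bucket}/{key}"}
                    ],
                }
            ]
        },
    }

def handler(event, context):
    request = build_run_task_request(event)
    # In a deployed Lambda this would be followed by something like:
    #   boto3.client("ecs").run_task(**request, networkConfiguration=...)
    # Fargate tasks need an awsvpc networkConfiguration (subnets, security
    # groups), omitted here because it is account-specific.
    return request
```

The request is built separately from the AWS call so the parsing can be checked locally with a sample event before wiring it to a real cluster.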

prameshbajra commented 4 years ago

FWIW, it'd be great to run a typical deep learning experiment queue on something like this. Upload code+configs to S3. Lambda picks up, stuffs it into a container, training runs to completion and saves back to S3. Super simple, very scalable.

Sounds much more like something that sagemaker would do.

mrichman commented 4 years ago

What is the status of this? I'm very interested in CUDA support in Fargate tasks.

ndtreviv commented 4 years ago

I want to use GPU-optimised faiss training algorithms on fargate. I'm not training or running a model, I'm just training an HNSW index on faiss.

siobhansabino commented 4 years ago

I have a slightly different use case in that it doesn't involve AI/ML at all. I need to provide my data science team with GPUs in a serverless context for massive calculations that run better on GPUs than CPUs. They run ad hoc containers in an ad hoc manner, so Fargate makes the most sense in enabling them to ship their containers and perform whatever they need instead of needing to max out their local machine. No other AWS service meets this need without requiring extra operational help which is what we are trying to avoid to allow the team to retain ownership over their work.

pagameba commented 4 years ago

We would like to be able to use an on-demand GPU with headless Chromium for scheduling jobs to render WebGL image filters implemented as shaders. Currently we are using SwiftShader in a Lambda function for this because we only need to do this a few times a day but need lower latency than an EC2 auto-scaling group. SwiftShader is very slow, however, and is not identical to running on an actual GPU, causing some image quality issues. Having GPU support in Fargate would allow us to spin up on-demand containers to service rendering jobs with overall higher performance than the current Lambda solution, while keeping operational costs aligned with actual usage.

Elastic GPU support in lambdas would be amazing too :)

edgBR commented 4 years ago

We have a similar use case to @Zirkonium88

We have a p3.8xlarge instance where we run RStudio Team, and we would like to downsize the instance quite a lot by using the Kubernetes launcher feature of RStudio. We are using EKS backed by Fargate to launch our JupyterLab sessions and RStudio sessions, but some of our users will need GPU acceleration for prototyping.

craiglytle commented 3 years ago

+1 for GPU on Fargate. ML application. (Is it rude to point out that Azure offers this in their container instance service in preview now?)

fnandocontreras commented 3 years ago

+1 GPU support on Fargate. Use case: I want to create a task definition using Fargate to automate training of a deep learning model in TensorFlow.

shlomi-viz commented 3 years ago

I use AWS Batch for that :-(

craiglytle commented 3 years ago

@shlomi-viz Does AWS Batch work well for that use case?
Can you easily launch containers in Batch?
How fast do the new EC2 instances launch; seconds or minutes?

lefnire commented 3 years ago

@craiglytle for my part, it's minutes: anywhere between 2 & 30. Extremely variable, with a huge high-end; a main reason I'm interested in this ticket (though I still need to try raw ECS). I do have a Queue with Spot Instances tried first, On-demand instances fallback; so that could contribute to the launch time.

shlomi-viz commented 3 years ago

I tried to implement this: https://aws.amazon.com/blogs/aws/aws-ecs-cluster-auto-scaling-is-now-generally-available/ but it requires you to "play" a bit to find your best values for scale in and out, and for me that was too slow. AWS Batch can scale much faster and smarter, BUT it evaluates the scaling every 10 minutes, meaning that every time it will take at least 10 minutes until you start to scale up. In my experience, once it starts to scale it is quite fast if you have many tasks in the queue. You will need to create and manage more resources than when running with Fargate, but this is a one-time cost. For me, launching a container was basically the same: call submit_job instead of run_task. I have tested it with ~1K jobs/tasks and it worked fine.

Happy to add more info, if you need @craiglytle
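
To make the "submit_job instead of run_task" swap concrete, here is a rough Python sketch that builds the keyword arguments for both boto3 calls side by side. Every cluster, queue, definition, and container name below is a hypothetical placeholder, and the actual AWS calls are left as comments so the snippet runs without credentials.

```python
from typing import Dict, List

def run_task_request(cluster: str, task_definition: str,
                     container: str, command: List[str]) -> Dict:
    """Keyword arguments for the ECS client's run_task (Fargate launch type)."""
    return {
        "cluster": cluster,
        "taskDefinition": task_definition,
        "launchType": "FARGATE",
        # ECS overrides are a list, one entry per named container.
        "overrides": {
            "containerOverrides": [{"name": container, "command": command}]
        },
    }

def submit_job_request(job_name: str, job_queue: str,
                       job_definition: str, command: List[str]) -> Dict:
    """Keyword arguments for the Batch client's submit_job."""
    return {
        "jobName": job_name,
        "jobQueue": job_queue,
        "jobDefinition": job_definition,
        # Batch overrides are a single dict for the job's container.
        "containerOverrides": {"command": command},
    }

command = ["python", "detect.py", "--input", "frame-0001.jpg"]  # placeholder command

ecs_kwargs = run_task_request("my-cluster", "detector:1", "detector", command)
batch_kwargs = submit_job_request("detect-0001", "gpu-queue", "detector-gpu:1", command)

# With boto3 installed and credentials configured, the calls would look like:
#   boto3.client("ecs").run_task(**ecs_kwargs, networkConfiguration=...)
#   boto3.client("batch").submit_job(**batch_kwargs)
```

The calling code changes very little; the extra work is in the Batch compute environment, job queue, and job definition that have to exist before submit_job can be used.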

jpujari-ctx commented 3 years ago

+1 GPU support on Fargate

Usecase: need to run data analysis using rapids library

Current Solution: Use Dask and ECS. I have a ECS GPU service for spinning up ECS GPU workers using ECS capacity providers.

Drawbacks: Each task spins up a GPU EC2 instance, which takes 15-20 minutes.

dslove commented 3 years ago

+1 00000000... Sagemaker doesn't fit my case because I would need to do a lot of conversion work. EC2 is just too cumbersome to use - I have to manage a lot of stuff, like autoscaling.

andreasunterhuber commented 3 years ago

+1 👍

genifycom commented 3 years ago

We are using CUDA for LIDAR processing. Latest CUDA version would be fine.

The pipeline needs to process thousands of files, loading up the GPU memory and performing analysis and transformation of said data.

We do need a LOT of GPU memory for these data sets, but really any move on the Fargate/GPU support would help at this point.

Currently, the pipeline uses EC2 instances connected to a pre-prepared volume of data (loading from S3 is too slow).

We would also like to use the Fargate container approach to provide more dynamic tools that can scale in direct response to a user's query instead of having to batch process everything in advance.

dheerajmpai commented 3 years ago

Same here, this seems to be a long awaited demand. Looking for Fargate or Lambda support for GPU or Inf Instances.

masoom commented 3 years ago

Much awaited feature. Tagging @nathanpeck for some attention :)

dheerajmpai commented 3 years ago

Is there any specific technical reason why Fargate does not support GPU Instances?

prameshbajra commented 3 years ago

Is there any specific technical reason why Fargate does not support GPU Instances?

@dheerajmpai My company asked this to AWS Support team. They mentioned that they were working on it. However, we were not provided with any dates. Hope they bring this soon.

shkr commented 3 years ago

+1

jonrau1 commented 3 years ago

+1 on this - running a separate EC2-based ECS Cluster when every single other thing is on Fargate is needless overhead. We want to retrain some IP Insights and DGL models on push from a Fargate-based API server.

NIMO-Industries commented 3 years ago

+1 - need GPU support for ECS Fargate

icalvete commented 3 years ago

+1

TimDommett commented 3 years ago

Potential use case for us is "pixel streaming" our educational application for students who don't have the required hardware, where the GPU resources will need to scale up and down unexpectedly as students join and leave the online application.

sinwar commented 2 years ago

Please add GPU support for Fargate. We need it for real-time prediction, which runs way faster on a GPU. Without a GPU, customers will face high latency issues.

aws / containers-roadmap

AWS Fargate GPU Support: When is GPU support coming to fargate? #88