Open mbnr85 opened 5 years ago
Hi There, can you give us more details about your use case? Instance type, CUDA version, and more info about what you're trying to do - workload, etc.? Thanks.
We would like to run object detection on Fargate.
Setup:
CUDA version: 9.0, 9.1 (both work)
Instance type: p2.xlarge
Algorithm: Object detection
Input: Frame
Output: Metadata, preferably JSON with coordinates and confidence
TPS: 10 frames/sec
Does Fargate have some concept of reserved instance discounts in EC2 or Sustained usage discounts?
No
I have a similar use case. I'd like to run deep learning inference tasks on CUDA-capable GPUs on Fargate (edit: or Lambda), and pay per second of usage.
The specific use case is inference tasks which are run fairly seldom, but need to respond in seconds, rather than minutes. In other words, waiting a few minutes for an EC2 instance to boot up, just doesn't cut the mustard. But neither does the application need to be taking up a GPU 24/7 unproductively, just to run the inference job for a minute or two, twice a day.
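To put numbers on "a minute or two, twice a day", here is a small sketch of how little of the day an always-on GPU instance would actually be busy. The function name is made up for illustration, and the only inputs are the figures from the comment above.

```python
# Rough utilisation of an always-on GPU for a sporadic inference job.
# Inputs come from the comment: one to two minutes per run, two runs a day.
MINUTES_PER_DAY = 24 * 60


def utilisation(run_minutes: float, runs_per_day: int) -> float:
    """Fraction of the day the GPU is actually doing work."""
    return (run_minutes * runs_per_day) / MINUTES_PER_DAY


low = utilisation(run_minutes=1, runs_per_day=2)
high = utilisation(run_minutes=2, runs_per_day=2)
print(f"GPU busy between {low:.2%} and {high:.2%} of the day")
```

Either way the instance would sit idle for well over 99% of the day, which is the gap that per-second billing would close.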
Edit: By mid-2021, extremely easy quantization and optimization, along with better models, have removed my need for this use case - but I suppose the people giving the comment the thumbs up might still have something going on in this direction.
I also have an inference use-case where we would like to be able to autoscale inference sqs workers in Fargate. We originally tried to use ECS, but found it too cumbersome to scale both the containers and the EC2 instances, so we are currently just using EC2 instances with an autoscaling group. We considered using Sagemaker, but that will require some engineering effort for us to adapt our architecture and models.
I'd be interested in this too and have similar usecases as above.
I have a use case for this too, where we want to spin up GPU resources to do live video streaming of a WebGL application but be able to relinquish those completely after the stream ends, with minimal start up time or over-metering. In our case, we would need the ability to run an X11 server with GPU hardware acceleration.
@mbnr85 I too am trying to do object detection on fargate. Is this even possible (for now)? Have you found anything? What did you do in your case?
When training data science models, our workloads can take advantage of GPU compute. To start, those workloads will run in ECS, although eventually we’d likely migrate them to EKS. We’d like to be able to use Fargate to run GPU-accelerated workloads, but that is not currently supported. Does AWS have GPU compute on the Fargate roadmap, and if so, is there any timeline that can be shared?
Also interested for machine learning...
Interested for ML training and inference as well. The overhead to transfer to sagemaker is too high, we just train models on EC2 GPU boxes and then use CPU runtime for inference on Fargate instances. However, some models would benefit from GPU at inference time (namely those trained on CUDA specific implementations, which as of now we are not using for lack of inference infrastructure). The inference use case is sporadic, such that a full-time EC2 box is too pricey.
@romanovzky We are both in the same boat, I guess. I too am in a similar situation.
I too am looking forward to this feature.
My use-case:
I need to run jobs that benefit from GPU acceleration (mostly model inference and some CPU bound tasks eg. embedding clustering, DB insertions etc.). Each job takes around 10-15 mins on a p2.xlarge. I receive 100-120 such jobs through the day (get 8-10 jobs in the span of 30 sec at max).
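As a back-of-the-envelope sketch of the workload described above, the figures can be turned into daily GPU-hours and compared with the burst size. It assumes one job occupies one GPU for its whole duration, which the comment does not state explicitly; the function name is illustrative only.

```python
# Figures from the comment: 100-120 jobs a day, 10-15 minutes each,
# with bursts of 8-10 jobs arriving within about 30 seconds.
# Assumption: each job holds one GPU for its full run time.
HOURS_PER_DAY = 24


def gpu_hours_per_day(jobs: int, minutes_per_job: float) -> float:
    """Total GPU time consumed per day, in hours."""
    return jobs * minutes_per_job / 60


low = gpu_hours_per_day(jobs=100, minutes_per_job=10)
high = gpu_hours_per_day(jobs=120, minutes_per_job=15)

# Average number of GPUs busy if the work were spread evenly over the day.
avg_low = low / HOURS_PER_DAY
avg_high = high / HOURS_PER_DAY

# A burst arrives within 30 seconds but each job runs for 10+ minutes,
# so every job in the burst overlaps with the others.
peak_gpus = 10

print(f"{low:.1f} to {high:.1f} GPU-hours per day")
print(f"average {avg_low:.2f} to {avg_high:.2f} GPUs busy, peak {peak_gpus}")
```

The average demand is roughly one GPU while the peak is about ten, which is why a fixed fleet is either too slow during bursts or mostly idle the rest of the day.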
My requirement:
A server-less GPU container solution.
My current solution:
My GPU utilizing containers run as custom Sagemaker training jobs.
Advantages:
Disadvantages:
Also.... Some machine learning models require GPU support for predictions (they will not predict on CPU).
For example (an InternalError that can occur when attempting to get a RefineNet predictions on CPU): InternalError: The CPU implementation of FusedBatchNorm only supports NHWC tensor format for now.
I too support GPU support with Fargate
We would like to call several other containers from a Docker container (RStudio) for distributed deep/machine learning training using Fargate/AWS Batch. The results should be saved on S3 and written back to the RStudio Docker container. Unfortunately, Fargate shows no support for GPUs.
I would also like to launch GPU containers from Fargate. I have two use-cases: 1. spawning powerful deep learning Jupyterhub development environments for our machine-learning group's researchers that will effortlessly disappear when the individual Jupyterhub kernel is killed. 2. Infrequent, quickly-scaled, deep (i.e. the use of GPU is justified) inference tasks.
a thought: for 2., I hadn't thought of using the suggestion above of an auto-scaling EC2 group (that presumably then uses something like a scripted docker-machine command to provision the instance and launch a kernel container) to run the GPU containers, but this seems like a nasty, expensive (in time and currency) hack for what should be a bit more elegant.
Any news on this?
@ClaasBrueggemann I don't think they will provide this anytime soon. AWS is heavily promoting SageMaker now and in many/most cases that's the way to go. :)
what about for 3d model rendering? we aren't needing this for machine learning.
+1 for this support.
what about for 3d model rendering? we aren't needing this for machine learning.
In that case getting a GPU instance like P2, G3 etc might help? Amazon won't be providing GPUs any time soon in fargate I believe.
Is there any SLA for this? Currently, the Fargate implementation gives us a general-purpose CPU clock speed of 2.2-2.3 GHz and is not capable of running CPU/GPU-critical applications.
Fargate does not support GPUs yet, but we can expect it in the near future.
In Closing
Fargate helped us solve a lot of problems related to real-time processing, including the reduction of operational overhead, for this dynamic environment. We expect it to continue to grow and mature as a service. Some features we would like to see in the near future include GPU support for our GPU-based AI Engines and the ability to cache container images that are larger for quicker “warm” launch times.
https://aws.amazon.com/blogs/architecture/building-real-time-ai-with-aws-fargate/
FWIW, it'd be great to run a typical deep learning experiment queue on something like this. Upload code+configs to S3. Lambda picks up, stuffs it into a container, training runs to completion and saves back to S3. Super simple, very scalable.
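A minimal sketch of the "upload to S3, Lambda picks it up" flow might look like the following. Because Fargate cannot run GPU containers, the sketch substitutes AWS Batch as the container runner; that swap is an assumption, not what the comment asks for. The queue name, job definition name, and the CONFIG_S3_URI environment variable are hypothetical placeholders, and the `batch_client` parameter exists only so the handler can be exercised without AWS credentials.

```python
from urllib.parse import unquote_plus


def build_job_request(bucket: str, key: str) -> dict:
    """Build keyword arguments for the Batch client's submit_job call."""
    # Batch job names allow letters, digits, hyphens and underscores only.
    safe = "".join(c if (c.isascii() and c.isalnum()) else "-" for c in key)
    return {
        "jobName": "train-" + safe[:100],
        "jobQueue": "gpu-training-queue",    # hypothetical queue name
        "jobDefinition": "gpu-training-job",  # hypothetical job definition
        "containerOverrides": {
            "environment": [
                # Hypothetical variable the training container would read.
                {"name": "CONFIG_S3_URI", "value": f"s3://{bucket}/{key}"},
            ]
        },
    }


def handler(event, context, batch_client=None):
    """Lambda entry point for S3 object-created notifications."""
    if batch_client is None:
        import boto3  # imported lazily so the sketch loads without boto3

        batch_client = boto3.client("batch")

    job_ids = []
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event notifications.
        key = unquote_plus(record["s3"]["object"]["key"])
        response = batch_client.submit_job(**build_job_request(bucket, key))
        job_ids.append(response["jobId"])
    return job_ids
```

The training container itself would still need to download the config, run to completion, and write its results back to S3; none of that is shown here.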
Sounds much more like something that sagemaker would do.
What is the status of this? I'm very interested in CUDA support in Fargate tasks.
I want to use GPU-optimised faiss training algorithms on fargate. I'm not training or running a model, I'm just training an HNSW index on faiss.
I have a slightly different use case in that it doesn't involve AI/ML at all. I need to provide my data science team with GPUs in a serverless context for massive calculations that run better on GPUs than CPUs. They run ad hoc containers in an ad hoc manner, so Fargate makes the most sense in enabling them to ship their containers and perform whatever they need instead of needing to max out their local machine. No other AWS service meets this need without requiring extra operational help which is what we are trying to avoid to allow the team to retain ownership over their work.
We would like to be able to use an on-demand GPU with headless Chromium for scheduling jobs to render WebGL image filters implemented as shaders. Currently we are using SwiftShader in a Lambda function for this because we only need to do this a few times a day but need lower latency than an EC2 auto-scaling group. SwiftShader is very slow, however, and is not identical to running on an actual GPU, causing some image quality issues. Having GPU support in Fargate would allow us to spin up on-demand containers to service rendering jobs with overall higher performance than the current Lambda solution, while keeping operational costs aligned with actual usage.
Elastic GPU support in lambdas would be amazing too :)
We have a similar use case to @Zirkonium88
We have a p3.8xlarge instance where we run RStudio Team, and we would like to downsize the instance quite a lot by using the Kubernetes launcher feature of RStudio. We are using EKS backed by Fargate to launch our JupyterLab and RStudio sessions, but some of our users will need GPU acceleration for prototyping.
+1 for GPU on Fargate. ML application. (Is it rude to point out that Azure offers this in their container instance service in preview now?)
+1 GPU support on Fargate. Usecase: I want to create a task definition using Fargate to automate training of a deep learning model in TensorFlow
I use AWS Batch for that :-(
@shlomi-viz
Does AWS Batch work well for that use case?
Can you easily launch containers in Batch?
How fast do the new EC2 instances launch; seconds or minutes?
@craiglytle for my part, it's minutes: anywhere between 2 & 30. Extremely variable, with a huge high-end; a main reason I'm interested in this ticket (though I still need to try raw ECS). I do have a Queue with Spot Instances tried first, On-demand instances fallback; so that could contribute to the launch time.
I tried to implement this: https://aws.amazon.com/blogs/aws/aws-ecs-cluster-auto-scaling-is-now-generally-available/
but it requires you to "play" a bit to find the best scale-in and scale-out values, and for me that was too slow.
AWS Batch can scale much faster and smarter, BUT it evaluates scaling every 10 minutes, meaning that every time it will take at least 10 minutes until you start to scale up. In my experience, once it starts to scale it is quite fast if you have many tasks in the queue.
You will need to create and manage more resources than when running with Fargate, but this is a one-time cost.
For me, launching a container was basically the same: call submit_job instead of run_task.
I have tested it with ~1K jobs/task and it worked fine.
Happy to add more info, if you need @craiglytle
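To illustrate the "call submit_job instead of run_task" point, here is a hedged sketch of the two boto3 requests side by side. Every resource name (queue, job definition, cluster, task definition, container, subnet) is a made-up placeholder, and the `client` argument is expected to be a boto3 Batch or ECS client created elsewhere.

```python
def batch_request(command: list) -> dict:
    """Keyword arguments for the boto3 Batch client's submit_job."""
    return {
        "jobName": "example-job",
        "jobQueue": "example-queue",          # placeholder
        "jobDefinition": "example-job-def",   # placeholder
        "containerOverrides": {"command": command},
    }


def ecs_request(command: list) -> dict:
    """Keyword arguments for the boto3 ECS client's run_task on Fargate."""
    return {
        "cluster": "example-cluster",          # placeholder
        "taskDefinition": "example-task-def",  # placeholder
        "launchType": "FARGATE",
        "networkConfiguration": {
            "awsvpcConfiguration": {
                "subnets": ["subnet-EXAMPLE"],  # placeholder
                "assignPublicIp": "DISABLED",
            }
        },
        "overrides": {
            "containerOverrides": [
                {"name": "example-container", "command": command}
            ]
        },
    }


def launch(command: list, use_batch: bool, client):
    """Submit the same container command through Batch or ECS."""
    if use_batch:
        return client.submit_job(**batch_request(command))
    return client.run_task(**ecs_request(command))
```

Usage would be along the lines of `launch(["python", "train.py"], use_batch=True, client=boto3.client("batch"))`. The extra setup mentioned above (compute environment, job queue, job definition) happens once, outside this call.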
+1 GPU support on Fargate
Usecase: need to run data analysis using rapids library
Current Solution: Use Dask and ECS. I have an ECS GPU service for spinning up ECS GPU workers using ECS capacity providers.
Drawbacks: Each task spins up a GPU EC2 instance, which takes 15-20 minutes.
+1 00000000... Sagemaker doesn't fit my case because I would need to do a lot of conversion work. EC2 is just too cumbersome to use - I have to manage a lot of stuff, like autoscaling.
+1 👍
We are using CUDA for LIDAR processing. Latest CUDA version would be fine.
The pipeline needs to process thousands of files, loading up the GPU memory and performing analysis and transformation of said data.
We do need a LOT of GPU memory for these data sets, but really any move on the Fargate/GPU support would help at this point.
Currently, the pipeline uses EC2 instances connected to a pre-prepared volume of data (loading from S3 is too slow).
We would also like to use the Fargate container approach to provide more dynamic tools that can scale in direct response to a user's query instead of having to batch process everything in advance.
Same here, this seems to be a long awaited demand. Looking for Fargate or Lambda support for GPU or Inf Instances.
Much awaited feature. Tagging @nathanpeck for some attention :)
Is there any specific technical reason why Fargate does not support GPU Instances?
@dheerajmpai My company asked this to AWS Support team. They mentioned that they were working on it. However, we were not provided with any dates. Hope they bring this soon.
+1
+1 on this - running a separate EC2-based ECS Cluster when every single other thing is on Fargate is needless overhead. We want to retrain some IP Insights and DGL models on push from a Fargate-based API server.
+1 - need GPU support for ECS Fargate
+1
Potential use case for us is "pixel streaming" our educational application for students who don't have the required hardware, where the GPU resources will need to scale up and down unexpectedly as students join and leave the online application.
Please add GPU support for Fargate. We need it for real-time prediction, which runs way faster on a GPU. Without a GPU, customers will face high latency.