Closed miquelduranfrigola closed 4 months ago
Just to add, if any of the factors are difficult to measure, then it's worth putting together a plan that can test that aspect and post it here or on slack so that we can discuss further.
Hi @miquelduranfrigola and @JHlozek, this makes a lot of sense. Let me look again at more efficient ways we can deploy the model at a much lower cost, so that we can host multiple models without breaking the bank. I am making this a high priority.
Hi @miquelduranfrigola and @JHlozek, please see my recommendations below. I have already started exploring some of these options, but I need your go-ahead before making changes on AWS, as they might be billable.
Understanding and Optimizing Costs
ECS Fargate Billing: We are billed per second for vCPU and memory usage, so it is important not to allocate more resources than necessary.
Resource Allocation: Our costs are driven by the size of our containers and the CPU/RAM we allocate. By fine-tuning these allocations, we can save quite a bit.
Optimizing Our Containers
Lightweight Containers: We should reduce the size of our Docker images. Using minimal base images like Alpine Linux and cleaning up unnecessary dependencies can help.
Right-Sizing Resources: We need to match CPU and memory allocations to what our containers actually need, preventing over-provisioning.
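To make the lightweight-container point concrete, here is a rough sketch of a multi-stage Dockerfile. The base image, paths, and entrypoint are placeholders, not our actual model images, and some models may need a heavier base than slim/Alpine:

```dockerfile
# Sketch only: a multi-stage build keeps compilers and pip caches out
# of the final image. File names and the entrypoint are hypothetical.
FROM python:3.10-slim AS build
COPY requirements.txt .
# Install into an isolated prefix that can be copied wholesale
RUN pip install --no-cache-dir --prefix=/install -r requirements.txt

FROM python:3.10-slim
COPY --from=build /install /usr/local
COPY app/ /app/
CMD ["python", "/app/serve.py"]
```

The second stage contains only the installed packages and the application code, which is usually where most of the image-size savings come from.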
Implementing Dynamic Scaling
On-Demand Scaling: We can set up a system to start containers only when needed and stop them when idle; AWS Lambda and AWS Auto Scaling can be really helpful here.
Serverless Options: For less frequent model invocations, AWS Lambda might be more cost-effective than running ECS tasks continuously.
Leveraging Cost Management Tools
AWS Cost Explorer: This tool can help us identify the main cost drivers and monitor our usage, making it easier to spot areas for optimization.
Budgets and Alerts: Setting up budgets and alerts will keep us aware of our spending and help avoid surprises.
Considering Alternative Compute Options
Spot Instances: For non-critical tasks, Spot Instances can be significantly cheaper than on-demand instances.
EC2 Instances: We should evaluate whether EC2 instances with auto-scaling groups are more cost-effective for our specific workloads.
Using Kubernetes with EKS
Amazon EKS: EKS can help us manage our Docker containers more efficiently. It supports Docker images and can give us better control over scaling and resource allocation, potentially reducing costs.
Reviewing Our Deployment Strategy
Focus on One Model: To start, we should focus on optimizing the deployment of a single model. This will help us fine-tune our setup before scaling out to multiple models.
Batch Processing: Grouping model requests into batches can reduce the number of active containers, which will save costs.
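As a toy illustration of the batching idea (not our actual API; the chunk size is arbitrary), queued requests can be grouped so that one warm container serves a whole batch instead of one container per request:

```python
from itertools import islice

def batched(requests, size):
    """Yield successive lists of at most `size` requests.

    Grouping requests this way means one running container can process
    `size` inputs per invocation instead of spinning up per request.
    """
    it = iter(requests)
    while chunk := list(islice(it, size)):
        yield chunk

# Example: 10 queued requests served in 3 container invocations
batches = list(batched(range(10), 4))
print([len(b) for b in batches])  # [4, 4, 2]
```

The trade-off is latency: requests wait until a batch fills (or a timeout fires), so this suits bulk predictions better than interactive queries.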
Implementation Plan
Conclusion
By taking these steps, we can significantly reduce our AWS ECS costs and create a more sustainable deployment strategy for our models. Using Amazon EKS to orchestrate our Docker containers can provide better resource management and cost efficiency.
Hi @sucksido
Thanks for the detailed explanation. Could you provide some numbers as well? It is difficult to assess the different options without an indication of real cost. I know AWS has a good calculator, so can you run some simulations with the different options? Let's assume the following:
- 100 models online
- 100 queries per month per model (which is low, but possible)
And an alternative, more realistic scenario:
- 100 models online
- 1,000 queries per month to 20 of the models, and no more than 10 queries each to the rest
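For a back-of-the-envelope Lambda estimate of these two scenarios: the rates below ($0.20 per 1M requests, ~$0.0000166667 per GB-second) are assumed us-east-1 list prices, and the 1 GB memory / 10 s per prediction figures are placeholder guesses to be replaced with measured values:

```python
REQUEST_PRICE = 0.20 / 1_000_000   # USD per request (assumed us-east-1 rate)
GB_SECOND_PRICE = 0.0000166667     # USD per GB-second (assumed rate)

def lambda_monthly_cost(invocations, seconds_per_call=10, memory_gb=1.0):
    """Rough monthly Lambda bill; duration and memory are assumptions."""
    compute = invocations * seconds_per_call * memory_gb * GB_SECOND_PRICE
    requests = invocations * REQUEST_PRICE
    return compute + requests

# Scenario 1: 100 models x 100 queries/month each
scenario_1 = 100 * 100            # 10,000 invocations
# Scenario 2: 1,000 queries to 20 models, <=10 each to the other 80
scenario_2 = 20 * 1000 + 80 * 10  # 20,800 invocations

print(f"Scenario 1: ${lambda_monthly_cost(scenario_1):.2f}/month")  # ~ $1.67
print(f"Scenario 2: ${lambda_monthly_cost(scenario_2):.2f}/month")  # ~ $3.47
```

Under these assumptions both scenarios land in the low single-digit dollars per month, which is why Lambda looks attractive for the low-traffic tail of models.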
Also, do you think we can use AWS Lambdas as is now? I understood this service was only for very lightweight models
Thanks @sucksido
This is useful. We can use these points as a basis for discussion. Thanks.
Just dropping a few thoughts here:
Your strategy sounds good. The only thing with a question mark, in my opinion, is the reduction of container sizes... I wish we could, but it is not straightforward right now. We will keep you posted; container size reduction is something @DhanshreeA and I will work on. As for the rest, let's focus on point 1, and we can discuss in the next stand-up meeting.
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "eks:ListClusters",
                "eks:DescribeCluster",
                "eks:ListNodegroups",
                "eks:DescribeNodegroup",
                "eks:ListFargateProfiles",
                "eks:DescribeFargateProfile",
                "eks:ListUpdates",
                "eks:DescribeUpdate",
                "ec2:DescribeInstances",
                "ec2:DescribeSecurityGroups",
                "ec2:DescribeSubnets",
                "ec2:DescribeVpcs",
                "autoscaling:DescribeAutoScalingGroups",
                "autoscaling:DescribeLaunchConfigurations",
                "autoscaling:DescribeScalingActivities",
                "autoscaling:DescribeScalingProcessTypes",
                "autoscaling:DescribeScheduledActions",
                "autoscaling:DescribeTags",
                "autoscaling:DescribeTerminationPolicyTypes"
            ],
            "Resource": "*"
        }
    ]
}
@GemmaTuron / @DhanshreeA Please grant me the above permissions
@sucksido done!
Thanks @DhanshreeA
Here are the different scenarios and costs based on container size, @GemmaTuron @miquelduranfrigola:
• AWS Lambda is cost-effective for low-query scenarios with 1 GB models, but can become expensive for higher query volumes and larger models.
• ECS Fargate offers a balance between cost and flexibility, making it suitable for both low and moderate query scenarios, especially for larger models.
• EKS (Kubernetes) provides more control and scalability, but may incur higher costs due to additional infrastructure management.
Suggestion:
• For Low to Moderate Query Scenarios with 1 GB Models: AWS Lambda is recommended due to its cost-effectiveness and ease of scaling.
• For Higher Memory Usage or Query Volumes: ECS Fargate is recommended, providing a balance of cost, performance, and manageability without the constraints of AWS Lambda.
And we can switch them off when not in use. Feedback welcome!
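One way to frame the Lambda-vs-Fargate choice is a break-even calculation. The Fargate rates below ($0.04048 per vCPU-hour, $0.004445 per GB-hour) and the Lambda rates are assumed us-east-1 list prices, and the 0.25 vCPU / 1 GB task size and 10 s per call are placeholder figures:

```python
# Assumed us-east-1 list prices; verify before relying on them.
FARGATE_VCPU_HOUR = 0.04048
FARGATE_GB_HOUR = 0.004445
LAMBDA_GB_SECOND = 0.0000166667
LAMBDA_REQUEST = 0.20 / 1_000_000
HOURS_PER_MONTH = 730

# Monthly cost of one always-on small Fargate task (0.25 vCPU, 1 GB)
fargate_monthly = (0.25 * FARGATE_VCPU_HOUR
                   + 1.0 * FARGATE_GB_HOUR) * HOURS_PER_MONTH

# Cost of one Lambda invocation at 1 GB memory for 10 s
lambda_per_call = 10 * 1.0 * LAMBDA_GB_SECOND + LAMBDA_REQUEST

breakeven_calls = fargate_monthly / lambda_per_call
print(f"Always-on Fargate task: ${fargate_monthly:.2f}/month")
print(f"Lambda break-even: ~{breakeven_calls:,.0f} calls/month per model")
```

Under these assumptions a single model would need on the order of tens of thousands of calls per month before an always-on Fargate task beats Lambda, which is consistent with the per-scenario suggestions above (and with switching idle services off either way).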
Thanks, @sucksido this is very useful.
While I originally had in mind one deployment solution only, irrespective of the model size, it is true that we could perhaps stratify models into a few categories (e.g. small and large), and deploy them accordingly. I would discard Kubernetes.
One question, @sucksido - In ECS Fargate, how fast or slow is it to "fetch" a model? I am asking because perhaps it would make sense to fetch models based on demand.
Hi @miquelduranfrigola ,
I agree with you about fetching models on demand; perhaps we can explore this option. I would also suggest that we set up the same model on ECS (Fargate) and on a Lambda, gauge which option is more cost-effective, and decide from there.
This is a good idea, @sucksido.
Action plan:
Do a comparison of fetching models using Lambdas vs ECS; we're going to use this model for testing: eos3804.
We will leave dynamic fetching of the models for later.
@GemmaTuron / @DhanshreeA Please grant me the below permissions for me to be able to create lambdas:
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"lambda:GetAccountSettings",
"lambda:CreateFunction",
"lambda:UpdateFunctionConfiguration",
"lambda:InvokeFunction",
"lambda:DeleteFunction",
"ecr:GetDownloadUrlForLayer",
"ecr:BatchGetImage",
"ecr:BatchCheckLayerAvailability",
"iam:PassRole"
],
"Resource": "*"
}
]
}
Hey @sucksido quick question - why do you think "iam:PassRole" is required?
Hi @DhanshreeA, we can exclude it; I'm sure I can still create and invoke Lambdas without it.
Hi @sucksido no it's fine, I can add those permissions, I just needed to understand if it's really needed for Lambdas.
@sucksido Done!
Hi @DhanshreeA, it still seems like I'm missing the create permission; please assist.
I think it might be because of the "iam:PassRole" permission.
Hi, can we close this issue?
Hi,
Yes, I'll close this and we can open new ones as the need arises
Hi @sucksido ,
As you may know, we recently received the bill for the AWS ECS deployment costs related to the 3 or 4 models that we deployed for the H3D Symposium workshop. At the moment, costs are too high to be sustainable, especially if we want to deploy dozens of models.
Below are a few comments and thoughts. Let's start a discussion around this.