ersilia-os / aws-utils

Utility scripts to interact with AWS
GNU General Public License v3.0

AWS ECS model deployment costs #5

Open miquelduranfrigola opened 1 week ago

miquelduranfrigola commented 1 week ago

Hi @sucksido,

As you may know, we recently received the bill for the AWS ECS deployment costs related to the 3 or 4 models that we deployed for the H3D Symposium workshop. At the moment, costs are too high to be sustainable, especially if we want to deploy dozens of models.

Below are a few comments and thoughts. Let's start a discussion around this.

JHlozek commented 1 week ago

Just to add: if any of the factors are difficult to measure, it's worth putting together a plan to test that aspect and posting it here or on Slack so that we can discuss further.

sucksido commented 1 week ago

Hi @miquelduranfrigola and @JHlozek, this makes a lot of sense. Let me look again at other efficient ways to deploy the models at a much lower cost, so that we can run multiple models without breaking the bank. I am making this a high priority.

sucksido commented 1 week ago

Hi @miquelduranfrigola and @JHlozek, please see my recommendations below. I have already started exploring some of these options, but I need your go-ahead before making changes on AWS, as they might be billable.

Understanding and Optimizing Costs

• ECS Fargate Billing: We’re being billed per second for vCPU and memory usage. It’s important to ensure we’re not allocating more resources than necessary.
• Resource Allocation: Our costs are influenced by the size of our containers and the CPU/RAM we allocate. By fine-tuning these allocations, we can save quite a bit.

Optimizing Our Containers

• Lightweight Containers: We should look into reducing the size of our Docker images. Using minimal base images like Alpine Linux and cleaning up unnecessary dependencies can help.
• Right-Sizing Resources: We need to match our CPU and memory allocations to what our containers actually need, preventing over-provisioning.
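To make the per-second billing point concrete, here is a minimal sketch of how a Fargate task's cost scales with its vCPU and memory allocation. The two hourly rates are assumptions (illustrative on-demand Linux/x86 rates for us-east-1) and the task sizes are hypothetical; the real rates for our region should be taken from the AWS pricing page.

```python
# ASSUMED on-demand Fargate rates (Linux/x86, us-east-1), for illustration only.
# Check the AWS pricing page for the region and architecture actually used.
VCPU_PER_HOUR_USD = 0.04048
GB_PER_HOUR_USD = 0.004445


def fargate_task_cost(vcpu: float, memory_gb: float, seconds: float) -> float:
    """Cost in USD of one Fargate task billed for vCPU and memory over `seconds`."""
    hours = seconds / 3600.0
    return hours * (vcpu * VCPU_PER_HOUR_USD + memory_gb * GB_PER_HOUR_USD)


if __name__ == "__main__":
    month_seconds = 730 * 3600  # roughly one month of continuous running
    # Hypothetical task sizes, chosen only to show the effect of right-sizing.
    for vcpu, memory_gb in [(0.25, 0.5), (1, 2), (4, 8)]:
        cost = fargate_task_cost(vcpu, memory_gb, month_seconds)
        print(f"{vcpu} vCPU / {memory_gb} GB, always on: ${cost:.2f} per month")
```

Because the cost is linear in both allocations, halving an over-provisioned task's vCPU and memory halves its bill for the same running time.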

Implementing Dynamic Scaling

• On-Demand Scaling: We can set up a system to start containers only when needed and stop them when idle. AWS Lambda and AWS Auto Scaling can be really helpful here.
• Serverless Options: For less frequent model invocations, AWS Lambda might be more cost-effective than running ECS tasks continuously.
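One possible shape for the "stop when idle" logic is sketched below. It assumes each model runs as its own ECS service and that something (for example a scheduled Lambda function) records the time of the last request; the 15-minute timeout and the helper names are hypothetical. `update_service` with `cluster`, `service` and `desiredCount` is the boto3 ECS client call for changing the number of running tasks.

```python
from datetime import datetime, timedelta


def desired_count(last_request: datetime, now: datetime,
                  idle_timeout: timedelta = timedelta(minutes=15)) -> int:
    """Return 1 while the model was used recently, 0 once it has been idle."""
    return 1 if now - last_request < idle_timeout else 0


def scale_service(ecs_client, cluster: str, service: str, count: int) -> None:
    """Set the number of running tasks for one ECS service.

    `ecs_client` is expected to be a boto3 ECS client, e.g. boto3.client("ecs").
    """
    ecs_client.update_service(cluster=cluster, service=service, desiredCount=count)
```

Scaling a service to zero stops its tasks, so no vCPU or memory is billed until it is scaled back up. The trade-off is a cold start on the next request while the task is launched and the image is pulled.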

Leveraging Cost Management Tools

• AWS Cost Explorer: This tool can help us identify the main cost drivers and monitor our usage, making it easier to spot areas for optimization.
• Budgets and Alerts: Setting up budgets and alerts will keep us aware of our spending and help avoid surprises.
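As a rough sketch of how the cost drivers could be pulled programmatically, the snippet below queries Cost Explorer grouped by service and ranks the services by spend. It assumes a boto3 Cost Explorer client (`boto3.client("ce")`) with permission to call `get_cost_and_usage`; the function names are hypothetical. Note that, as far as I know, Cost Explorer API requests are themselves billed per request, so this is better run occasionally than in a loop.

```python
def fetch_cost_response(ce_client, start: str, end: str) -> dict:
    """Query Cost Explorer grouped by service; dates are 'YYYY-MM-DD' strings."""
    return ce_client.get_cost_and_usage(
        TimePeriod={"Start": start, "End": end},
        Granularity="MONTHLY",
        Metrics=["UnblendedCost"],
        GroupBy=[{"Type": "DIMENSION", "Key": "SERVICE"}],
    )


def cost_by_service(response: dict) -> list:
    """Sum UnblendedCost per service across all periods, highest cost first."""
    totals = {}
    for period in response.get("ResultsByTime", []):
        for group in period.get("Groups", []):
            service = group["Keys"][0]
            amount = float(group["Metrics"]["UnblendedCost"]["Amount"])
            totals[service] = totals.get(service, 0.0) + amount
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)
```

The first entry of the returned list is the largest cost driver for the period, which is the breakdown the implementation plan asks for.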

Considering Alternative Compute Options

• Spot Instances: For tasks that aren’t critical, Spot Instances can be significantly cheaper than on-demand instances.
• EC2 Instances: We should evaluate whether using EC2 instances with auto-scaling groups is more cost-effective for our specific workloads.

Using Kubernetes with EKS

Amazon EKS: EKS can help us manage our Docker containers more efficiently. It supports Docker images and can give us better control over scaling and resource allocation, potentially reducing costs.

Reviewing Our Deployment Strategy

• Focus on One Model: To start, we should focus on optimizing the deployment of a single model. This will help us fine-tune our setup before scaling out to multiple models.
• Batch Processing: Grouping model requests into batches can reduce the number of active containers, which will save costs.
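The batching idea can be sketched as follows: instead of starting a container per request, pending inputs are collected and sent to one container in groups. The helper name and the batch size are hypothetical, and how requests are queued in practice is left open.

```python
from typing import Iterator, List, Sequence


def batched(requests: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive groups of at most `batch_size` requests."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(requests), batch_size):
        yield list(requests[start:start + batch_size])


if __name__ == "__main__":
    pending = ["input-1", "input-2", "input-3", "input-4", "input-5"]
    for batch in batched(pending, batch_size=2):
        # One container invocation would handle the whole batch.
        print(batch)
```

With a batch size of 2, the five pending inputs above need three container invocations instead of five.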

Implementation Plan

  1. Immediate Actions:
     ◦ Stop all current ECS tasks to prevent further costs. (Which we’ve already done)
     ◦ Analyze our billing details to identify specific cost drivers. (Please share the breakdown)
  2. Optimization:
     ◦ Streamline our Docker images by removing unnecessary dependencies and using minimal base images.
     ◦ Right-size our resource allocations to match the actual needs of our tasks.
     ◦ Implement dynamic scaling solutions using AWS Auto Scaling and AWS Lambda.
  3. Testing and Validation:
     ◦ Deploy a single optimized model and monitor its performance and cost.
     ◦ Gradually scale to additional models while continuously monitoring and adjusting our resource allocation and scaling strategies.


By taking these steps, we can significantly reduce our AWS ECS costs and create a more sustainable deployment strategy for our models. Using Amazon EKS to orchestrate our Docker containers can provide better resource management and cost efficiency.

GemmaTuron commented 1 week ago

Hi @sucksido

Thanks for the detailed explanation. Could you provide some numbers as well? It is difficult to assess the different options without an indication of real cost. I know AWS has a good calculator, so can you run some simulations with the different options? Let's assume the following:
• 100 models online
• 100 queries per month per model (which is low, but possible)

An alternative, more realistic scenario:
• 100 models online
• 1000 queries per month to 20 models, and no more than 10 queries to the rest of the models

Also, do you think we can use AWS Lambda as it is now? I understood this service was only for very lightweight models.
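A back-of-the-envelope version of these two scenarios could look like the sketch below. Everything beyond the model and query counts is an assumption: the Fargate and Lambda rates are illustrative on-demand us-east-1 figures, the free tier is ignored, "no more than 10 queries to the rest" is read as an upper bound of 10 queries per model, and the task size and seconds per query are placeholders to be replaced with measured values.

```python
# ASSUMED rates (on-demand, us-east-1, x86), for illustration only.
FARGATE_VCPU_HOUR_USD = 0.04048
FARGATE_GB_HOUR_USD = 0.004445
LAMBDA_GB_SECOND_USD = 0.0000166667
LAMBDA_PER_REQUEST_USD = 0.20 / 1_000_000

# Each scenario is a list of (number of models, queries per model per month).
SCENARIO_A = [(100, 100)]
SCENARIO_B = [(20, 1000), (80, 10)]  # 10 per model is an assumed upper bound


def total_queries(scenario) -> int:
    """Total monthly queries across all model groups in a scenario."""
    return sum(models * queries for models, queries in scenario)


def always_on_fargate(models: int, vcpu: float, memory_gb: float,
                      hours: float = 730) -> float:
    """Monthly cost of keeping one Fargate task per model running all month."""
    hourly = vcpu * FARGATE_VCPU_HOUR_USD + memory_gb * FARGATE_GB_HOUR_USD
    return models * hours * hourly


def lambda_cost(queries: int, memory_gb: float, seconds_per_query: float) -> float:
    """Monthly Lambda cost: compute (GB-seconds) plus the per-request charge."""
    compute = memory_gb * seconds_per_query * LAMBDA_GB_SECOND_USD
    return queries * (compute + LAMBDA_PER_REQUEST_USD)


if __name__ == "__main__":
    for name, scenario in [("A", SCENARIO_A), ("B", SCENARIO_B)]:
        queries = total_queries(scenario)  # A: 10,000 queries, B: 20,800 queries
        # Placeholder sizing: 1 GB of memory and 30 s per query.
        print(name, queries, round(lambda_cost(queries, 1.0, 30.0), 2))
    # Placeholder sizing: 0.25 vCPU and 0.5 GB per always-on task.
    print("always-on Fargate:", round(always_on_fargate(100, 0.25, 0.5), 2))
```

The always-on figure does not depend on the number of queries at all, which is why pay-per-use options look attractive at these volumes, provided the models fit within the service limits.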

miquelduranfrigola commented 1 week ago

Thanks @sucksido

This is useful. We can use these points as a basis for discussion. Thanks.

Just dropping a few thoughts here:

Your strategy sounds good. The only thing that has a question mark in my opinion is the reduction of container sizes... I wish we could, but it is not straightforward right now. We will keep you posted. Container size reduction is something @DhanshreeA and I will do. As for the rest, let's focus on point 1 and we can discuss in the next stand-up meeting.

sucksido commented 6 days ago

image

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "eks:ListClusters",
        "eks:DescribeCluster",
        "eks:ListNodegroups",
        "eks:DescribeNodegroup",
        "eks:ListFargateProfiles",
        "eks:DescribeFargateProfile",
        "eks:ListUpdates",
        "eks:DescribeUpdate",
        "ec2:DescribeInstances",
        "ec2:DescribeSecurityGroups",
        "ec2:DescribeSubnets",
        "ec2:DescribeVpcs",
        "autoscaling:DescribeAutoScalingGroups",
        "autoscaling:DescribeLaunchConfigurations",
        "autoscaling:DescribeScalingActivities",
        "autoscaling:DescribeScalingProcessTypes",
        "autoscaling:DescribeScheduledActions",
        "autoscaling:DescribeTags",
        "autoscaling:DescribeTerminationPolicyTypes"
      ],
      "Resource": "*"
    }
  ]
}

@GemmaTuron / @DhanshreeA Please grant me the above permissions

DhanshreeA commented 5 days ago

@sucksido done!

sucksido commented 5 days ago

Thanks @DhanshreeA

sucksido commented 4 days ago

Here are the different scenarios and costs based on container sizes, @GemmaTuron @miquelduranfrigola:

image image image

• AWS Lambda is cost-effective for low-query scenarios with 1 GB models, but can become expensive for higher query volumes and larger models.
• ECS Fargate offers a balance between cost and flexibility, making it suitable for both low and moderate query scenarios, especially for larger models.
• EKS (Kubernetes) provides more control and scalability, but may incur higher costs due to additional infrastructure management.


• For Low to Moderate Query Scenarios with 1 GB Models: AWS Lambda is recommended due to its cost-effectiveness and ease of scaling.
• For Higher Memory Usage or Query Volumes: ECS Fargate is recommended, providing a balance of cost, performance, and manageability without the constraints of AWS Lambda.
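The recommendation above could be expressed as a simple routing rule, sketched below. The 10,240 MB memory ceiling and 15-minute timeout are Lambda's documented hard limits; the 1 GB cutoff comes from the comparison above, while the monthly query cutoff, the default seconds per query and the function name are hypothetical and would need tuning against real measurements.

```python
LAMBDA_MAX_MEMORY_GB = 10.0   # Lambda memory ceiling (10,240 MB)
LAMBDA_MAX_TIMEOUT_S = 900.0  # Lambda maximum execution time (15 minutes)


def choose_backend(model_memory_gb: float, monthly_queries: int,
                   seconds_per_query: float = 30.0,
                   small_model_gb: float = 1.0,
                   low_query_cutoff: int = 1000) -> str:
    """Pick 'lambda' for small, rarely queried models and 'fargate' otherwise."""
    # Hard limits first: such models cannot run on Lambda at all.
    if model_memory_gb > LAMBDA_MAX_MEMORY_GB or seconds_per_query > LAMBDA_MAX_TIMEOUT_S:
        return "fargate"
    # Soft, hypothetical cutoffs reflecting the cost comparison above.
    if model_memory_gb <= small_model_gb and monthly_queries <= low_query_cutoff:
        return "lambda"
    return "fargate"


if __name__ == "__main__":
    print(choose_backend(model_memory_gb=1.0, monthly_queries=100))
    print(choose_backend(model_memory_gb=4.0, monthly_queries=100))
```

Keeping the cutoffs as parameters makes it easy to revisit the split once real per-model costs are available.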

We can also switch them off when not in use. Feedback welcome.

miquelduranfrigola commented 1 day ago

Thanks, @sucksido, this is very useful.

While I originally had in mind one deployment solution only, irrespective of the model size, it is true that we could perhaps stratify models into a few categories (e.g. small and large), and deploy them accordingly. I would discard Kubernetes.

One question, @sucksido - In ECS Fargate, how fast or slow is it to "fetch" a model? I am asking because perhaps it would make sense to fetch models based on demand.