Closed miquelduranfrigola closed 4 months ago
Just to add, if any of the factors are difficult to measure, then it's worth putting together a plan that can test that aspect and post it here or on slack so that we can discuss further.
Hi @miquelduranfrigola and @JHlozek, this makes a lot of sense. Let me look again at more efficient ways we can deploy the model at a much lower cost, so that we can host multiple models without breaking the bank. I am making this a high priority.
Hi @miquelduranfrigola and @JHlozek, please see my recommendations below. I have already started exploring some of these options, but I need your go-ahead before making changes on AWS, as they might be billable.
Understanding and Optimizing Costs
ECS Fargate Billing: We are billed per second for vCPU and memory usage, so it is important not to allocate more resources than necessary.
Resource Allocation: Our costs are driven by the size of our containers and the CPU/RAM we allocate. By fine-tuning these allocations, we can save quite a bit.
Optimizing Our Containers
Lightweight Containers: We should reduce the size of our Docker images. Using minimal base images like Alpine Linux and cleaning up unnecessary dependencies can help.
Right-Sizing Resources: We need to match CPU and memory allocations to what our containers actually need, preventing over-provisioning.
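To make the lightweight-container point concrete, here is a rough sketch of a multi-stage Dockerfile. The base image, paths, and entrypoint are placeholders, not our actual model images, and some models may need a heavier base than slim/Alpine:

```dockerfile
# Sketch only: a multi-stage build keeps compilers and pip caches out
# of the final image. File names and the entrypoint are hypothetical.
FROM python:3.10-slim AS build
COPY requirements.txt .
# Install into an isolated prefix that can be copied wholesale
RUN pip install --no-cache-dir --prefix=/install -r requirements.txt

FROM python:3.10-slim
COPY --from=build /install /usr/local
COPY app/ /app/
CMD ["python", "/app/serve.py"]
```

The second stage contains only the installed packages and the application code, which is usually where most of the image-size savings come from.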
Implementing Dynamic Scaling
On-Demand Scaling: We can set up a system to start containers only when needed and stop them when idle; AWS Lambda and AWS Auto Scaling can be really helpful here.
Serverless Options: For less frequent model invocations, AWS Lambda might be more cost-effective than running ECS tasks continuously.
Leveraging Cost Management Tools
AWS Cost Explorer: This tool can help us identify the main cost drivers and monitor our usage, making it easier to spot areas for optimization.
Budgets and Alerts: Setting up budgets and alerts will keep us aware of our spending and help avoid surprises.
Considering Alternative Compute Options
Spot Instances: For non-critical tasks, Spot Instances can be significantly cheaper than on-demand instances.
EC2 Instances: We should evaluate whether EC2 instances with auto-scaling groups are more cost-effective for our specific workloads.
Using Kubernetes with EKS
Amazon EKS: EKS can help us manage our Docker containers more efficiently. It supports Docker images and can give us better control over scaling and resource allocation, potentially reducing costs.
Reviewing Our Deployment Strategy
Focus on One Model: To start, we should focus on optimizing the deployment of a single model. This will help us fine-tune our setup before scaling out to multiple models.
Batch Processing: Grouping model requests into batches can reduce the number of active containers, which will save costs.
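As a toy illustration of the batching idea (not our actual API; the chunk size is arbitrary), queued requests can be grouped so that one warm container serves a whole batch instead of one container per request:

```python
from itertools import islice

def batched(requests, size):
    """Yield successive lists of at most `size` requests.

    Grouping requests this way means one running container can process
    `size` inputs per invocation instead of spinning up per request.
    """
    it = iter(requests)
    while chunk := list(islice(it, size)):
        yield chunk

# Example: 10 queued requests served in 3 container invocations
batches = list(batched(range(10), 4))
print([len(b) for b in batches])  # [4, 4, 2]
```

The trade-off is latency: requests wait until a batch fills (or a timeout fires), so this suits bulk predictions better than interactive queries.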
Implementation Plan
Conclusion
By taking these steps, we can significantly reduce our AWS ECS costs and create a more sustainable deployment strategy for our models. Using Amazon EKS to orchestrate our Docker containers can provide better resource management and cost efficiency.
Hi @sucksido
Thanks for the detailed explanation. Could you provide some numbers as well? It is difficult to assess the different options without an indication of real cost. I know AWS has a good calculator, so can you run some simulations with the different options? Let's assume the following:
- 100 models online
- 100 queries per month per model (which is low, but possible)
And an alternative, more realistic scenario:
- 100 models online
- 1,000 queries per month to 20 of the models, and no more than 10 queries each to the rest
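For a back-of-the-envelope Lambda estimate of these two scenarios: the rates below ($0.20 per 1M requests, ~$0.0000166667 per GB-second) are assumed us-east-1 list prices, and the 1 GB memory / 10 s per prediction figures are placeholder guesses to be replaced with measured values:

```python
REQUEST_PRICE = 0.20 / 1_000_000   # USD per request (assumed us-east-1 rate)
GB_SECOND_PRICE = 0.0000166667     # USD per GB-second (assumed rate)

def lambda_monthly_cost(invocations, seconds_per_call=10, memory_gb=1.0):
    """Rough monthly Lambda bill; duration and memory are assumptions."""
    compute = invocations * seconds_per_call * memory_gb * GB_SECOND_PRICE
    requests = invocations * REQUEST_PRICE
    return compute + requests

# Scenario 1: 100 models x 100 queries/month each
scenario_1 = 100 * 100            # 10,000 invocations
# Scenario 2: 1,000 queries to 20 models, <=10 each to the other 80
scenario_2 = 20 * 1000 + 80 * 10  # 20,800 invocations

print(f"Scenario 1: ${lambda_monthly_cost(scenario_1):.2f}/month")  # ~ $1.67
print(f"Scenario 2: ${lambda_monthly_cost(scenario_2):.2f}/month")  # ~ $3.47
```

Under these assumptions both scenarios land in the low single-digit dollars per month, which is why Lambda looks attractive for the low-traffic tail of models.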
Also, do you think we can use AWS Lambdas as is now? I understood this service was only for very lightweight models
Thanks @sucksido
This is useful. We can use these points as a basis for discussion. Thanks.
Just dropping a few thoughts here:
Your strategy sounds good. The only thing with a question mark, in my opinion, is the reduction of container sizes... I wish we could, but it is not straightforward right now. We will keep you posted; container size reduction is something @DhanshreeA and I will work on. As for the rest, let's focus on point 1, and we can discuss in the next stand-up meeting.
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "eks:ListClusters",
                "eks:DescribeCluster",
                "eks:ListNodegroups",
                "eks:DescribeNodegroup",
                "eks:ListFargateProfiles",
                "eks:DescribeFargateProfile",
                "eks:ListUpdates",
                "eks:DescribeUpdate",
                "ec2:DescribeInstances",
                "ec2:DescribeSecurityGroups",
                "ec2:DescribeSubnets",
                "ec2:DescribeVpcs",
                "autoscaling:DescribeAutoScalingGroups",
                "autoscaling:DescribeLaunchConfigurations",
                "autoscaling:DescribeScalingActivities",
                "autoscaling:DescribeScalingProcessTypes",
                "autoscaling:DescribeScheduledActions",
                "autoscaling:DescribeTags",
                "autoscaling:DescribeTerminationPolicyTypes"
            ],
            "Resource": "*"
        }
    ]
}
@GemmaTuron / @DhanshreeA Please grant me the above permissions
@sucksido done!
Thanks @DhanshreeA
Here are the different scenarios and costs based on container size, @GemmaTuron @miquelduranfrigola:
• AWS Lambda is cost-effective for low-query scenarios with 1 GB models, but can become expensive for higher query volumes and larger models.
• ECS Fargate offers a balance between cost and flexibility, making it suitable for both low and moderate query scenarios, especially for larger models.
• EKS (Kubernetes) provides more control and scalability, but may incur higher costs due to additional infrastructure management.
Suggestion:
• For Low to Moderate Query Scenarios with 1 GB Models: AWS Lambda is recommended due to its cost-effectiveness and ease of scaling.
• For Higher Memory Usage or Query Volumes: ECS Fargate is recommended, providing a balance of cost, performance, and manageability without the constraints of AWS Lambda.
And we can switch them off when not in use. Feedback welcome!
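One way to frame the Lambda-vs-Fargate choice is a break-even calculation. The Fargate rates below ($0.04048 per vCPU-hour, $0.004445 per GB-hour) and the Lambda rates are assumed us-east-1 list prices, and the 0.25 vCPU / 1 GB task size and 10 s per call are placeholder figures:

```python
# Assumed us-east-1 list prices; verify before relying on them.
FARGATE_VCPU_HOUR = 0.04048
FARGATE_GB_HOUR = 0.004445
LAMBDA_GB_SECOND = 0.0000166667
LAMBDA_REQUEST = 0.20 / 1_000_000
HOURS_PER_MONTH = 730

# Monthly cost of one always-on small Fargate task (0.25 vCPU, 1 GB)
fargate_monthly = (0.25 * FARGATE_VCPU_HOUR
                   + 1.0 * FARGATE_GB_HOUR) * HOURS_PER_MONTH

# Cost of one Lambda invocation at 1 GB memory for 10 s
lambda_per_call = 10 * 1.0 * LAMBDA_GB_SECOND + LAMBDA_REQUEST

breakeven_calls = fargate_monthly / lambda_per_call
print(f"Always-on Fargate task: ${fargate_monthly:.2f}/month")
print(f"Lambda break-even: ~{breakeven_calls:,.0f} calls/month per model")
```

Under these assumptions a single model would need on the order of tens of thousands of calls per month before an always-on Fargate task beats Lambda, which is consistent with the per-scenario suggestions above (and with switching idle services off either way).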
Thanks, @sucksido this is very useful.
While I originally had in mind one deployment solution only, irrespective of the model size, it is true that we could perhaps stratify models into a few categories (e.g. small and large), and deploy them accordingly. I would discard Kubernetes.
One question, @sucksido - In ECS Fargate, how fast or slow is it to "fetch" a model? I am asking because perhaps it would make sense to fetch models based on demand.
Hi @miquelduranfrigola ,
I agree with you about fetching models on demand; perhaps we can explore this option. I would also suggest that we set up the same model on ECS (Fargate) and on a Lambda, gauge which option is more cost-effective, and decide from there.
This is a good idea, @sucksido.
Action plan:
Do a comparison of fetching models using Lambdas vs ECS; we're going to use this model for testing: eos3804.
We will leave dynamic fetching of the models for later.
@GemmaTuron / @DhanshreeA Please grant me the below permissions for me to be able to create lambdas:
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"lambda:GetAccountSettings",
"lambda:CreateFunction",
"lambda:UpdateFunctionConfiguration",
"lambda:InvokeFunction",
"lambda:DeleteFunction",
"ecr:GetDownloadUrlForLayer",
"ecr:BatchGetImage",
"ecr:BatchCheckLayerAvailability",
"iam:PassRole"
],
"Resource": "*"
}
]
}
Hey @sucksido quick question - why do you think "iam:PassRole" is required?
Hi @DhanshreeA, we can exclude it; I'm sure I can still create and invoke Lambdas without it.
Hi @sucksido no it's fine, I can add those permissions, I just needed to understand if it's really needed for Lambdas.
@sucksido Done!
Hi @DhanshreeA, it still seems like I'm missing the create permission; please assist.
I think it might be because of the "iam:PassRole" permission.
Hi, can we close this issue?
Hi,
Yes, I'll close this and we can open new ones as the need arises
Hi @sucksido ,
As you may know, we recently received the bill for the AWS ECS deployment costs related to the 3 or 4 models that we deployed for the H3D Symposium workshop. At the moment, costs are too high to be sustainable, especially if we want to deploy dozens of models.
Below are a few comments and thoughts. Let's start a discussion around this.