aws-solutions-library-samples / aws-batch-arch-for-protein-folding

Apache License 2.0

Estimated costs #5

Closed stevemmarshall closed 1 year ago

stevemmarshall commented 1 year ago

Hi Brian

These estimated costs seem high. Is it because of the NAT charges? Is that downloading new databases whenever a job is run? Why not use a gateway VPC endpoint?

Thanks Steve

brianloyal commented 1 year ago

Hey Steve, thanks for the question. As you called out, the two options for handling data transfer for AWS Batch are NAT Gateways and VPC endpoints. Which one is cheaper depends a lot on the situation. In this case, because each Batch run needs to communicate with multiple services (CloudWatch, ECR, FSx), the NAT Gateway comes out a little ahead of the multiple endpoints on cost.

The biggest contributor to the total per-run cost is the cross-AZ data transfer charges associated with the MSA jobs. The reference data used for MSA creation is stored in a single FSx for Lustre file system in subnet A. If you specify the multi-AZ option when you provision the stack, your Batch jobs will run in subnet A, but also in two additional subnets in different AZs. This means that about 2/3 of your jobs will require several hundred GB of data transfer between AZs, which adds cost. The best way to minimize this is to select the single-AZ option when you provision your stack.
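To make the 2/3 figure concrete, here is a rough back-of-the-envelope sketch in Python. It assumes jobs are spread evenly across the AZs and that the reference data sits in exactly one of them. The function name, the 300 GB per job, and the 0.02 USD/GB rate are illustrative placeholders, not values taken from this repository or from current AWS pricing, so substitute your own numbers.

```python
def expected_cross_az_cost(gb_per_job, n_jobs, n_azs, price_per_gb):
    """Estimate the cross-AZ transfer cost for a set of MSA jobs.

    Assumes jobs land evenly across n_azs and the reference data lives in
    exactly one AZ, so (n_azs - 1) / n_azs of the jobs read it remotely.
    """
    if n_azs < 1:
        raise ValueError("n_azs must be at least 1")
    remote_fraction = (n_azs - 1) / n_azs
    return gb_per_job * n_jobs * remote_fraction * price_per_gb


# Hypothetical inputs: 300 GB read per job, 3 jobs, placeholder rate.
multi_az = expected_cross_az_cost(300, 3, n_azs=3, price_per_gb=0.02)
single_az = expected_cross_az_cost(300, 3, n_azs=1, price_per_gb=0.02)
print(f"multi-AZ estimate:  ${multi_az:.2f}")
print(f"single-AZ estimate: ${single_az:.2f}")
```

With a single AZ the remote fraction is zero, which is why the single-AZ option removes this charge entirely in the estimate.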

The added cost (and time) of the MSA step is another point in favor of language-model-based protein folding algorithms like ESMFold, which do not need an MSA. We'll see how this area of research advances in the future!

LanceWesterhoff commented 6 months ago

@brianloyal - is there a way to modify this MultiAZ setting AFTER you have already created the cloud? Thanks!