cal-itp / data-infra

Cal-ITP data infrastructure
https://docs.calitp.org/data-infra
GNU Affero General Public License v3.0

Estimate cost of running airflow v2 #745

Closed machow closed 2 years ago

machow commented 2 years ago

Currently we run airflow v1, which provisions a kubernetes cluster with 3 instances. The new model prices by compute, storage, and memory usage. Can we get a monthly cost estimate (e.g., how much would it have cost to run last month)?

lottspot commented 2 years ago

A "small" (0.5 vCPU, 1.875 GB mem/worker) composer v2 environment most closely models the memory demand of our current composer components, which all request 1 GB of memory or less. Upsizing to a "medium" (2 vCPU, 7.5 GB mem/worker) environment may be necessary if the CPU capacity of the "small" environment proves inadequate (some components can be seen requesting up to 1 vCPU).

The SKUs as they would apply to a "small" v2 environment:

| item | approximate quantity |
|---|---|
| compute CPUs | 2.5 |
| compute memory | 9.4 GB |
| compute storage | 5 GB |
| db storage | ? (assume 10GB) |
| environment fee | small |

Based on the pricing table, this utilization works out to the following approximate cost for 744 hours (31 days) in the us-west2 region:

EDIT: fixed db storage cost estimate

| item | approximate cost |
|---|---|
| compute CPUs | $100.44 |
| compute memory | $41.96 |
| compute storage | $0.75 |
| db storage | $2 |
| environment fee | $312.48 |
| total | $457.63 |
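
For anyone wanting to re-run the numbers, here is a minimal sketch of the arithmetic behind the line items above. The per-unit rates are back-calculated from the figures in this comment (for example, $100.44 / (2.5 vCPU x 744 h) = $0.054), so treat them as assumptions rather than a quote from the pricing table; the function and constant names are made up for illustration.

```python
HOURS = 744  # 31 days, as in the estimate above

# Assumed rates, inferred from the line items in this comment (not an official price list).
RATE_VCPU_HOUR = 0.054          # USD per vCPU-hour
RATE_MEM_GB_HOUR = 0.006        # USD per GB-hour of memory
RATE_STORAGE_GB_HOUR = 0.0002   # USD per GB-hour of compute storage
RATE_DB_GB_MONTH = 0.2          # USD per GB-month of database storage
RATE_SMALL_ENV_HOUR = 0.42      # USD per hour, "small" environment fee


def small_env_estimate(vcpus=2.5, mem_gb=9.4, storage_gb=5, db_gb=10, hours=HOURS):
    """Return the monthly line items (USD) for a "small" environment."""
    items = {
        "compute CPUs": vcpus * RATE_VCPU_HOUR * hours,
        "compute memory": mem_gb * RATE_MEM_GB_HOUR * hours,
        "compute storage": storage_gb * RATE_STORAGE_GB_HOUR * hours,
        "db storage": db_gb * RATE_DB_GB_MONTH,  # billed per month, no hours term
        "environment fee": RATE_SMALL_ENV_HOUR * hours,
    }
    items["total"] = sum(items.values())
    return items


for name, cost in small_env_estimate().items():
    print(f"{name:16s} ${cost:.2f}")
```

Individual lines may differ from the table by a cent depending on where rounding is applied.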

This is likely to be a somewhat pessimistic estimate due to the following factors:

machow commented 2 years ago

@lottspot is db storage missing a decimal? (should it be 14.88?). I think around $500 a month as a pessimistic estimate makes sense.

However, I'm starting to realize that there are two factors:

It looks like composer v2 may not give us much control over how much memory workers have, and we have some processes that take a fair amount of memory. I wonder if it might be better for us to upgrade to composer v1 and airflow v2. This would let us basically keep running with the same cost structure, but use a version with longer term support by gcs and nice features.

lottspot commented 2 years ago

Yeah, that DB storage estimate was wildly off. I calculated GB/hrs instead of GB/mo, which is how it is charged. I've edited it to fix the number.
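
A small sketch of the kind of unit mix-up described here, assuming the 10 GB database and a rate of $0.2 per GB-month (inferred from the corrected $2 line item). The function names are hypothetical, and the exact figure originally posted is not reproduced here.

```python
HOURS_IN_MONTH = 744  # 31-day month used in the estimate


def db_storage_monthly_cost(gb, rate_per_gb_month):
    """Database storage billed per GB-month: no hours term."""
    return gb * rate_per_gb_month


def db_storage_cost_if_hourly(gb, rate, hours=HOURS_IN_MONTH):
    """What the same rate yields if it is mistakenly treated as per GB-hour."""
    return gb * rate * hours


correct = db_storage_monthly_cost(10, 0.2)
mistaken = db_storage_cost_if_hourly(10, 0.2)
print(f"per GB-month: ${correct:.2f}; treated as per GB-hour: ${mistaken:.2f}")
```

Treating a monthly rate as hourly inflates the line item by the number of hours in the billing period.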

lottspot commented 2 years ago

Pessimistic estimates (assuming always running at max capacity for the entire month) for all environments:

Small (scheduler: 0.5 vCPUs, 1.875 GB x 1; webserver: 0.5 vCPU, 1.875 GB x 1; workers: 0.5 vCPUs, 1.875 GB x 3)

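| sku | quantity | hourly cost (USD) | monthly cost (USD) |
|---|---|---|---|
| Compute CPUs (vCPU) | 2.5 | 0.135 | 100.44 |
| Compute Memory (GB) | 9.375 | 0.05625 | 41.85 |
| Compute Storage (GB) | 5 | 0.001 | 0.744 |
| Database Storage (GB) | 10 | 2 (per month) | 2 |
| Small Env | 1 | 0.42 | 312.48 |
| total | | | 457.514 |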

Medium (scheduler: 2 vCPUs, 7.5 GB x 2; webserver: 2 vCPU, 7.5 GB x 1; workers: 2 vCPUs, 7.5 GB x 6)

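| sku | quantity | hourly cost (USD) | monthly cost (USD) |
|---|---|---|---|
| Compute CPUs (vCPU) | 18 | 0.972 | 723.168 |
| Compute Memory (GB) | 67.5 | 0.405 | 301.32 |
| Compute Storage (GB) | 45 | 0.009 | 6.696 |
| Database Storage (GB) | 10 | 2 (per month) | 2 |
| Medium Env | 1 | 0.66 | 491.04 |
| total | | | 1524.224 |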

Large (scheduler: 4 vCPUs, 15 GB x 3; webserver: 2 vCPU, 7.5 GB x 1; workers: 4 vCPUs, 15 GB x 12)

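| sku | quantity | hourly cost (USD) | monthly cost (USD) |
|---|---|---|---|
| Compute CPUs (vCPU) | 62 | 3.348 | 2490.912 |
| Compute Memory (GB) | 232.5 | 1.395 | 1037.88 |
| Compute Storage (GB) | 160 | 0.032 | 23.808 |
| Database Storage (GB) | 10 | 2 (per month) | 2 |
| Large Env | 1 | 1.02 | 758.88 |
| total | | | 4313.48 |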

It's worth noting that the larger the environment gets, the more pessimistic these estimates become due to the fact that they are calculated based on the maximum number of workers running for the entire month.
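
The quantity column in these tables appears to be the per-instance size multiplied by the instance count, summed over scheduler, webserver, and workers. A rough sketch of that step, using the medium environment at maximum scale as input (the `Component` class and `total_quantities` helper are names invented for this example):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    name: str
    vcpus: float   # vCPUs per instance
    mem_gb: float  # memory (GB) per instance
    count: int     # number of instances


def total_quantities(components):
    """Sum vCPUs and memory across all instances of all components."""
    cpus = sum(c.vcpus * c.count for c in components)
    mem_gb = sum(c.mem_gb * c.count for c in components)
    return cpus, mem_gb


# Medium environment with the maximum number of workers, as listed above.
medium_max = [
    Component("scheduler", 2, 7.5, 2),
    Component("webserver", 2, 7.5, 1),
    Component("worker", 2, 7.5, 6),
]

cpus, mem_gb = total_quantities(medium_max)
print(f"{cpus} vCPUs, {mem_gb} GB memory")
```

Swapping in a different worker count is the only change needed to move between the maximum-scale and minimum-scale cases.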

lottspot commented 2 years ago

Below are optimistic estimates (assuming the minimum scale of workers running at all times) for all environment sizes. The actual cost is likely to fall somewhere in between the optimistic and the pessimistic estimates.

Small (scheduler: 0.5 vCPUs, 1.875 GB x 1; webserver: 0.5 vCPU, 1.875 GB x 1; workers: 0.5 vCPUs, 1.875 GB x 1)

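| sku | quantity | hourly cost (USD) | monthly cost (USD) |
|---|---|---|---|
| Compute CPUs (vCPU) | 1.5 | 0.081 | 60.264 |
| Compute Memory (GB) | 5.625 | 0.03375 | 25.11 |
| Compute Storage (GB) | 3 | 0.0006 | 0.4464 |
| Database Storage (GB) | 10 | 2 (per month) | 2 |
| Small Env | 1 | 0.42 | 312.48 |
| total | | | 400.3004 |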

Medium (scheduler: 2 vCPUs, 7.5 GB x 2; webserver: 2 vCPU, 7.5 GB x 1; workers: 2 vCPUs, 7.5 GB x 2)

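| sku | quantity | hourly cost (USD) | monthly cost (USD) |
|---|---|---|---|
| Compute CPUs (vCPU) | 10 | 0.54 | 401.76 |
| Compute Memory (GB) | 37.5 | 0.225 | 167.4 |
| Compute Storage (GB) | 25 | 0.005 | 3.72 |
| Database Storage (GB) | 10 | 2 (per month) | 2 |
| Medium Env | 1 | 0.66 | 491.04 |
| total | | | 1065.92 |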

Large (scheduler: 4 vCPUs, 15 GB x 3; webserver: 2 vCPU, 7.5 GB x 1; workers: 4 vCPUs, 15 GB x 3)

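| sku | quantity | hourly cost (USD) | monthly cost (USD) |
|---|---|---|---|
| Compute CPUs (vCPU) | 26 | 1.404 | 1044.576 |
| Compute Memory (GB) | 97.5 | 0.585 | 435.24 |
| Compute Storage (GB) | 70 | 0.014 | 10.416 |
| Database Storage (GB) | 10 | 2 (per month) | 2 |
| Large Env | 1 | 1.02 | 758.88 |
| total | | | 2251.112 |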
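
Since the only thing that changes between the optimistic and pessimistic cases is the worker count, the range can be sketched as a function of that count. This example covers the small environment only. The rates and the 1 GB of compute storage per instance are inferred from the tables in this thread, so both are assumptions, and the function name is hypothetical.

```python
HOURS = 744  # 31 days

# Assumed rates, inferred from the tables in this thread (not an official price list).
RATE_VCPU_HOUR = 0.054
RATE_MEM_GB_HOUR = 0.006
RATE_STORAGE_GB_HOUR = 0.0002
RATE_DB_GB_MONTH = 0.2
RATE_SMALL_ENV_HOUR = 0.42


def small_env_monthly_cost(workers, db_gb=10, hours=HOURS):
    """Monthly cost (USD) of a small environment with a given worker count."""
    instances = 2 + workers          # 1 scheduler + 1 webserver + workers
    vcpus = 0.5 * instances          # every instance is 0.5 vCPU
    mem_gb = 1.875 * instances       # every instance is 1.875 GB
    storage_gb = 1.0 * instances     # assumed 1 GB per instance
    hourly = (
        vcpus * RATE_VCPU_HOUR
        + mem_gb * RATE_MEM_GB_HOUR
        + storage_gb * RATE_STORAGE_GB_HOUR
        + RATE_SMALL_ENV_HOUR
    )
    return hourly * hours + db_gb * RATE_DB_GB_MONTH


low = small_env_monthly_cost(workers=1)   # minimum scale
high = small_env_monthly_cost(workers=3)  # maximum scale
print(f"small environment: ${low:.2f} to ${high:.2f} per month")
```

Because the cost is linear in the worker count, passing a fractional time-averaged count (say, 1.8 workers) should give a reasonable point estimate inside that range.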

lottspot commented 2 years ago

I believe we can also select custom sizing for our workers, so we may be able to, for example, utilize scheduler/webserver sizes from the small environment, but select larger worker sizes.

lottspot commented 2 years ago

The decision has been made to stick with composer v1 for the airflow2 upgrade