HTTPArchive / data-pipeline

The new HTTP Archive data pipeline built entirely on GCP
Apache License 2.0

Optimize GCP costs #27

Open: rviscomi opened this issue 2 years ago

rviscomi commented 2 years ago

Owner: @giancarloaf; Supporters: @rviscomi, @pmeenan

We need to do a deeper accounting of our GCP costs and have a better understanding of what the new pipeline will cost at full capacity. With that information, we should also identify practical ways to reduce costs and start implementing some of the low-hanging fruit.

For example, we have 1.3 petabytes in the Nearline storage class of Cloud Storage. If we switch it to the Archive class, we could save 88% ($10k/month).
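The two quoted figures are linked by simple arithmetic. An 88% saving means Archive costs about 12% of Nearline per GB-month, and a $10k/month saving at 88% implies a current Nearline bill of roughly $11.4k/month. The sketch below back-derives those implied numbers from the issue's own figures only. It assumes decimal units (1 PB = 10**6 GB) and treats "$10k" as exact even though it is clearly rounded. `savings_fraction` is a hypothetical helper that takes real per-GB list prices for your bucket's location.

```python
def savings_fraction(from_price: float, to_price: float) -> float:
    """Fraction of the at-rest storage bill saved by moving between classes.

    Only per-GB-month storage prices are compared; operation, retrieval and
    early-deletion charges are deliberately left out of this sketch.
    """
    return 1.0 - to_price / from_price


def implied_monthly_bill(monthly_savings: float, fraction_saved: float) -> float:
    """Back out the current monthly bill from a quoted saving and percentage."""
    return monthly_savings / fraction_saved


# Figures quoted in the issue (assumption: decimal units, 1 PB = 10**6 GB).
SIZE_GB = 1.3e6          # "1.3 petabytes"
QUOTED_FRACTION = 0.88   # "save 88%"
QUOTED_SAVINGS = 10_000  # "$10k/month", treated as exact here

current_bill = implied_monthly_bill(QUOTED_SAVINGS, QUOTED_FRACTION)
archive_bill = current_bill - QUOTED_SAVINGS
implied_rate = current_bill / SIZE_GB  # implied Nearline $/GB-month

# Sanity check: the implied per-GB rates reproduce the quoted percentage.
archive_rate = archive_bill / SIZE_GB
assert abs(savings_fraction(implied_rate, archive_rate) - QUOTED_FRACTION) < 1e-9

print(f"Implied Nearline bill: ${current_bill:,.0f}/month")
print(f"Implied Archive bill:  ${archive_bill:,.0f}/month")
print(f"Implied Nearline rate: ${implied_rate:.4f}/GB-month")
```

This only covers at-rest storage. Archive objects also incur per-GB retrieval fees and have a 365-day minimum storage duration (versus 30 days for Nearline), so a full accounting would need to factor in how often this data is read or deleted.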

rviscomi commented 2 years ago

Update: @pmeenan adjusted the storage class on GCS today. It would be good to see how that affects costs in the billing dashboard.