FinOps-Open-Cost-and-Usage-Spec / FOCUS_Spec

The Unifying Specification for Cloud Billing Data
https://focus.finops.org

SkuCapacity / SkuCapacityUnit #320

Open marc-perreaut opened 6 months ago

marc-perreaut commented 6 months ago

Description

It's useful to have a high-level capacity or "footprint" view for discussions with leadership and for cost analysis. The capacity metric is, for example, vCPU for VMs, TB for storage, and GB/s for network (with an inbound/outbound split where that makes sense), whatever the SKU. Here are example questions:

  • How many CPUs / TBs do we have overall / in this region / in this application? What are the regions / applications with the most CPUs / TBs?
  • Can we have the capacity details by VM series/family?
  • What's the cost impact if we switch from this VM series/family to that VM series/family?
  • What percentage of the CPUs (capacity) have we committed?

Proposed approach

One approach is to add SkuCapacity and SkuCapacityUnit columns to provide the capacity information, which can be seen as a high-level attribute of the SKU. The SkuCapacityUnit should be standardized (an allow-list of permitted values), so that SKU capacity can be aggregated and reported across SKUs and across providers, by SkuCapacityUnit.
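A minimal sketch of how this could be consumed, assuming a FOCUS-style export loaded into pandas (the rows and the RegionId grouping are illustrative; SkuCapacity / SkuCapacityUnit are the proposed columns):

```python
import pandas as pd

# Illustrative rows: SkuCapacity / SkuCapacityUnit are the proposed columns,
# RegionId is an existing FOCUS column.
rows = pd.DataFrame(
    {
        "RegionId": ["eu-west-1", "eu-west-1", "us-east-1"],
        "SkuCapacity": [4.0, 2.0, 8.0],
        "SkuCapacityUnit": ["vCPU", "TB", "vCPU"],
    }
)

# Because SkuCapacityUnit comes from a standardized allow-list, capacity can
# be summed across SKUs and providers, per unit.
footprint = rows.groupby(["RegionId", "SkuCapacityUnit"])["SkuCapacity"].sum()
print(footprint)
```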

Github issue or Reference

I am not aware of any existing Github issue related to this topic.

Context

Capacity information can be calculated today by parsing SKU information, which is provider-specific. It would bring value to have the capacity information natively available for practitioners. Note: practitioners would need to take the SkuCapacityUnit into account to calculate time-averaged capacity.
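For illustration, here is the kind of provider-specific parsing practitioners have to do today (a rough sketch; the regex happens to extract the vCPU count from some Azure VM size names such as "Standard_D4s_v3", but it is by no means a general rule):

```python
import re
from typing import Optional

# Without native columns, capacity must be inferred from provider-specific
# SKU strings. Illustrative only: this pattern extracts the vCPU count from
# some Azure VM size names (e.g. "Standard_D4s_v3" -> 4) and nothing more.
def vcpus_from_azure_vm_size(sku: str) -> Optional[int]:
    match = re.search(r"_[A-Za-z]+?(\d+)", sku)
    return int(match.group(1)) if match else None

print(vcpus_from_azure_vm_size("Standard_D4s_v3"))  # 4
```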

sireeshaoram commented 6 months ago

By focusing on capacity metrics like vCPUs, TB, and GB/s, organizations can make informed decisions about scaling resources up or down based on the demands of their applications. This contributes to performance optimization and cost-efficiency.

hrishikeshsardar commented 6 months ago

SkuCapacity/SkuCapacityUnit are definitely important for cross-referencing configurations and costs among cloud service providers. This will help practitioners and consultants recommend the best-suited solution for a project/application.

Happy to provide more context in upcoming meetings.

ahullah commented 6 months ago

Would you think of this as a core part of the data set, or as a reference table that sits alongside the main data set for dynamic enrichment?

hrishikeshsardar commented 6 months ago

@ahullah, adding the SkuCapacity and SkuCapacityUnit columns to the core data set (detailed spec) makes sense to me as a practitioner. It helps in many scenarios, such as analyzing on-demand vs. commitment coverage to refine scope further.

These two columns alone might not be enough for that analysis; it also requires related columns that identify the charge type, alongside the quantity billed on-demand vs. under commitment.
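As a sketch of that combined analysis, assuming the existing FOCUS CommitmentDiscountId column (null when a charge is not covered by a commitment discount) and illustrative values:

```python
import pandas as pd

# Illustrative values; CommitmentDiscountId is an existing FOCUS column and
# is null when a charge is not covered by a commitment discount.
df = pd.DataFrame(
    {
        "SkuCapacity": [4.0, 8.0, 2.0],
        "SkuCapacityUnit": ["vCPU", "vCPU", "vCPU"],
        "CommitmentDiscountId": ["cd-123", None, None],
    }
)

committed = df["CommitmentDiscountId"].notna()
share = df.loc[committed, "SkuCapacity"].sum() / df["SkuCapacity"].sum()
print(f"Committed vCPU share: {share:.0%}")  # 4 / 14 -> 29%
```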

flanakin commented 5 months ago

Aren't there scenarios where there would be multiple types of capacity? We've previously talked about adding a SkuDetails column that would be JSON and could have whatever SKU-specific attributes are needed.

marc-perreaut commented 5 months ago

> Aren't there scenarios where there would be multiple types of capacity? We've previously talked about adding a SkuDetails column that would be JSON and could have whatever SKU-specific attributes are needed.

Probably, so SkuCapacity would carry the main capacity type. For example, a VM has a capacity in CPU and in memory; the main capacity would be the CPU. One can argue that the main capacity is memory for memory-bound workloads, for example databases. If that happens, both CPU and memory would be needed as capacities: new columns SkuCapacity2 and SkuCapacity2Unit could be added, so that both the CPU capacity and the memory capacity are present in the dataset. Or maybe something smarter?
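To make the idea concrete, a hypothetical row shape with a secondary capacity (the SkuCapacity2 column names and all values are only a sketch):

```python
import pandas as pd

# Hypothetical shape: a VM row carrying both its CPU footprint (primary)
# and its memory footprint (secondary). Names and values are illustrative.
vm = pd.DataFrame(
    [
        {
            "SkuId": "Standard_D4s_v3",
            "SkuCapacity": 4,
            "SkuCapacityUnit": "vCPU",
            "SkuCapacity2": 16,
            "SkuCapacity2Unit": "GB",
        }
    ]
)
print(vm)
```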

A SkuDetails JSON column could do the job, provided the SkuCapacity and SkuCapacityUnit keys are always present whatever the SKU (as the goal is to have an aggregated, high-level capacity view), but I find it less convenient than dedicated columns from a practitioner perspective, as it requires decoding the JSON.
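For comparison, a rough sketch of what consuming a SkuDetails JSON column would look like (the key names inside the JSON are hypothetical):

```python
import json

import pandas as pd

# With a SkuDetails JSON column, every consumer first has to decode the JSON
# before capacity can be filtered or aggregated.
df = pd.DataFrame(
    {"SkuDetails": ['{"SkuCapacity": 4, "SkuCapacityUnit": "vCPU"}']}
)
decoded = df["SkuDetails"].apply(json.loads)
df["SkuCapacity"] = decoded.apply(lambda d: d["SkuCapacity"])
df["SkuCapacityUnit"] = decoded.apply(lambda d: d["SkuCapacityUnit"])
print(df[["SkuCapacity", "SkuCapacityUnit"]])
```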

sireeshaoram commented 5 months ago

A hybrid approach would be smarter here: use separate columns for the primary capacity (CPU) and the secondary capacity (memory). If both capacities are needed, populate both columns. If only one capacity is relevant (e.g., a memory-bound workload), leave the other column empty. This way, we can maintain clarity while accommodating different scenarios.