FinOps-Open-Cost-and-Usage-Spec / FOCUS_Spec

The Unifying Specification for Cloud Billing Data
https://focus.finops.org

[Work_Item] (BUG) Ensure all user-defined Tags are encapsulated within the `Tags` column #540

Open cnharris10 opened 1 month ago

cnharris10 commented 1 month ago

1. Problem Statement *

What is the problem?: Explain the context and why it needs resolution. Impact: Describe how the problem affects users, systems, or the project.

In a recent discussion, @AWS-ZachErdman mentioned an oversight in the Tags column that affects at least those providers with two or more user-defined tagging systems. Currently, AWS (user-defined resource tags, cost categories) and GCP (tags, labels) are affected.

The Tags column currently says: Providers MUST NOT alter user-defined Tag keys or values. When a provider has multiple user-defined tagging features that allow the same user-defined tags to be created, but partitioned by feature, at least N-1 of those features will need some prefix in order to prevent clobbering.

For example, AWS has both user-defined resource tags and user-defined cost categories. If a customer defines a resource tag as foo:bar and a cost category as foo:baz, then persisting both in the Tags column key/value map will cause clobbering (i.e. either "bar" or "baz" will persist, not both). The same case can occur between GCP tags and labels.
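To make the clobbering concrete, here is a minimal Python sketch. The variable names and the naive merge are assumptions for illustration only; they are not how any provider actually builds its export.

```python
# Hypothetical inputs: the same key defined under two AWS tagging features.
resource_tags = {"foo": "bar"}    # user-defined resource tag
cost_categories = {"foo": "baz"}  # user-defined cost category

# Naively merging both into a single Tags map keeps only one value per key:
# the later source overwrites the earlier one.
tags = {**resource_tags, **cost_categories}
print(tags)  # {'foo': 'baz'}

# Reversing the merge order keeps the other value instead.
tags_reversed = {**cost_categories, **resource_tags}
print(tags_reversed)  # {'foo': 'bar'}
```

Either way, one of the two user-defined values is lost, and which one survives depends only on merge order.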

2. Objective *

State the objective of this work item. What outcome is expected? Success Criteria: Define how success will be measured (e.g. metrics and KPIs).

All user-based or provider-based tags are encapsulated within the Tags column with predefined prefixes preventing clobbering for at least N-1 tagging schemes.

3. Supporting Documentation *

Include links to supporting documents such as:

  • Data Examples: [Link to data or relevant files; DO NOT share proprietary information]
  • Related Use Cases or Discussion Documents: [Link to discussion]
  • PRs or Other References: [Link to relevant references]

Original Tags column definition for FOCUS 1.0: https://github.com/FinOps-Open-Cost-and-Usage-Spec/FOCUS_Spec/pull/227

Use Case: Analyze cost and usage by multiple tag structures without guessing which columns contain various tags

4. Proposed Solution / Approach

Outline any proposed solutions, approaches, or potential paths forward. Do not submit detailed solutions; please keep suggestions high-level.

Initial Ideas: Describe potential solution paths, tools, or technologies. Considerations: Include any constraints, dependencies, or risks. Feasibility: Include any information that helps quantify feasibility, such as perceived level of effort to augment the spec, or existing fields in current data generator exports. Benchmarks: Are there established best practices for solving this problem available to practitioners today (e.g. mappings from existing CSP exports that are widely used)?

In the proposed approach, using the AWS CUR as an example, the following tags are considered:

User-defined Tags:

Provider-defined Tag:

The proposal is to amend the Tags column to allow a user-defined prefix to be concatenated with a finalized user-defined tag key for N-1 user-defined tagging schemes. This allows for 1 tagging scheme to remain without a user-defined prefix, so practitioners can reference a user-defined tagging schema without a prefix.

With the tags supplied above, all Tags can be co-located as either:

Option 1: Predefined prefix declared for N-1 user-defined and all provider tags

Provider declares prefix: costCategories for user-defined cost category tags and aws for provider-defined system tags.

Tags: { "foo": "bar", "aws:foo": "bar2", "costCategories:foo": "bar3" }

Option 2: Prefix declared for all user-defined and all provider tags

Provider declares prefix: user for user-defined resource tags, costCategories for user-defined cost category tags, and aws for provider-defined system tags.

Tags: { "user:foo": "bar", "aws:foo": "bar2", "costCategories:foo": "bar3" }
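As a rough sketch of how the two options could be produced, the Python below flattens several tagging schemes into one Tags map. The function name `build_tags`, the `:` separator, and the input layout are all assumptions made for illustration; the proposal itself does not prescribe an implementation.

```python
from typing import Dict, Optional


def build_tags(
    schemes: Dict[Optional[str], Dict[str, str]], separator: str = ":"
) -> Dict[str, str]:
    """Flatten several tagging schemes into one Tags map.

    `schemes` maps a provider-declared prefix (or None for the single
    unprefixed scheme) to that scheme's key/value pairs.
    """
    tags: Dict[str, str] = {}
    for prefix, pairs in schemes.items():
        for key, value in pairs.items():
            full_key = key if prefix is None else f"{prefix}{separator}{key}"
            if full_key in tags:
                # Refuse to silently overwrite an existing entry.
                raise ValueError(f"tag key collision: {full_key!r}")
            tags[full_key] = value
    return tags


# Hypothetical inputs mirroring the example above.
resource_tags = {"foo": "bar"}      # user-defined resource tags
system_tags = {"foo": "bar2"}       # provider-defined system tags
cost_categories = {"foo": "bar3"}   # user-defined cost categories

# Option 1: one user-defined scheme stays unprefixed.
option_1 = build_tags(
    {None: resource_tags, "aws": system_tags, "costCategories": cost_categories}
)

# Option 2: every scheme gets a prefix.
option_2 = build_tags(
    {"user": resource_tags, "aws": system_tags, "costCategories": cost_categories}
)

print(option_1)
print(option_2)
```

The collision check matters mostly for Option 1: an unprefixed user-defined key that happens to contain the separator (for example, a resource tag literally named aws:foo) could still overlap with a prefixed key, and the sketch raises an error rather than dropping a value.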

5. Epic or Theme Association

This section will be completed by the Maintainers.

Epic: [Epic Name] Theme: [Theme Name, if applicable]

TBD

6. Stakeholders *

List the main stakeholders for this issue.

Primary Stakeholders: [Name/Role] Other Involved Parties: [Names/Roles]

TBD

jpradocueva commented 1 week ago

Summary from TF-2 call on Oct 16:

#540 [DISCUSSION]: Tags Column Definition and User-Defined Tags

  • Key Discussion Items: The discussion focused on the requirement that user-defined tags cannot be altered, which could lead to issues with normalization and denormalization.
  • Problem Identification: Some cloud providers (AWS, specifically) have multiple tag schemes, leading to complications in enforcing a strict tag policy.
  • Divergent Views: The group debated whether changing the definition would introduce a breaking change.
  • Final Agreement: Chris will create a work item to formalize the issue, advocating for a potential change in the 1.2 release.
  • Action Items: [TF-2-#540] Chris, @cnharris10 will handle creating the work item for this issue.

shawnalpay commented 1 week ago

@cnharris10 Spent some time with this one. I get it now! But I have some feedback. :)

If you feel this level of detail is unnecessary and/or I'm being pedantic, I can appreciate that -- but our audience for these issues is expanding beyond the FOCUS project team, and any/all context will be helpful for someone getting up to speed on this (even a dense Maintainer such as myself!).

jpradocueva commented 1 week ago

Summary from Members' call on Oct 17:

#540 [DISCUSSION]: Tags Column Definition Mandates that User-Defined Tags are Not Altered, Which Can Lead to Various Scenarios

  • Primary Issue: This discussion revolves around the current requirement in the specification that user-defined tags must not be altered. The concern is that this rule could lead to complications when practitioners deal with multiple user-defined tag schemes from different providers.
  • Core Problem: AWS, for instance, allows both user-defined resource tags and user-defined cost categories, which could result in conflicts when both types of tags share the same key names. The current specification does not adequately address how to differentiate these multiple tag schemes without altering the user-defined tags.
  • Divergent Views: Some members felt that allowing providers to prepend a prefix to user-defined tags could resolve the issue without altering the tags themselves, while others expressed concern that introducing prefixes would increase complexity and make tag management harder for practitioners. There was also debate about whether this could be considered a breaking change.
  • Final Agreement: The group agreed to explore solutions that would allow providers to prepend a prefix for certain user-defined tag schemes (e.g., cost categories) without altering other user-defined tags. However, any such solution should be carefully reviewed to ensure that it doesn’t introduce complexity or conflicts for practitioners. This Issue #540 represents the first “work item” to be prepared by the group.
  • Action Items:

thecloudman commented 3 days ago

I use GCP and Azure. In GCP we have labels and tags, and some keys match across both. In our FOCUS dataset we don't have any issues showing the key values from both labels and tags. This one might need some more investigation.

shawnalpay commented 2 days ago

@thecloudman Interesting; thanks for sharing.

@cnharris10 @AWS-ZachErdman Do we have real-world examples of this happening, and if so, could you share? It may be difficult to get the stakeholders to prioritize this one if it's not perceived to be a problem.

cnharris10 commented 2 days ago

@shawnalpay

[image]

cnharris10 commented 2 days ago

@thecloudman

A couple questions:

  1. For GCP exports, are you saying that when you have a tag, foo:bar, and a label, foo:baz, you are fine with the Tags column (non-deterministically) manifesting as either {"foo": "bar"} or {"foo": "baz"} and losing the other entry?

  2. To mitigate this clobbering issue, AWS supplies user-defined resource tags within the Tags column and also creates a provider-defined column, AWS_CostCategories, that encapsulates their other user-defined (Cost Category) tags. This ensures that the example from the previous question doesn't occur.

If providers follow this approach, then providers will encapsulate some user-defined tags under the standard Tags column and the rest under 1 or more provider-based columns (ex: x_MyOtherTags). In this case, with 3 hypothetical providers going this route (Provider1, Provider2, Provider3), 4 columns will be produced causing practitioners to look/query across various, non-normalized columns for user-defined tags.

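Example:

Tags: { "foo": "bar" }
x_Provider1_OtherUserDefinedTags: { "foo": "bar2" }
x_Provider2_OtherUserDefinedTags: { "foo": "bar3" }
x_Provider3_OtherUserDefinedTags: { "foo": "bar4" }

The intent of the Tags column for 1.0 was to encapsulate all tags under one column to allow an easy querying experience regardless of provider.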
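To illustrate the querying burden described in this comment, here is a minimal Python sketch. The row layout, the helper `find_tag_values`, and the `x_ProviderN_OtherUserDefinedTags` column names are hypothetical and exist only for illustration.

```python
from typing import Dict, List, Optional

Row = Dict[str, Optional[Dict[str, str]]]


def find_tag_values(row: Row, key: str, tag_columns: List[str]) -> Dict[str, str]:
    """Collect every value stored under `key` across the given tag columns."""
    found: Dict[str, str] = {}
    for column in tag_columns:
        column_tags = row.get(column) or {}  # tolerate missing or null columns
        if key in column_tags:
            found[column] = column_tags[key]
    return found


# Hypothetical row where user-defined tags are spread over four columns.
row: Row = {
    "Tags": {"foo": "bar"},
    "x_Provider1_OtherUserDefinedTags": {"foo": "bar2"},
    "x_Provider2_OtherUserDefinedTags": {"foo": "bar3"},
    "x_Provider3_OtherUserDefinedTags": {"foo": "bar4"},
}

# The practitioner has to know every provider-specific column in advance.
all_columns = [
    "Tags",
    "x_Provider1_OtherUserDefinedTags",
    "x_Provider2_OtherUserDefinedTags",
    "x_Provider3_OtherUserDefinedTags",
]
everything = find_tag_values(row, "foo", all_columns)

# Querying only the standard column silently misses the other three values.
standard_only = find_tag_values(row, "foo", ["Tags"])

print(everything)
print(standard_only)
```

With a single namespaced Tags column, the list of columns to search would shrink to one, which is the querying experience the 1.0 definition was aiming for.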

rileyjenk commented 2 days ago

I just tested this with our own data, and there is a potential collision if there are multiple mechanisms that result in identical keys. If a provider has multiple systems that supply keys and values in the Tags column, then they either need to:

udam-f2 commented 2 days ago

I also see the need for this.

The spec allowing for namespacing to avoid these collisions seems like the preferable approach here.

ijurica commented 14 hours ago

This is an oversight in the specification, and we need to resolve it.

AWS-ZachErdman commented 13 hours ago

@cnharris10 this is mainly a problem with respect to cost categories having its own column and should not be related to the gap that we listed in our user guide for our preview specification.

The most compelling problem explanation and argument for me about why we should reconsider this definition is the argument you gave here:

If providers follow this approach, then providers will encapsulate some user-defined tags under the standard Tags column and the rest under 1 or more provider-based columns (ex: x_MyOtherTags). In this case, with 3 hypothetical providers going this route (Provider1, Provider2, Provider3), 4 columns will be produced causing practitioners to look/query across various, non-normalized columns for user-defined tags.

Example:

Tags: { "foo": "bar" }
x_Provider1_OtherUserDefinedTags: { "foo": "bar2" }
x_Provider2_OtherUserDefinedTags: { "foo": "bar3" }
x_Provider3_OtherUserDefinedTags: { "foo": "bar4" }

The intent of the Tags column for 1.0 was to encapsulate all tags under one column to allow an easy querying experience regardless of provider.