Versioning of FOCUS-normalized data over long time periods?

ahullah commented 4 months ago

Description

Hi folks, I can't remember whether we've already discussed this, but how are we handling changes in the specification for datasets that may have multiple years of history?

For example:

If I start using FOCUS today, I would normalize all my data to the V1 spec (so far, so good). However, 9 months later FOCUS releases the V2 spec, and 2 years later we are using the V3 spec. Given that I have been normalizing my cloud spend into a BI cube with the converters throughout this time, what is the expected behavior whenever we release a new version of the spec? (I can see a case for storing up to 7 years of data.)

Are we expecting folks to keep the full history of all the raw/pre-normalized source files and reprocess the entire data history into the latest spec? Or are we planning to introduce a version identifier on each row, indicating which version of the spec that row complies with, so that the schema can change over time and the data remains valid?

Proposed approach

I suggest adding a new REQUIRED / NOT NULL column to the spec containing a version number or identifier that indicates which version of the spec each billing line was generated against.

This would allow folks to adopt newer versions of the spec without needing to retain all the raw / source data files in order to reprocess their whole history into the latest version of the spec.
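A minimal Python sketch of how a per-row version column could support this, assuming a hypothetical column named `FocusVersion` and made-up migration steps (the column rename and default below are illustrative only, not actual FOCUS changes). Each stored row is upgraded one version at a time until it reaches the latest spec, so old history can be brought forward without the raw source files:

```python
# Hypothetical per-row version column; the name is an assumption for illustration.
VERSION_COLUMN = "FocusVersion"
LATEST = "3"


def v1_to_v2(row):
    # Illustrative change only: a column renamed between versions.
    row = dict(row)  # copy so the stored history is not mutated
    row["TotalCost"] = row.pop("Cost")
    row[VERSION_COLUMN] = "2"
    return row


def v2_to_v3(row):
    # Illustrative change only: a new column added with a default value.
    row = dict(row)
    row.setdefault("ChargeCategory", "Usage")
    row[VERSION_COLUMN] = "3"
    return row


# Maps each version to the step that moves a row to the next version.
MIGRATIONS = {"1": v1_to_v2, "2": v2_to_v3}


def upgrade(row):
    """Apply migration steps until the row conforms to LATEST."""
    while row[VERSION_COLUMN] != LATEST:
        step = MIGRATIONS.get(row[VERSION_COLUMN])
        if step is None:
            raise ValueError(f"no migration from version {row[VERSION_COLUMN]}")
        row = step(row)
    return row


history = [
    {"FocusVersion": "1", "Cost": 10.0},
    {"FocusVersion": "3", "TotalCost": 5.0, "ChargeCategory": "Usage"},
]
upgraded = [upgrade(r) for r in history]
```

Because the version lives on each row, a query over mixed history could also filter or branch on it directly instead of upgrading everything up front.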

GitHub issue or reference

Spec-wide issue

Context

silvexis commented 4 months ago

I would not specify the version in the actual file. Rather, I would place the version in a manifest or schema document that would accompany a data drop from a provider.
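As an illustration of the manifest approach, here is a minimal Python sketch that reads the spec version from a manifest accompanying a data drop. The manifest layout (a JSON document with `focus_version` and `files` keys) is a hypothetical format assumed for this example, not something FOCUS defines:

```python
import json

# Hypothetical manifest delivered alongside a provider's data files.
MANIFEST_TEXT = """
{
  "focus_version": "1.0",
  "files": ["billing-2024-05-part1.csv", "billing-2024-05-part2.csv"]
}
"""


def read_delivery_version(manifest_text):
    """Return (spec_version, files) declared by a delivery's manifest."""
    manifest = json.loads(manifest_text)
    version = manifest.get("focus_version")
    if not version:
        # Without a declared version the delivery cannot be interpreted safely.
        raise ValueError("manifest does not declare a spec version")
    return version, list(manifest.get("files", []))


version, files = read_delivery_version(MANIFEST_TEXT)
```

The data files themselves stay unchanged; the declared version applies to every file listed in the same manifest.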

flanakin commented 4 months ago

There's value in adding a version column, given that you'll have historical data whose version you need to know when it's queried, possibly years later. While we could easily add this at any point, I would personally like to see it added for 1.0 to make it easier to handle the differences between 1.0-preview and 1.0. The longer we wait, the harder it becomes to deal with mixed versions.

macko76 commented 3 months ago

Would it make sense to consider aligning with a versioning standard such as SemVer, distinguishing minor (non-breaking) changes from major (breaking) changes and accompanying them with mapping rules?

jpradocueva commented 3 months ago

I agree with @macko76. I suggest following the Semantic Versioning guidelines. The official 1.0 release would occur only after final approval by the working group and ratification by the Steering Committee. The team would then proceed with developing the next release, typically denoted v1.1, and the changes in v1.1 would remain backward compatible with v1.0. However, if the group decides on a change that is not backward compatible, that release should be labeled v2.0.
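The SemVer rule described above can be expressed as a small compatibility check. This Python sketch assumes versions are written as `MAJOR.MINOR` or `MAJOR.MINOR.PATCH`; pre-release labels such as `1.0-preview` are deliberately not handled (SemVer gives them no compatibility guarantee, so `int()` raises `ValueError` on them and they would need explicit mapping rules):

```python
def parse_version(text):
    """Turn 'MAJOR.MINOR' or 'MAJOR.MINOR.PATCH' into a 3-tuple of ints."""
    parts = [int(p) for p in text.split(".")]
    if not 1 <= len(parts) <= 3:
        raise ValueError(f"unexpected version format: {text}")
    parts += [0] * (3 - len(parts))  # pad missing MINOR/PATCH with zeros
    return tuple(parts)


def reader_can_consume(data_version, reader_version):
    """True if tooling built for reader_version can read data_version as-is.

    Minor releases are backward compatible within a major version, so a
    reader handles any data with the same MAJOR and an equal or lower MINOR.
    A different MAJOR signals a breaking change that needs mapping rules.
    """
    data_major, data_minor, _ = parse_version(data_version)
    reader_major, reader_minor, _ = parse_version(reader_version)
    return data_major == reader_major and data_minor <= reader_minor
```

For example, `reader_can_consume("1.0", "1.1")` is `True`, while `reader_can_consume("1.2", "2.0")` is `False` because the major version changed.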

jpradocueva commented 1 month ago

This issue was marked as P1 by TF-1 on May 21.

AWS-ZachErdman commented 1 month ago

I support putting the version number in the manifest for each delivery.

If appending data of different versions, a practitioner should be able to add a column to indicate the version, but generally we wouldn't recommend appending data of different versions before converting it to the same spec version.
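A minimal Python sketch of that recommendation: each delivery's rows are converted to a single target version before appending, and a version column is added afterwards. The converter functions, the `SpecVersion` column name, and the `Cost`/`TotalCost` columns are all hypothetical, chosen only to illustrate the flow:

```python
def append_deliveries(deliveries, target_version, converters):
    """Append rows from several deliveries, converting each to target_version first.

    deliveries: list of (manifest_version, rows) pairs, one per delivery.
    converters: maps (from_version, to_version) to a function converting one row.
    """
    combined = []
    for version, rows in deliveries:
        if version != target_version:
            convert = converters.get((version, target_version))
            if convert is None:
                # Refuse to mix versions rather than append unconverted rows.
                raise ValueError(f"no converter from {version} to {target_version}")
            rows = [convert(row) for row in rows]
        # Hypothetical column recording the version every appended row now follows.
        combined.extend({**row, "SpecVersion": target_version} for row in rows)
    return combined


# Hypothetical converter: a column renamed between versions.
converters = {("1.0", "1.1"): lambda row: {"TotalCost": row["Cost"]}}

deliveries = [
    ("1.0", [{"Cost": 4.0}]),
    ("1.1", [{"TotalCost": 6.0}]),
]
combined = append_deliveries(deliveries, "1.1", converters)
```

Since every row is converted before it is appended, the `SpecVersion` column ends up uniform and mainly records provenance for later queries.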

jpradocueva commented 1 month ago

Classified as Version Lifecycle by the Maintainers on the May 24 call.

jpradocueva commented 6 days ago

Document moved to #397

FinOps-Open-Cost-and-Usage-Spec / FOCUS_Spec