Kuadrant / kuadrant-operator

The Operator to install and manage the lifecycle of the Kuadrant components deployments.
Apache License 2.0

Proposal: `v0.x` versioning #32

Closed alexsnaps closed 2 years ago

alexsnaps commented 2 years ago

Proposed version scheme: `v0.<yW>.<dot>`, where `yW` comes from `date +%y%W`, i.e. the year and week number, e.g. `2232`. The version would be aligned across all components of Kuadrant, and we'd aim for a new release at the end of every sprint. This is a mix of semantic versioning and date-based versioning. The former is dictated by some of our own dependencies (e.g. OperatorHub), while the latter removes the need to map an actual number to a sprint.
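As a rough illustration of the scheme, the sketch below builds the version string in Python, whose `strftime` accepts the same `%y` and `%W` format codes as `date`. The function name `sprint_version` and the sample date are assumptions made for this example, not part of the proposal.

```python
from datetime import date


def sprint_version(day: date, dot: int = 0) -> str:
    """Build a v0.<yW>.<dot> version string for the given day."""
    # %y = two-digit year, %W = week of the year (Monday as first day),
    # the same codes used by `date +%y%W`.
    return f"v0.{day.strftime('%y%W')}.{dot}"


# 12 August 2022 falls in week 32 of 2022 (sample date, chosen for illustration).
print(sprint_version(date(2022, 8, 12)))         # v0.2232.0
print(sprint_version(date(2022, 8, 12), dot=1))  # v0.2232.1
```

Note that `%W` counts the days before the first Monday of the year as week `00`, so the first release of a year may carry a week number of `00` or `01` depending on the calendar.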

The idea of aligning versions is to keep things simple and easy for us to manage for as long as we can, while shipping a functional packaged Kuadrant.

At the end of a sprint, we'd release all components with all of them using the same version (e.g. v0.2232.0). Each component would define its default dependency on other Kuadrant components as v0.2232, where any <dot> version would work, leaving us room for further tiny fixes if need be (more on that later). With such an approach, the release process and the team don't need to account for breaking changes to inter-component dependencies. Today, e.g., limitador- & authorino-operators both default to latest (with some difference in how they actually do it) for the image they'll use to deploy their respective service onto the cluster. In this new world, they'd default to deploying version v0.2232 of their service instead, with the guarantee that this combination works. The option of overriding this should remain and users could "mix and match" as they see fit, but without guarantees that this actually works.
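The "any <dot> version would work" rule could be checked along these lines. This is only a sketch: the helper name `matches_release` is hypothetical, and it assumes versions are always written as `v0.<yW>.<dot>` with a numeric dot part.

```python
def matches_release(version: str, release: str) -> bool:
    """Return True if `version` (v0.<yW>.<dot>) belongs to `release` (v0.<yW>)."""
    parts = version.split(".")
    # Expect exactly three parts, e.g. ["v0", "2232", "1"], with a numeric dot.
    if len(parts) != 3 or not parts[2].isdigit():
        return False
    return ".".join(parts[:2]) == release


print(matches_release("v0.2232.0", "v0.2232"))  # True
print(matches_release("v0.2232.3", "v0.2232"))  # True
print(matches_release("v0.2233.0", "v0.2232"))  # False
```

Anything that does not follow the scheme, such as `latest`, is rejected, which mirrors the "mix and match without guarantees" caveat.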

The idea is to give us the greatest flexibility in changing interfaces of each component without requiring us to think about implications for other versions of other components. Using the example above, if a new image of limitador breaks something, that breaking change would be imposed on everyone. Until we reach v1, we should be able (if not encouraged) to break things for the "greater good", where that would be the Kuadrant project as a whole. Once that point has been reached, any v1.m.d release of any component would work with a "Kuadrant v1" deployment. But until then, why tie ourselves to these limits? Especially as we are still exploring the problem space (and have some other reality imposed on us, not necessarily fixed either, i.e. alpha APIs from some of our external dependencies), in a way that dependents can inform API changes in their dependencies.

A dot release might still be required in the case of a particular vulnerability or having to fix an actual bug that slipped through, but only if some actual user cannot possibly upgrade to the latest version, because of time or breaking changes. A fix would then either be provided to main and cherry-picked for a dot release, or go straight to the branch of the tag of the affected version (depending on urgency and/or difficulty). A dot release could also be "hijacked" to quickly release a feature to someone. In other words, dot releases still leave every component to deal with its "own fate".

We'd still strive to give fixes priority, so that hopefully people can "simply" upgrade to the release coming up at the end of the sprint.

Content of a release

Requirements

Downside

We might end up doing "empty" releases, i.e. ones in which nothing actually changed from the previous one. While this might be somewhat confusing at first, there isn't a major drawback if most of the releasing process is automated/lightweight.

Tripwires

This is by no means a release proposal for us to use forever. The idea is to streamline development of the whole Kuadrant stack, without having to suffer from the additional complexity of having multiple versions and needing to clearly define the dependency tree. At the very latest when we reach v1 (i.e. something we believe to be stable, which might be partially informed by the Gateway API's timeline), we would probably want to cut free from that scheme. But there might be other reasons:

If any of the above happens, we should revisit our decision at that time. But there might be other reasons to revise it as the development journey of the different components continues.

Other benefits

This forces us all to think more in terms of the Kuadrant stack. Hopefully, with the whole CI/CD pipeline, this will have us all "learn" more about the implications of the dependencies and interactions of our components, where the "only" requirement is to get things working again by the end of the sprint.

There is a virtuous circle aspect to it all, where the process will consolidate the product itself as well as the team, by making us more "full stack"-minded.

Finally, this makes it also much simpler for potential users to keep up with, there is a single version of "Kuadrant" exposed to them that they don't need to understand the underpinnings of, other than if they need or want to.

guicassolato commented 2 years ago

I think it’s fair to start by saying that I have strong feelings against this proposal.

I think the proposal is flawed in conception, starting with the assumption that it is "simple". It almost treats Kuadrant as if it were a monolith project, which it is not.

Each component of Kuadrant has its own history and can be seen as a fairly independent open source project. They already have public releases (or tags in the case of Limitador), change logs and version numbers that mean something. From the perspective of the people involved in and watching those projects individually, the proposed change comes as an externality, without necessarily having a good reason to exist.

Not that projects aren't allowed to switch to a different versioning system at any point in time, but I'd expect a very good reason for that to happen. Following a much bigger convention, to name one example. But coming from an alleged need stated at the level of another component (Kuadrant operator), and especially as something meant to be temporary (if I understood correctly), it not only creates a lapse in the independent timeline and versioning history of each component, but also forces people to look elsewhere to understand what's going on.

As far as I'm concerned, the system adopted today by Authorino, Limitador and their operators, of semantic versioning and incremental changes in the version numbers, follows a standard that is universally accepted and compatible with the dependencies acknowledged in the proposal, like OperatorHub, and with the technologies we work with (e.g. Golang module version numbers).

Switching to a new versioning system is confusing and unnecessary.

> this makes it also much simpler for potential users to keep up with, there is a single version of "Kuadrant" exposed to them that they don't need to understand the underpinnings of

Kuadrant can still have a "single version" without changing a thing in the versioning of the components it depends on. For one who uses Kuadrant, the versions of the inner components Kuadrant installs and uses to achieve its purpose can be/should be completely transparent, whether we call those inner versions X or Y.

> the release process and the team don't need to account for breaking changes to inter-component dependencies

I don't see how that can be the case. Say you have Limitador and Kuadrant both at v0.X; before a coordinated release v0.Y happens, Kuadrant needs to know what changes were merged in Limitador to avoid breaking Kuadrant v0.Y, since both will become v0.Y and are, by definition, mandated to be compatible with each other. So Kuadrant needs to keep track of occasional breaking changes in Limitador either way, regardless of the names given to the versions and releases.

Maybe there's an assumption here that such breaking changes could be avoided or coordinated better while they are happening within the same sprint. However, that doesn't mean the team "don't need to account for breaking changes". Changes made in one component can still affect the other, and they require coordination within the team (potentially teams, plural).

I think coordinating is always needed. Worse than coordinating would be having to postpone an important (yet breaking) change in Limitador to, say, v0.Z, where X < Y < Z, because, let's say, Kuadrant core could not be fixed for Limitador's change in time for v0.Y. In a scenario where the only thing that changed in Limitador during that last sprint targeting the v0.Y milestone was that breaking change postponed to v0.Z (or the "empty release" scenario, as it was called), we'd be forcing Limitador to bump from v0.X to v0.Y without any actual work shipped, while saving the important change only for v0.Z, and possibly making people only interested in Limitador wait for another release, another sprint.

This leads me to another misconception: the equivalence between sprints and releases. These are not the same thing. Sprints are a way to control, in a time frame, the units of work we can deliver; releases are not for units of work, but for features, bug fixes, etc. Those two things are related but not equivalent. We should release when we accumulate enough features, when bug fixes and enhancements are ready, including all steps of the process (i.e. testing, docs, etc.). We should not release when the clock ticks.

Developers should be able to merge as soon as their units of work are done; some aspects of the bigger tasks they (and testers and doc writers) are all working on will be done in one sprint, others maybe in the next.

> We might end up doing "empty" releases, i.e. ones in which nothing actually changed from the previous one.

I believe there is no good reason that justifies an "empty" release. We can have as many tags as we want linked to a release. In fact, that should be enough to automate a compatibility matrix and avoid empty releases.

> Today, e.g., limitador- & authorino-operators both default to latest

At least regarding the Authorino Operator, this is incorrect. Authorino Operator latest defaults to Authorino (operand) latest; released (fixed) versions of Authorino Operator default to a compatible, also released (fixed), version of Authorino. For example, Authorino Operator v0.3 is known to be compatible with, and defaults to, Authorino v0.9: https://github.com/Kuadrant/authorino-operator/blob/c6f127b664f74e0a0e88991d27f14637d4395cb6/api/v1beta1/authorino_types.go#L65

The above also means that we, at some level, already keep a compatibility matrix of components of Kuadrant today.

Keeping a compatibility matrix between dependencies is not a big deal, really. Actually, we do that at a much larger scale already, if we also include the dependencies on other non-Kuadrant components. We test, document and release whilst keeping in mind the versions of Kubernetes, of Istio, of libraries and of pretty much any piece of code we interact with that is hosted in a different git repo.
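To make the idea concrete, such a matrix can be as small as a lookup table. The sketch below is illustrative only: the names `AUTHORINO_DEFAULTS` and `default_operand` are made up for this example, and the two entries are the ones mentioned above (latest defaults to latest, Authorino Operator v0.3 defaults to Authorino v0.9).

```python
# Operator version -> operand version it defaults to.
# Only the two pairings mentioned in this comment are listed;
# a real matrix would have one entry per released operator version.
AUTHORINO_DEFAULTS = {
    "latest": "latest",
    "v0.3": "v0.9",
}


def default_operand(operator_version: str) -> str:
    """Look up the Authorino version an Authorino Operator version defaults to."""
    try:
        return AUTHORINO_DEFAULTS[operator_version]
    except KeyError:
        # Unknown operator versions have no tested pairing.
        raise ValueError(
            f"no known-compatible Authorino for operator {operator_version}"
        ) from None


print(default_operand("v0.3"))    # v0.9
print(default_operand("latest"))  # latest
```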


My final perception about a single version number for all components and automatic releases by the end of every sprint is that it sounds like trying to fix a problem we barely have, by not fixing it at all, yet causing as much confusion as the simplicity it claims.

I hope my POV doesn't sound harsh because that is really not my intention. I actually very much appreciate the discussion. I just happen to believe this is not something we want, at the point we are, with the problems we have.

If I had to pick one fight, I'd say the automatic releases at every sprint are definitely what bothers me the most, but I think that without them the whole idea of single, date-based version numbers for all components falls apart and loses a bit of purpose. Perhaps this should be an indicator that we might be picking the wrong solution for the problem we want to tackle.

maleck13 commented 2 years ago

So there is a lot to digest here. So I am going to add my own wall of text :( with some of my own thoughts and we can discuss further and get to the right balance and solution

There seem to be two main areas of discussion: 1) sprint-based releases, and 2) the versioning system we want to use and aligning those versions.

Sprint Based Release (read regular release cadence)

If we think of a sprint as a set period of time in which we commit to completing some planned work (docs and testing included), then we should have an increment of kuadrant that has value and is releasable to the community / end user at the end of that time. If we don't have something worth releasing, then we likely did not plan well and we can make a decision at that point not to release. Release testing and bug fixing (ideally as automated as possible) should factor into our planning (i.e. we need x amount of time in a sprint to prepare the release, and have a goal of reducing that time through investing in automation). Again, as @alexsnaps mentioned, this helps us think about Kuadrant as a single stack that is intended to work as a unit, but also makes the latest features and improvements available rapidly. During this time component owners should pull together to get the release out.

A feature being ready for release includes required docs updates (engineering docs) and any needed tests in place (as most of these things live in the repos, they can be done in a single PR). Whether we need new docs or changes in other components, new e2e tests, etc. should be decided when defining and planning the feature and when we consider it to be done. Having a default of releasing at the end of a sprint also means we have relatively small change sets, which helps with pinning down issues. That said, it does not mean it is the only time we can release. As @alexsnaps pointed out, a critical patch is a good example of something that would be released as soon as it was ready.

Versioning

We need to be able to release Kuadrant as a single unit and for that to have a single exposed version that expresses a tested and blessed unit. I agree with @guicassolato that the dependent versions are and should be transparent. When I have done this in the past, it has been a single top-level semantic version that expresses a dependency on several other components that have their own semantic version (often the operator version). So the "top level" kuadrant version 0.4.0 pulls in the dependent specific versions and that is the "blessed" configuration. It is in planning where this coordination can happen for the regular releases, and then ad hoc for a patch (as in discuss and plan for a critical patch release). So if a sprint includes work for limitador, then part of that work being considered done includes making it available to Kuadrant (creating a new release of limitador, updating the limitador operator, releasing it and then finally creating a PR to the kuadrant operator to use that new version). I think this is the complexity @alexsnaps was referring to, which a single version would help with, as it is known what the version will be even if nothing changed; and if there is a patch needed to a component, then we likely also need to release a patch of Kuadrant to pick up the patch. The kuadrant operator is the natural gatekeeper of the version information. An example of one way to do this: https://github.com/integr8ly/integreatly-operator/blob/master/products/products.yaml

I am straying into CI/CD, but obviously when we create a PR to update the version of the limitador operator used by the kuadrant operator, a set of tests should execute. We should also have a nightly run that installs from master, and finally with each release candidate (tags on the kuadrant operator) the automated tests would execute again. So I think we could still use semantic versioning for each component, i.e. each sprint release could increment the minor version of Kuadrant, independent patches would increment the patch version, and when our core APIs move to v1 we also move to 1.x.x as the released version.

Where this scheme causes some complexity, which is what triggered the same-version idea, is the concept of grouping work under a milestone in a GH project and seeing the work on a single board.
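A minimal sketch of the bump rule described above (sprint release increments the minor version, independent patch increments the patch version), together with a "blessed" set keyed by the top-level version. The `Version` class, the `BLESSED` mapping and the release kinds are hypothetical names for this example, and the component versions are deliberately left as placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Version:
    major: int
    minor: int
    patch: int

    def bump(self, kind: str) -> "Version":
        """Return the next version for a 'sprint' (minor) or 'patch' release."""
        if kind == "sprint":
            # A sprint release increments the minor version and resets the patch.
            return Version(self.major, self.minor + 1, 0)
        if kind == "patch":
            return Version(self.major, self.minor, self.patch + 1)
        raise ValueError(f"unknown release kind: {kind}")

    def __str__(self) -> str:
        return f"{self.major}.{self.minor}.{self.patch}"


# Hypothetical "blessed" set: one Kuadrant version pinning its components.
# The component versions are placeholders, not real release numbers.
BLESSED = {
    "0.4.0": {
        "limitador-operator": "x.y.z",
        "authorino-operator": "x.y.z",
    },
}

kuadrant = Version(0, 4, 0)
print(kuadrant.bump("sprint"))  # 0.5.0
print(kuadrant.bump("patch"))   # 0.4.1
```

Under this reading, a component patch that Kuadrant needs to pick up would add a new `BLESSED` entry for the bumped patch version rather than mutate an existing one.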

Breaking Changes

If there are breaking changes in a component API used by Kuadrant (for example the ratelimit API or AuthConfig), these will ideally be seen in planning, and an issue added to update kuadrant's APIs to work with the new APIs (it is a different matter if we are at v1, as we then cannot break the API without moving major version). If not spotted in planning, our tests (manual or automated) should catch these, so a PR to update to the latest limitador operator would not be merged, as the tests should fail. If our tests don't catch it, then, as with any software, we just released a bug and it will need a fix. I think this is what "imposed on everyone" means, i.e. the breaking change can be made without thinking about the overall version of kuadrant having to bump a major version; our API version covers these types of changes, but the changes are "imposed on the kuadrant controller and the end user". @alexsnaps, put me right if I have misunderstood.

alexsnaps commented 2 years ago

Well… not much matters as we'll be releasing limitador-server v1.0.0 soon…