Open ycombinator opened 2 months ago
cc: @pkoutsovasilis @swiatekm @blakerouse @cmacknz
As I see it, we have two basic choices here:
The important questions that should guide this choice are:
I don't feel like I know enough about the domain to have informed opinions about these myself. I think I'd default to option 1 just because it's simple, but it does sacrifice some flexibility.
I like in general the first choice that @swiatekm proposes above, but I would like to add a twist to it
Before elaborating more I will try to provide some answer to the raised questions
- How often do we expect to have semantic changes to the Chart, not related to the agent version?
Changes to the Helm chart can be driven by the following:
- Are we ok with the user having to set the agent image?
I would prefer to have as default one the one that the helm chart was tested against
- Is it important for users to be able to use the Chart with agent versions older than the first Chart release?
imo the helm chart should support the user specifying different agent versions/images but provide guarantees of nominal operation only with the default image that comes with it. Hence, following this guarantee, there will be chart versions that will support only a specific agent version and either the user needs to stay in an older version if they want to remain under an agent version or change with their own responsibility
- Is it a pain to release the Chart in lockstep with agent?
I wouldn't expect this to be more painless that a lockfree release with agent (last famous words of me)
My approach would be that the Helm chart releases can happen only from the main
branch. Specifically we can fabricate a versions.json
file inside the helm chart that captures all the required release info, namely chart_version
, agent_version
, etc. Then we can add the following CI steps:
versions.json
inside the helm chartThe above is just a draft of an my initial thinking more than happy to drill down the details if it sounds reasonable to all the interested parties
@pkoutsovasilis do you mean literally just main
, or also the release branches? If we don't do the latter, we won't have Helm Chart releases for agent patch versions.
@swiatekm I was thinking main
but sure this can be adjusted to any branch if we deem such a wiser choice. The way I am thinking of it
imo the helm chart should support the user specifying different agent versions/images but provide guarantees of nominal operation only with the default image that comes with it
the helm chart will support agent versions incrementally thus when there is a patch version to the latest agent release I see no issue creating a PR that points the aforementioned version.json
to the patch version and triggering all the necessary CI steps. But maybe I am missing something, so please do elaborate π
Changes to the built-in kubernetes integration as it mostly reflects this package
How do we account for and include changes to the package? Automation that creates an appropriate PR in this repository that eventually leads to a new chart release?
I like the idea of just releasing from main as it is simpler, but I worry we'd need a way to indicate which agent versions new features are compatible with eventually like the stack version constraint in integrations packages.
It may be simpler to have a release of the helm chart for each active minor to eliminate this problem. If we want to account for package changes, we'd need a versioning scheme that can move faster than the stack release. We couldn't just have 8.16.1, 8.16.2, etc.
How do we account for and include changes to the package? Automation that creates an appropriate PR in this repository that eventually leads to a new chart release?
at the moment this is a manual process so the accounting happens manually. That said, if we deem that this becomes "unbearable" we could try and automate it; ps: transitioning from handlebars to helm templates is non-trivial π One of the reasons this Helm chart started with only the kubernetes integration built-in is this one
I like the idea of just releasing from main as it is simpler, but I worry we'd need a way to indicate which agent versions new features are compatible with eventually like the stack version constraint in integrations packages. It may be simpler to have a release of the helm chart for each active minor to eliminate this problem. If we want to account for package changes, we'd need a versioning scheme that can move faster than the stack release. We couldn't just have 8.16.1, 8.16.2, etc.
In my thinking the helm chart provides guarantees only for the combination of agent_version
and kubernetes_integration_version
it comes with when it is released. In simpler terms, if the user wants to mix match different agent versions and the kubernetes integration, it is up to them to do so and they should follow the route of custom integration and not rely in the built-in one. Because the latter is guaranteed to work correctly only with the agent version that the released chart comes with. However, if you are not in favour of this, then I guess we need to alter this approach.
In my thinking the helm chart provides guarantees only for the combination of agent_version and kubernetes_integration_version it comes with when it is released.
I think requiring an agent upgrade to get a guarantee for a new kubernetes integration version goes against the way we want package releases to work where you don't have to upgrade agent outside of this scenario.
Often users cannot rapidly upgrade their Elastic Agent (because they have to upgrade ES, they have rigid compliance or QA procedures, etc) but they can upgrade their agent configuration or policy quickly.
Whatever we decide to do has to support upgrading the k8s integration without upgrading the agent version, I think that will be quite a common ask.
Users of this chart will frequently be k8s and Helm users but not k8s and Helm (or Elastic Agent) experts, we need to optimize for helping them get fixes with minimal toil or the need to escalate to engineering via support cases.
I think requiring an agent upgrade to get a guarantee for a new kubernetes integration version goes against the way we want package releases to work where you don't have to upgrade agent outside of this scenario.
Often users cannot rapidly upgrade their Elastic Agent (because they have to upgrade ES, they have rigid compliance or QA procedures, etc) but they can upgrade their agent configuration or policy quickly.
Whatever we decide to do has to support upgrading the k8s integration without upgrading the agent version, I think that will be quite a common ask.
Users of this chart will frequently be k8s and Helm users but not k8s and Helm (or Elastic Agent) experts, we need to optimize for helping them get fixes with minimal toil or the need to escalate to engineering via support cases.
ok I hear you π Then let's do a minor alternation to the approach;
helm chart releases happen from 8.x, 9.x branches, still following the same versions.json
paradigm with CI steps, I mentioned above, but each one of these branches target for release a different minor version of the helm chart. As an example 8.16
branch of the agent would be the 0.1.1
helm chart version. Following this pattern, the 9.0
agent branch would be the 0.2.1
helm chart, etc. (speaking random semvers here π
). I think this approach covers the cases you mention @cmacknz , right?
Yes that would work and let us backport fixes where applicable.
ok then my next step will be to draw this "flow" in a diagram and thus help all the interested parties to have a better understanding how this is gonna look like. Then if there are no strong objections I will proceed to coding it
I've experimented with a local MinIO as a Helm registry. Here are my findings:
<major>.<minor>.<patch>-<alphanumeric>
are treated as pre-releases and require the --devel
flag when searching the Helm repo. As a result, using -SNAPSHOT
, similar to the agent version, is enough for a dev/beta Helm chart.To keep the Helm chart aligned with a specific agent version, the chart version should remain "connected" to the agent version. Using identical versions between the Helm chart and the agent, with a build number <major>.<minor>.<patch>+<build_number>
for hotfixes, doesn't work as Helm treats versions with different build numbers as the same. This can result in random chart versions being installed. Therefore, itβs best to sync the <major>.<minor>
parts between the Helm chart and the agent, while keeping <patch>
unlocked. This should be fine as Chart.yaml captures both the chart and app versions. As a result, the app version can be 1:1 mapped with the agent version to avoid any confusion. For example if we publish a Helm chart with the following Chart.yaml:
apiVersion: v2
name: elastic-agent
description: Elastic-Agent Helm Chart
kubeVersion: ">= 1.27.0-0"
type: application
appVersion: 8.16.0 # elastic-agent version
version: 8.16.10 # chart version
results in the following:
$ helm search repo elastic-agent
NAME CHART VERSION APP VERSION DESCRIPTION
minio-helm/elastic-agent 8.16.10 8.16.0 Elastic-Agent Helm Chart
In the above example, the chart version is 8.16.10 and the app version is 8.16.0.
Based on the above observations, this my proposed flow; each branch, namely 8.16
, 8.x
, and main
, will feature a Helm chart with different chart and app versions in the respective Chart.yaml. The release process for each one will look like the following:
flowchart TD
classDef default line-height:1.5,text-align:center;
B[Manual: Bump chart version and set app version<br>in <a href='https://github.com/elastic/elastic-agent/blob/main/deploy/helm/elastic-agent/Chart.yaml'>Chart.yaml</a>]
D[CI: Verify that agent <a href='https://github.com/elastic/elastic-agent/blob/main/version/version.go'>version</a> matches<br>the major.minor parts of the chart version<br>and fully the app version in Chart.yaml]
E["CI: Ensure that chart version<br>is pre-release (-SNAPSHOT) if no git tag exists"]
F[CI: Ensure that a chart with the same version<br>is not released]
H[CI: Lint and package Helm chart]
I[CI: Download index.yaml from<br>the elastic Helm registry and<br>add the new chart entry]
L[CI: Push Helm chart package<br>to repository]
K[CI: Push updated index.yaml<br>to repository]
B -.->|Pull Request with Chart.yaml changes| D
D --> E
E --> F
F --> H
H -.->|"''release'' comment under the PR (maybe PR merged?!)"| I
I --> L
L --> K
If the above proposal gets accepted, we need to figure out the following:
Things to consider:
cc @cmacknz @swiatekm @ycombinator
@pkoutsovasilis in your example, you have:
appVersion: 8.16.0 # elastic-agent version
version: 8.16.10 # chart version
Does that mean that you want to always align the minor version of the Chart and elastic-agent, but not necessarily the patch version? If so, do you plan to release Chart patch versions without bumping the agent version?
@pkoutsovasilis in your example, you have:
appVersion: 8.16.0 # elastic-agent version version: 8.16.10 # chart version Does that mean that you want to always align the minor version of the Chart and elastic-agent, but not necessarily the patch version? If so, do you plan to release Chart patch versions without bumping the agent version?
Yes pretty much; I want to keep the major and minor parts always aligned and the patch unlocked π One of the requirements for the Helm chart release is to able to backport fixes if required (here) which translates to bumping the chart version but not the agent one. To highlight such a case, imagine that there is an update to the built-in kubernetes integration that needs to get out but the agent version must be the same because the actual agent isn't updated. In the case there is indeed a new agent minor version is released, then again we bump the chart version and we update the app version to the former.
Looking at your proposed process, I'm wondering if there is even a need to publish Chart snapshot versions. It sounds like for each Chart release, we'll have to update Chart.yaml twice:
8.16.3-SNAPSHOT
to 8.16.3
8.16.3
to 8.16.4-SNAPSHOT
That just sounds like a lot of updating for not much gain, especially if we're going to be releasing the Chart more often than agent itself. Could we get away with just keeping the normal version and only bumping on releases?
The other question I have is how often do we actually expect to do patch releases of the Chart. The Otel Helm Chart simply bump the version and release on every change. Would that work for us?
Looking at your proposed process, I'm wondering if there is even a need to publish Chart snapshot versions. It sounds like for each Chart release, we'll have to update Chart.yaml twice:
Once to set the version from snapshot to release, say 8.16.3-SNAPSHOT to 8.16.3 And then once to set it back to snapshot: 8.16.3 to 8.16.4-SNAPSHOT That just sounds like a lot of updating for not much gain, especially if we're going to be releasing the Chart more often than agent itself. Could we get away with just keeping the normal version and only bumping on releases?
I see what you mean about the number of updates, but the gain for me is that we can deploy unreleased agent versions from each branch. Thus, we want to release also SNAPSHOT charts, which based on our current branches would be:
The other question I have is how often do we actually expect to do patch releases of the Chart. The Otel Helm Chart simply bump the version and release on every change. Would that work for us?
So if I understand this correctly for each commit on each branch, even when no chart-related files have changed, you propose to bump the Chart version and release it?
I think we need to consider how most users are going to consume these charts and expect them to work. Having a Helm chart version 8.16.3 when the most recent Elastic stack patch release is 8.16.2 is going to create quite a lot of confusion. I don't think we would be allowed do this because of this confusion and this is main reason why the independent agent release versions never run ahead of the stack version and instead use the +buildYYYYMMDDHHMM qualifier.
I like the idea of having the Helm chart able to release separate from the stack releases, but we need a less misleading version scheme than running ahead of the stack release like this.
Let's say we used major.minor.YYYYMMDDHHMM with YYYYMMDDHHMM being the timestamp we published the chart at to get sortable versions (major.minor.$gitsha could also work but isn't sortable). We need to confirm with our PMs on what a reasonable versioning scheme is before committing to one.
Then the simplest release process I can think of would be something like:
Using major.minor.YYYYMMDDHHMM avoids inventing a new versioning scheme entirely since we already use something like this for independent agent releases.
The "I don't want to think about this anymore" option that everyone would accept is just hooking this up to the stack release process and following that, with the major con being we need stack releases to push fixes out.
For each active release branch, publish a new chart version on any commit, ideally any time just the chart changes or whenever the agent version changes. Use the most recently released agent version in this chart. So a commit to the 8.15 branch around the time I wrote this comment would produce 8.15.202411052056 with agent version 8.15.3. I think a lot of people would still ask us for an 8.15.3 chart version for people who want to be aligned with the stack for whatever reason, but we can probably automate that too.
Unless I am missing something, this is almost the same with my proposal with the difference that instead of a patch version bump we will go with a timestamp. I have no issue, I thought that keeping the patch version to simple integers would cause less confusion but if you say otherwise no objection on my end π
For main, any prerelease branches, and active release branches publish a new chart version with the -SNAPSHOT suffix using the matching -SNAPSHOT agent version. This can happen on any commit.
My 2 cents here, since there will be a pipeline that filters per chart-related or agent version changes (for normal releases), there is no overhead of utilising the same here but release a chart with -SNAPSHOT
The "I don't want to think about this anymore" option that everyone would accept is just hooking this up to the stack release process and following that, with the major con being we need stack releases to push fixes out.
ha! π I am not there yet but if there a need to go with that, sure let's go?!
PS: who should we tag from Product to help decide about the most appropriate version scheme?
I think we're aligned then, I wrote the process out again to make sure I understood.
How would we handle release notes?
PS: who should we tag from Product to help decide about the most appropriate version scheme?
@strawgate @nimarezainia see https://github.com/elastic/elastic-agent/issues/5486#issuecomment-2458146953
How would we handle release notes?
I considered using the existing fragments approach with component: elastic-agent-helm-chart
, but I havenβt fully explored it yet, so Iβm open to other suggestions π
That works, but we need to define how, when, and where they get published.
Our existing release notes are in https://github.com/elastic/ingest-docs/tree/main/docs/en/ingest-management/release-notes but they are structured around the stack release.
The current path for the Helm chart would have releases happen outside of that so we'd need to handle this in some way.
It may be simpler to have a release of the helm chart for each active minor to eliminate this problem. If we want to account for package changes, we'd need a versioning scheme that can move faster than the stack release. We couldn't just have 8.16.1, 8.16.2, etc.
I understand that this is difficult given that we have stack releases and potential package releases. Can I propose that we version and release based on each minor and patch release (aligned with stack) and not complicate it by having daily or snapshot releases or releases based on changes to the package.
I think our patch releases are frequent enough to accommodate (or shorten the window) package releases that may happen between each patch release. As @strawgate also mentioned, if there's a need we can always release a modified helm as an emergency.
@nimarezainia just to check if I my understanding is correct, you propose the Helm chart releases to happen alongside the stack ones because the frequency of the patch releases is frequent enough and thus we can keep the both version locked and fully synced?
Given that the frequency of stack patch version releases is frequent enough, keeping the Helm chart and stack versions fully synced indeed makes sense, and I believe it simplifies the overall process. To implement this, I think we should extend the existing release process to include or invoke an additional step for the Helm chart release. We may need to reach out to the release team under platform engineering productivity (?!) to coordinate this effort. (cc @nimarezainia @ycombinator)
Regarding documentation, I propose utilising the existing fragments mechanism for the changelog, with the parent folder located at deploy/helm/elastic-agent/changelog
. As for the public URL path where the changelog will be hosted, itβs not entirely clear yet. However, as far as I know, an effort will start soon to create more structured documentation for the Helm chart, and I suspect this aspect will be addressed within that initiative. (cc @eedugon @kilfoyle @karenzone)
Any thoughts on this approach?
@pkoutsovasilis Your approach sounds reasonable to me.
This sounds great to me too. I gather in the docs we'd just have a link to the changelog file, which will minimize any maintenance effort.
In https://github.com/elastic/elastic-agent/pull/5331, we added a Helm Chart for Elastic Agent. At the time, we deferred the discussion on how this chart should be released. Let's continue the discussion in this issue and try to arrive at a decision.