cdenneen closed this issue 4 years ago.
This is definitely something we should track (and work) on.
I am currently unaware of what changes Helm 3 exactly brings us, except for having read the design proposal a couple of months ago and being aware of it being Tiller-less. I need to schedule some time to look into the alpha release and see how it fits into what we currently have (if it fits at all), so we can determine what the following steps would be.
If someone already has looked into it and has ideas, feel free to comment.
I think the main implication is no Tiller, and that release information is stored in Kubernetes objects rather than in etcd.
Note that in the current alpha `--namespace` is broken and requires your active context to be the namespace you want to deploy into.
Good session from this week's Kubecon about the changes: https://www.youtube.com/watch?v=lYzrhzLAxUI
Finally had a chance to look into the first alpha release and see how this would fit into the Helm operator. Consider the following a list of observations and questions that came to mind; they may be short-sighted, incomplete, subject to change, or even incorrect.
`.kube/config` is the only requirement (besides `helm`). This means we no longer have to initialize and/or maintain a connection with Tiller (provided a user has no Helm v2 releases), and only need to run `helm` commands with a service account that has sufficient permissions.
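For illustration, a minimal (and deliberately broad) RBAC sketch for such a service account; all names here are hypothetical, and a real deployment should scope the permissions down to what the managed charts actually need:

```yaml
# Hypothetical example: grant the operator's service account cluster-admin.
# In practice, scope this down to the resources your charts create.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: helm-operator
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
  - kind: ServiceAccount
    name: helm-operator   # hypothetical service account name
    namespace: fluxcd     # hypothetical namespace
```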
I see room here for dropping the `.spec.releaseName` field and always using the name of a `HelmRelease` as the release name. This would also make some of the recently invented logic obsolete, and would ease parallelization.
Although still experimental, and a fair amount of work has to be done for alpha 2, I think this will be adopted pretty fast by some enterprise users. The Helm operator will need to understand this new chart source type, and be able to pull (`helm chart pull`) or save (`helm chart save`) them.
`Chart.{yaml,lock}` files: not a direct issue for the operator, as the `helm dep` commands have not changed; it will however be of importance to users when they are switching.
Not much work for us here (install, upgrade actions will just error faster than before), but I hope this will give us enough tools / context to provide users with better error and validation messages on installation and upgrade failures.
The biggest question here for us to answer is: are we able to create a Helm operator which supports both versions? And if we are able to, how is it going to work?
I had a quick discussion with @stefanprodan about this, and we both noticed there are some major differences between the two Helm versions, such that refactoring what we have to support both versions risks introducing bugs into the currently working Helm v2 operator.
There is also a question about how the operator is going to know which `helm` version it needs to call. Introducing a `version` field to the `HelmRelease` spec is an option (which would default to `v2` for backwards compatibility); this would give you the option to migrate your releases one by one (by changing the `version` field to `v3` one release at a time).
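As a sketch, such a per-release selector could look like the following; the chart details are invented, and the field later shows up in this thread as `helmVersion`:

```yaml
# Hypothetical illustration of a per-release Helm version selector.
apiVersion: helm.fluxcd.io/v1
kind: HelmRelease
metadata:
  name: podinfo           # invented release name
  namespace: default
spec:
  helmVersion: v3         # opt this release into Helm v3; omit to keep v2
  chart:
    repository: https://example.com/charts   # hypothetical repository
    name: podinfo
    version: 3.0.0
```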
A second option would be to change the API version of the CRD, create new listeners and event handlers, and let those two work independently from each other. The plus side of this approach is that there is a very clear separation of the two, and incorporating logic specific to a certain version would be easier due to having the ability to create specific update methods for e.g. the queue.
NB: both of the options named above could work within the same Helm operator or in two separate operators.
Lastly, I want to make a note about the current slowness of the operator, which is not directly related to Helm v3, but is something to keep in mind while making the choice between separation and incorporation. I think parallel operations will be a lot easier when the operator only has to deal with one Helm version.
Release names are now namespaced
I see room here for dropping the `.spec.releaseName` field and always using the name of a `HelmRelease` as the release name.
When `.spec.releaseName` is not supplied, the generated release name includes the namespace, so it is effectively namespaced anyway.
The rationale for `.spec.releaseName` is to account for "taking over" a release -- that is, introducing a HelmRelease resource to manage an existing release, and thereby avoid disruption. That use case hasn't gone away, though we could in principle decide not to care about it.
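For illustration, adopting an existing release might look like this (all names are hypothetical):

```yaml
# Hypothetical example: adopt a pre-existing release named "legacy-app"
# without disruption, by pointing releaseName at it.
apiVersion: helm.fluxcd.io/v1
kind: HelmRelease
metadata:
  name: app                  # resource name differs from the release name
  namespace: prod
spec:
  releaseName: legacy-app    # matches the existing release, so it is taken over
  chart:
    repository: https://example.com/charts   # hypothetical repository
    name: app
    version: 1.0.0
```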
There is also a question about how the operator is going to know which `helm` version it needs to call. Introducing a `version` field to the `HelmRelease` spec is an option (which would default to `v2` for backwards compatibility); this would give you the option to migrate your releases one by one (by changing the `version` field to `v3` one release at a time).
I quite like this alternative. I think the deciding factor between this and introducing an entirely new CRD will be whether we need (substantially) different fields. A good exercise might be to speculate on what a HelmRelease for a Helm v3 release would have to look like -- are there fields that become irrelevant? Are there new, mandatory fields?
There is also a question about how the operator is going to know what helm version it needs to call, introducing a version field to the HelmRelease spec is an option (which would default to v2 for backwards compatibility), and this would give you the option to migrate your releases one-by-one (by changing the version field to v3 one release at a time).
The `Chart.yaml` has an `apiVersion` field that indicates use of Helm v3+. We can use that to determine which version of helm the chart requires.
https://v3.helm.sh/docs/faq/#chart-yaml-apiversion-bump
This requires less user action and removes edge cases where the wrong client version is set for a chart it does not support. For transparency we should consider logging the helm client version used and adding it to the Status object.
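A rough sketch of that detection, using a throwaway `Chart.yaml` and invented binary names (`helm` for v2, `helmv3` for v3), purely for illustration:

```shell
# Hypothetical sketch: read the chart's apiVersion to pick a helm binary.
# apiVersion: v2 in Chart.yaml means the chart requires the Helm 3 chart API.
cat > /tmp/Chart.yaml <<'EOF'
apiVersion: v2
name: example
version: 0.1.0
EOF

api=$(awk '/^apiVersion:/ {print $2}' /tmp/Chart.yaml)
if [ "$api" = "v2" ]; then
  HELM_BIN=helmv3   # chart declares the Helm 3 chart API
else
  HELM_BIN=helm     # apiVersion v1 charts work with Helm 2 (and 3)
fi
echo "$HELM_BIN"    # prints "helmv3" for the file above
```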
I see room here for dropping the .spec.releaseName field and always use the name of a HelmRelease as a release name.
+1, this should be the v3 default, even if we still support `HR.Spec.ReleaseName`.
The rationale for .spec.releaseName is to account for "taking over" a release -- that is, introducing a HelmRelease resource to manage an existing release, and thereby avoid disruption. That use case hasn't gone away, though we could in principle decide not to care about it.
Release names used to be cluster-global, but they are now fully namespaced in Helm v3. This makes this use case much less useful, because nothing stops you from taking over a release by just adding a HelmRelease of the same name in the same Namespace.
If we still support `HR.Spec.TargetNamespace` for Helm v3, then I can see `HR.Spec.ReleaseName` being useful. This would be a value-add of helm-operator, since you could manage multiple releases of the same name in different Namespaces from HRs in the same management Namespace. These would otherwise have a name collision unless you overrode `HR.Spec.ReleaseName` on at least `n-1` of them.
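A hypothetical illustration of that value-add (all names invented): two HRs live in one management namespace and release the same chart into different target namespaces, which would have collided under Helm v2's cluster-global release names.

```yaml
apiVersion: helm.fluxcd.io/v1
kind: HelmRelease
metadata:
  name: app-staging
  namespace: releases        # management namespace
spec:
  targetNamespace: staging
  releaseName: app           # same release name as below...
  chart:
    repository: https://example.com/charts   # hypothetical repository
    name: app
    version: 1.0.0
---
apiVersion: helm.fluxcd.io/v1
kind: HelmRelease
metadata:
  name: app-prod
  namespace: releases
spec:
  targetNamespace: prod
  releaseName: app           # ...but namespaced under Helm v3, so no collision
  chart:
    repository: https://example.com/charts
    name: app
    version: 1.0.0
```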
Thinking about UX in the end: we may want a global fluxd config option to enable/disable the use of Helm v2/v3, so users can enforce policies against using the v2 Tiller or beta v3 features.
I was wondering whether there was progress going on outside of this issue. Looks like #42 and #55 have both been merged into the ongoing helm-v3-dev branch.
@gsf `helm-v3-dev` is the branch you want to follow for updates on the Helm 3 support.
I have been refactoring the `chartsync` package into a package dedicated to syncing charts from chart sources to local storage, and a package syncing `HelmRelease` resources to Helm. This work should surface in the near future and will bring us a lot closer to something I feel comfortable merging into `master`.
Helm 3.0.0 stable has been released 🎉
Helm v3 support can be tested with builds of the `helm-v3-dev` branch.
Install Helm v3 CLI and add it to your path:
OS=darwin-amd64 && \
mkdir -p $HOME/.helm3/bin && \
curl -sSL "https://get.helm.sh/helm-v3.0.0-${OS}.tar.gz" | tar xvz && \
chmod +x ${OS}/helm && mv ${OS}/helm $HOME/.helm3/bin/helmv3
export PATH=$PATH:$HOME/.helm3/bin
export HELM_HOME=$HOME/.helm3
Install Flux:
helmv3 repo add fluxcd https://charts.fluxcd.io
kubectl create ns fluxcd
helmv3 upgrade -i flux fluxcd/flux --wait \
--namespace fluxcd \
--set git.url=git@github.com:${GHUSER}/${GHREPO}
Install the HelmRelease CRD that contains the helm version field:
kubectl apply -f https://raw.githubusercontent.com/fluxcd/helm-operator/helm-v3-dev/deploy/flux-helm-release-crd.yaml
Install Helm Operator with Helm v3 support using the latest build:
helmv3 upgrade -i helm-operator fluxcd/helm-operator --wait \
--namespace fluxcd \
--set git.ssh.secretName=flux-git-deploy \
--set configureRepositories.enable=true \
--set configureRepositories.repositories[0].name=stable \
--set configureRepositories.repositories[0].url=https://kubernetes-charts.storage.googleapis.com \
--set extraEnvs[0].name=HELM_VERSION \
--set extraEnvs[0].value=v3 \
--set image.repository=docker.io/fluxcd/helm-operator-prerelease \
--set image.tag=helm-v3-dev-fb98e2db
Keep an eye on https://hub.docker.com/repository/docker/fluxcd/helm-operator-prerelease/tags?page=1&ordering=last_updated for new builds.
I've taken a look at the latest image (`helm-v3-dev-7589ee47`) and the operator is continually attempting to upgrade some releases despite there being no changes. An example is:
apiVersion: helm.fluxcd.io/v1
kind: HelmRelease
metadata:
name: elasticsearch
namespace: monitoring
spec:
releaseName: elasticsearch
chart:
repository: https://helm.elastic.co
name: elasticsearch
version: 7.4.1
values:
replicas: 1
The logs show that the operator thinks there is a difference between the applied and expected chart, due to the type mismatch for the value of `replicas`:
ts=2019-11-24T11:40:22.774901798Z caller=release.go:340 component=release release=elasticsearch targetNamespace=monitoring resource=monitoring:helmrelease/elasticsearch helmVersion=v3 info="values have diverged" diff=" map[string]interface{}{\n \t... // 39 identical entries\n \t\"rbac\": map[string]interface{}{\"create\": bool(false), \"serviceAccountName\": string(\"\")},\n \t\"readinessProbe\": map[string]interface{}{\"failureThreshold\": float64(3), \"initialDelaySeconds\": float64(10), \"periodSeconds\": float64(10), \"successThreshold\": float64(3), \"timeoutSeconds\": float64(5)},\n- \t\"replicas\": float64(1),\n+ \t\"replicas\": s\"1\",\n \t\"resources\": map[string]interface{}{\"limits\": map[string]interface{}{\"cpu\": string(\"1000m\"), \"memory\": string(\"2Gi\")}, \"requests\": map[string]interface{}{\"cpu\": string(\"100m\"), \"memory\": string(\"2Gi\")}},\n \t\"roles\": map[string]interface{}{\"data\": string(\"true\"), \"ingest\": string(\"true\"), \"master\": string(\"true\")},\n \t... // 12 identical entries\n }\n"
The important part being:
- "replicas": float64(1),
+ "replicas": s"1",
This doesn't impact any of the running pods, but it does clog up the release history, meaning rollbacks would be impossible:
$ helm history elasticsearch
REVISION UPDATED STATUS CHART APP VERSION DESCRIPTION
10 Sun Nov 24 11:42:23 2019 superseded elasticsearch-7.4.1 7.4.1 Upgrade complete
11 Sun Nov 24 11:43:23 2019 superseded elasticsearch-7.4.1 7.4.1 Upgrade complete
12 Sun Nov 24 11:44:22 2019 superseded elasticsearch-7.4.1 7.4.1 Upgrade complete
13 Sun Nov 24 11:45:22 2019 superseded elasticsearch-7.4.1 7.4.1 Upgrade complete
14 Sun Nov 24 11:46:22 2019 superseded elasticsearch-7.4.1 7.4.1 Upgrade complete
15 Sun Nov 24 11:47:22 2019 superseded elasticsearch-7.4.1 7.4.1 Upgrade complete
16 Sun Nov 24 11:48:22 2019 superseded elasticsearch-7.4.1 7.4.1 Upgrade complete
17 Sun Nov 24 11:49:22 2019 superseded elasticsearch-7.4.1 7.4.1 Upgrade complete
18 Sun Nov 24 11:50:22 2019 superseded elasticsearch-7.4.1 7.4.1 Upgrade complete
19 Sun Nov 24 11:51:22 2019 deployed elasticsearch-7.4.1 7.4.1 Upgrade complete
This has happened on a number of charts where I've used integers in the values.
I am taking a look at helm-v3-dev-7589ee47, and helm-operator creates secrets in the following format: `sealed-secrets.v1`, whereas Helm 3 uses a prefix: `sh.helm.release.v1.flux.v1`
@rowecharles thanks for reporting this. The reason this seems to happen is that the dry-run values generated by Helm (internally) are returned directly from memory and take a shorter route than the values returned from storage, bypassing a parser that casts the float values to strings.
I was able to get rid of the spurious 'upgrades' by always casting the values to a YAML string, and then re-reading this string into a `map[string]interface{}`, ensuring that the values we compare always represent the values as they would be when returned from storage. ~Expect a PR for this tomorrow.~ PR available #117, image for testing: `hiddeco/helm-operator:helm-v3-v3-value-types-334af866`.
@eschereisin this is because the Helm operator is still running `v3.0.0-beta.3`, while this was added in `v3.0.0-beta.4`. The upgrade to `v3.0.0` (stable) will arrive soon; I had to clear the path and get the `master` changes in before I could start working on this.
Is there an estimate when helm-operator supports helmv3 production-ready? Can I offer help for testing/evaluation?
Is there an estimate when helm-operator supports helmv3 production-ready? Can I offer help for testing/evaluation?
@stromvirvel See https://github.com/fluxcd/helm-operator/issues/8#issuecomment-553791295
Pre-release builds are published on every commit to the helm-v3-dev branch. All testing and feedback would be welcomed I'm sure 😄
Helm Operator using Helm v3 (stable) can now be tested.
Install Helm v3 CLI and add it to your path:
OS=darwin-amd64 && \
mkdir -p $HOME/.helm3/bin && \
curl -sSL "https://get.helm.sh/helm-v3.0.0-${OS}.tar.gz" | tar xvz && \
chmod +x ${OS}/helm && mv ${OS}/helm $HOME/.helm3/bin/helmv3
export PATH=$PATH:$HOME/.helm3/bin
export HELM_HOME=$HOME/.helm3
Install Flux:
helmv3 repo add fluxcd https://charts.fluxcd.io
kubectl create ns fluxcd
helmv3 upgrade -i flux fluxcd/flux --wait \
--namespace fluxcd \
--set git.url=git@github.com:${GHUSER}/${GHREPO}
Install the HelmRelease CRD that contains the helm version field:
kubectl apply -f https://raw.githubusercontent.com/fluxcd/helm-operator/helm-v3-dev/deploy/flux-helm-release-crd.yaml
Install Helm Operator with Helm v3 support using the latest build:
helmv3 upgrade -i helm-operator fluxcd/helm-operator --wait \
--namespace fluxcd \
--set git.ssh.secretName=flux-git-deploy \
--set configureRepositories.enable=true \
--set configureRepositories.repositories[0].name=stable \
--set configureRepositories.repositories[0].url=https://kubernetes-charts.storage.googleapis.com \
--set extraEnvs[0].name=HELM_VERSION \
--set extraEnvs[0].value=v3 \
--set image.repository=docker.io/fluxcd/helm-operator-prerelease \
--set image.tag=helm-v3-dev-0b11d9d0
We've created a GitHub issue template for Helm v3. Please take this for a spin and create issues if you find any problems with it. Thanks!
Hey!
I'm testing `fluxcd/helm-operator-prerelease:helm-v3-dev-0b11d9d0` on my cluster right now and it mostly works, but I get some strange behaviour.
For example, here is what I get after a fresh install of sealed-secrets-controller:
helm-operator-85f4dc6fbc-h6xjt flux-helm-operator ts=2019-12-03T19:15:28.735809668Z caller=release.go:280 component=release release=sealed-secrets-controller targetNamespace=kube-system resource=kube-system:helmrelease/sealed-secrets-controller helmVersion=v3 info="no existing release; installing"
helm-operator-85f4dc6fbc-h6xjt flux-helm-operator I1203 19:15:29.755719 7 client.go:87] creating 10 resource(s)
helm-operator-85f4dc6fbc-h6xjt flux-helm-operator I1203 19:15:30.058021 7 wait.go:51] beginning wait for 10 resources with timeout of 5m0s
helm-operator-85f4dc6fbc-h6xjt flux-helm-operator I1203 19:15:32.237735 7 wait.go:199] Deployment is not ready: kube-system/sealed-secrets-controller. 0 out of 1 expected pods are ready
helm-operator-85f4dc6fbc-h6xjt flux-helm-operator I1203 19:15:34.085919 7 wait.go:199] Deployment is not ready: kube-system/sealed-secrets-controller. 0 out of 1 expected pods are ready
helm-operator-85f4dc6fbc-h6xjt flux-helm-operator I1203 19:15:36.081182 7 wait.go:199] Deployment is not ready: kube-system/sealed-secrets-controller. 0 out of 1 expected pods are ready
helm-operator-85f4dc6fbc-h6xjt flux-helm-operator I1203 19:15:38.079472 7 wait.go:199] Deployment is not ready: kube-system/sealed-secrets-controller. 0 out of 1 expected pods are ready
helm-operator-85f4dc6fbc-h6xjt flux-helm-operator I1203 19:15:40.088582 7 wait.go:199] Deployment is not ready: kube-system/sealed-secrets-controller. 0 out of 1 expected pods are ready
helm-operator-85f4dc6fbc-h6xjt flux-helm-operator I1203 19:15:42.084390 7 wait.go:199] Deployment is not ready: kube-system/sealed-secrets-controller. 0 out of 1 expected pods are ready
helm-operator-85f4dc6fbc-h6xjt flux-helm-operator ts=2019-12-03T19:15:44.125082595Z caller=release.go:224 component=release release=sealed-secrets-controller targetNamespace=kube-system resource=kube-system:helmrelease/sealed-secrets-controller helmVersion=v3 info="Helm release sync succeeded" revision=1.6.0
helm-operator-85f4dc6fbc-h6xjt flux-helm-operator ts=2019-12-03T19:15:59.17677442Z caller=operator.go:309 component=operator info="enqueuing release" resource=kube-system:helmrelease/sealed-secrets-controller
helm-operator-85f4dc6fbc-h6xjt flux-helm-operator I1203 19:15:59.596729 7 install.go:170] WARNING: This chart or one of its subcharts contains CRDs. Rendering may fail or contain inaccuracies.
helm-operator-85f4dc6fbc-h6xjt flux-helm-operator ts=2019-12-03T19:16:00.907662525Z caller=release.go:182 component=release release=sealed-secrets-controller targetNamespace=kube-system resource=kube-system:helmrelease/sealed-secrets-controller helmVersion=v3 error="failed to determine if the release should be synced" err="failed to upgrade chart for release [3fcfc29e-0eb4-4244-9a29-50544aba024b]: rendered manifests contain a resource that already exists. Unable to continue with install: existing resource conflict: kind: CustomResourceDefinition, namespace: , name: sealedsecrets.bitnami.com"
I also get validation errors from cert-manager release, which is also managed by helm-operator right now:
helm-operator-85f4dc6fbc-h6xjt flux-helm-operator ts=2019-12-03T19:25:00.496478628Z caller=release.go:182 component=release release=cert-manager targetNamespace=ingress resource=ingress:helmrelease/cert-manager helmVersion=v3 error="failed to determine if the release should be synced" err="failed to upgrade chart for release [333a1ef6-b72d-4122-857b-60852322ac72]: unable to build kubernetes objects from release manifest: error validating \"\": error validating data: ValidationError(ValidatingWebhookConfiguration.webhooks[0].namespaceSelector.matchExpressions[1].values): unknown object type \"nil\" in ValidatingWebhookConfiguration.webhooks[0].namespaceSelector.matchExpressions[1].values[0]"
Do you have any ideas how to address that?
Thank you very much!
@dragonsmith Are you able to install these charts with the helm v3 CLI? The way CRDs are handled has changed: https://helm.sh/docs/topics/chart_best_practices/custom_resource_definitions/
@gsf the interesting thing is these charts are installed successfully either way (via helm3-cli or using helm-operator). It's just that helm operator gives such warnings.
Also, look closer: the second warning is about: ValidationError(ValidatingWebhookConfiguration.webhooks[0].namespaceSelector.matchExpressions[1].values): unknown object type \"nil\"
I'm aware that there are changes to CRD handling, but I thought they only step in if we use chart API v2, because we actually need backwards compatibility with current charts; I did not dive into the new code, though.
@dragonsmith That ValidationError looks like this issue: https://github.com/pegasystems/pega-helm-charts/issues/34#issuecomment-558662554.
Glad to see this merged into master, can't wait for the next release 👍
Release with beta support for Helm v3 was just published, changelog can be found here: https://github.com/fluxcd/helm-operator/releases/tag/v1.0.0-rc5
I am keeping this issue open till it can be considered stable.
The latest release (`v1.0.0-rc5`) is working great for the most part. I'm getting issues with the prometheus-operator chart, which triggers an upgrade every time a chart sync occurs. This is due to the current and desired charts having diverged. The logs aren't particularly helpful and are a mass of hex characters.
I1223 21:40:53.734241 6 upgrade.go:87] performing update for prometheus
I1223 21:40:54.088285 6 upgrade.go:220] dry run for prometheus
ts=2019-12-23T21:41:04.985070741Z caller=release.go:369 component=release release=prometheus targetNamespace=monitoring resource=monitoring:helmrelease/prometheus helmVersion=v3 info="chart has diverged" diff=" &helm.Chart{\n \t... // 3 identical fields\n \tFiles: []*helm.File{&{Name: \"crds/crd-servicemonitor.yaml\", Data: []uint8{0x23, 0x20, 0x53, 0x6f, 0x75, 0x72, 0x63, 0x65, 0x3a, 0x20, 0x68, 0x74, 0x74, 0x70, 0x73, 0x3a, 0x2f, 0x2f, 0x72, 0x61, 0x77, 0x2e, 0x67, 0x69, 0x74, 0x68, 0x75, 0x62, 0x75 ...
I'm using this as the release:
apiVersion: helm.fluxcd.io/v1
kind: HelmRelease
metadata:
name: prometheus
namespace: monitoring
spec:
releaseName: prometheus
chart:
repository: https://kubernetes-charts.storage.googleapis.com/
name: prometheus-operator
version: 8.4.0
values:
global:
rbac:
create: true
pspEnabled: true
@rowecharles thanks for reporting, my guess is that the CRD is included as a dry-run chart result but not included in the chart we compare it with retrieved from storage.
We could technically reduce the comparison for charts from repository sources, as published versions should be immutable, making it possible to rely on just the chart (version) metadata instead of doing the file comparisons we use to detect changes from git chart sources. This would however not solve the problem for charts from git sources that make use of CRDs.
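The proposed shortcut could be sketched roughly like this (shell pseudocode with invented names; the operator's actual implementation is Go, so this only illustrates the decision logic):

```shell
# Sketch: decide whether a release needs syncing, based on its chart source.
chart_needs_sync() {
  # $1 = source kind ("repository" or "git")
  # $2/$3 = current and desired "name-version" metadata
  # $4/$5 = digests over the chart files (only meaningful for git sources)
  if [ "$1" = "repository" ]; then
    # Published chart versions are immutable: metadata comparison is enough.
    [ "$2" != "$3" ]
  else
    # Git sources can change files without a version bump: compare digests.
    [ "$4" != "$5" ]
  fi
}

chart_needs_sync repository "elasticsearch-7.4.1" "elasticsearch-7.4.1" \
  && echo "sync" || echo "skip"   # prints "skip"
```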
Will play around with the above and work on a patch. Please note however that I am officially off and recharging till Jan 7, so it may take a little longer than usual. :battery::christmas_tree:
It would be great if the helm operator would only actually install charts when the version changes (just compare the version number in the case of a normal helm chart) or the git commit id changes (in the case of a chart stored in a git repo). Right now the helm operator consumes a lot of CPU...
Maybe this could also be an option which could be turned off for debugging.
I have also found another issue: if you have a lot of helm releases, with things like metallb, storage, and real applications, the order of installation can cause problems. You cannot install an application if the underlying storage or load balancer service is not ready. The helm operator was stuck installing applications which did not finish. I had to manually remove these application helm releases until the helm operator decided to install metallb and the storage helm charts; afterwards the applications were also fine.
One solution could be to prioritise namespaces like kube-system, or to postpone a helm release for some time if the initial install failed or timed out?
The new prometheus metrics are confusing me. I would like to build an alert to notify me when a helm release fails. I would expect some metric like helm_status.
The success=true/false label on the flux_helm_operator_release_duration_seconds_sum metric does not correspond with the status of kubectl get hr. I would expect success=true for all releases which are marked as deployed...
Installing the prometheus operator using helm operator generates error messages like this:
ts=2019-12-28T17:08:03.472375615Z caller=annotator.go:100 component=release release=prometheus targetNamespace=monitoring resource=monitoring:helmrelease/prometheus-operator helmVersion=v3 err="Object 'Kind' is missing in 'null'"
I stumbled over similar issues to other people (charts constantly redeploying for no reason), so if you need a few others to validate that patch with @hiddeco (and obviously keep enjoying your vacation instead of reading this!) you can check out:
stable/datadog and edge/linkerd2
For me both constantly redeploy over and over, with linkerd actually taking down helm-operator as it can't figure out some tap stuff while the pods relaunch.
I have an issue with one of my custom charts. It installs correctly when I use helm v3 directly.
I have 2 hooks:
Helm operator fails because on the next poll interval it upgrades the release while the first install hasn't finished yet? Maybe because the migrate hook is still running?
When I run the helm install process manually (helm install pimcore . -f values.qa2.yaml -n xxx-qa2) and, while the install is running, run the upgrade in a loop (helm upgrade pimcore . -f values.qa2.yaml -n xxx-qa2), it doesn't error out. I see this: "Error: UPGRADE FAILED: "pimcore" has no deployed releases". Just after the install finishes, it upgrades with success.
I also tried running the install command in another terminal (in a loop) while the helm release was installing for the first time. I got this: "Error: cannot re-use a name that is still in use" both while installing and after the install finished.
Did anyone have issues like this?
This is the log:
I0101 23:56:00.298892 6 client.go:87] creating 22 resource(s)
I0101 23:56:00.425264 6 wait.go:51] beginning wait for 22 resources with timeout of 5m0s
I0101 23:56:02.454140 6 wait.go:199] Deployment is not ready: xxx-qa2/pimcore-nginx. 0 out of 1 expected pods are ready
I0101 23:56:04.447469 6 wait.go:199] Deployment is not ready: xxx-qa2/pimcore-nginx. 0 out of 1 expected pods are ready
…
I0101 23:56:06.449462 6 wait.go:199] Deployment is not ready: xxx-qa2/pimcore-nginx. 0 out of 1 expected pods are ready
I0101 23:56:08.448750 6 wait.go:199] Deployment is not ready: xxx-qa2/pimcore-nginx. 0 out of 1 expected pods are ready
I0101 23:56:30.504967 6 client.go:87] creating 1 resource(s)
I0101 23:56:30.516273 6 client.go:420] Watching for changes to Job pimcore-install with timeout of 5m0s
I0101 23:56:30.551505 6 client.go:445] Add/Modify event for pimcore-install: MODIFIED
I0101 23:56:30.551678 6 client.go:484] pimcore-install: Jobs active: 1, jobs failed: 0, jobs succeeded: 0
I0101 23:56:43.587963 6 client.go:445] Add/Modify event for pimcore-install: MODIFIED
I0101 23:56:43.593191 6 client.go:220] Starting delete for "pimcore-migrate" Job
I0101 23:56:43.597942 6 client.go:245] jobs.batch "pimcore-migrate" not found
I0101 23:56:43.633058 6 client.go:87] creating 1 resource(s)
I0101 23:56:43.641761 6 client.go:420] Watching for changes to Job pimcore-migrate with timeout of 5m0s
I0101 23:56:43.670773 6 client.go:445] Add/Modify event for pimcore-migrate: MODIFIED
I0101 23:56:43.670799 6 client.go:484] pimcore-migrate: Jobs active: 1, jobs failed: 0, jobs succeeded: 0
I0101 23:56:58.583841 6 client.go:445] Add/Modify event for pimcore-migrate: MODIFIED
I0101 23:56:58.591805 6 client.go:220] Starting delete for "pimcore-install" Job
ts=2020-01-01T23:56:58.659490809Z caller=release.go:254 component=release release=pimcore targetNamespace=xxx-qa2 resource=xxx-qa2:helmrelease/pimcore helmVersion=v3 info="Helm release sync succeeded" revision=4335e21a35d48576151c45008b5698826322b229
I0101 23:57:32.710157 6 upgrade.go:79] preparing upgrade for pimcore
I0101 23:57:33.398012 6 upgrade.go:87] performing update for pimcore
ts=2020-01-01T23:57:33.48955898Z caller=release.go:188 component=release release=pimcore targetNamespace=xxx-qa2 resource=xxx-qa2:helmrelease/pimcore helmVersion=v3 error="failed to determine if the release should be synced" err="failed to upgrade chart for release [pimcore]: rendered manifests contain a new resource that already exists. Unable to continue with update: existing resource conflict: kind: PersistentVolume, namespace: , name: pimcore-sharedfs-qa2"
ts=2020-01-01T23:58:57.128901016Z caller=operator.go:309 component=operator info="enqueuing release" resource=xxx-qa2:helmrelease/pimcore
I0101 23:58:59.559191 6 upgrade.go:79] preparing upgrade for pimcore
I0101 23:59:00.067934 6 upgrade.go:87] performing update for pimcore
ts=2020-01-01T23:59:00.129994283Z caller=release.go:188 component=release release=pimcore targetNamespace=xxx-qa2 resource=xxx-qa2:helmrelease/pimcore helmVersion=v3 error="failed to determine if the release should be synced" err="failed to upgrade chart for release [pimcore]: rendered manifests contain a new resource that already exists. Unable to continue with update: existing resource conflict: kind: PersistentVolume, namespace: , name: pimcore-sharedfs-qa2"
This is from "kubectl describe hr/pimcore":
Status:
Conditions:
Last Transition Time: 2020-01-01T23:55:59Z
Last Update Time: 2020-01-02T21:04:59Z
Message: successfully cloned chart revision: 9de0408637a7b863ecb2b0554e668309e4410327
Reason: GitRepoCloned
Status: True
Type: ChartFetched
Last Transition Time: 2020-01-02T14:41:00Z
Last Update Time: 2020-01-02T21:05:00Z
Message: failed to compose values for chart release
Reason: HelmUpgradeFailed
Status: False
Type: Released
@apenney @hiddeco I still see the same behaviour on 1.0.0-rc6 :(
@slenky I haven't yet worked on the mentioned problem; the release from yesterday took care of other issues (see the release notes). I will get to this one soon.
@hiddeco confirming that the latest 1.0.0-rc7 has fixed the issue :) Thank you and good job!
Also confirming that we're looking good, really appreciate the quick fixes! :)
The latest rc seems to just install the prometheus operator during the first run and leave it untouched afterwards. But I still get this error message:
fluxcd/helm-operator-69bc6bd556-hdg7s[flux-helm-operator]: 2020/01/10 21:39:34 info: skipping unknown hook: "crd-install"
fluxcd/helm-operator-69bc6bd556-hdg7s[flux-helm-operator]: 2020/01/10 21:39:34 info: skipping unknown hook: "crd-install"
fluxcd/helm-operator-69bc6bd556-hdg7s[flux-helm-operator]: 2020/01/10 21:39:34 info: skipping unknown hook: "crd-install"
fluxcd/helm-operator-69bc6bd556-hdg7s[flux-helm-operator]: 2020/01/10 21:39:34 info: skipping unknown hook: "crd-install"
fluxcd/helm-operator-69bc6bd556-hdg7s[flux-helm-operator]: 2020/01/10 21:39:34 info: skipping unknown hook: "crd-install"
Can we just ignore this message?
I have also found this problem in flux-helm-operator:
fluxcd/helm-operator-69bc6bd556-hdg7s[flux-helm-operator]: ts=2020-01-10T21:47:01.066165659Z caller=release.go:217 component=release release=blackbox-exporter targetNamespace=monitoring resource=monitoring:helmrelease/blackbox-exporter helmVersion=v3 error="Helm release failed" revision=1.6.0 err="failed to upgrade chart for release [blackbox-exporter]: unable to build kubernetes objects from release manifest: error validating \"\": error validating data: ValidationError(PodDisruptionBudget.spec): unknown field \"enabled\" in io.k8s.api.policy.v1beta1.PodDisruptionBudgetSpec"
I do not understand this error; my config looks just fine and used to work before with helm2:
---
apiVersion: helm.fluxcd.io/v1
kind: HelmRelease
metadata:
name: blackbox-exporter
namespace: monitoring
annotations:
fluxcd.io/automated: 'true'
filter.fluxcd.io/chart-image: semver:~0.15
spec:
releaseName: blackbox-exporter
helmVersion: v3
chart:
repository: https://kubernetes-charts.storage.googleapis.com/
name: prometheus-blackbox-exporter
version: 2.0.0
values:
name: blackbox-exporter
image:
tag: v0.15.1
resources:
limits:
memory: 64Mi
requests:
cpu: 50m
memory: 64Mi
I do not see the use of the keyword enabled anywhere in this helm release.
@runningman84 That "enabled" field in PodDisruptionBudget was a Helm V3 incompatibility in prometheus-blackbox-exporter prior to version 2.0.0. You may need to set the content of PodDisruptionBudget as described in the README: https://github.com/helm/charts/tree/master/stable/prometheus-blackbox-exporter#200
I can also confirm that 1.0.0-rc7 has fixed my issues with spurious upgrades. Thanks for the great work!
@runningman84 the crd-install hooks are no longer supported in Helm 3. I think it's safe to ignore the errors.
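For context, Helm 3 replaced the crd-install hook with a dedicated crds/ directory at the top level of a chart: files placed there are installed before the templates are rendered, and no hook annotation is needed. A rough chart layout (hypothetical file names):

```
mychart/
├── Chart.yaml
├── crds/
│   └── my-crd.yaml        # applied first, no crd-install annotation required
└── templates/
    └── deployment.yaml
```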
I've taken a look at the latest image (helm-v3-dev-7589ee47) and the operator is continually attempting to upgrade some releases despite there being no changes. An example is:
apiVersion: helm.fluxcd.io/v1
kind: HelmRelease
metadata:
  name: elasticsearch
  namespace: monitoring
spec:
  releaseName: elasticsearch
  chart:
    repository: https://helm.elastic.co
    name: elasticsearch
    version: 7.4.1
  values:
    replicas: 1
The logs show that the operator thinks there is a difference between the applied and expected chart due to a type mismatch for the value of replicas:
ts=2019-11-24T11:40:22.774901798Z caller=release.go:340 component=release release=elasticsearch targetNamespace=monitoring resource=monitoring:helmrelease/elasticsearch helmVersion=v3 info="values have diverged" diff=" map[string]interface{}{\n \t... // 39 identical entries\n \t\"rbac\": map[string]interface{}{\"create\": bool(false), \"serviceAccountName\": string(\"\")},\n \t\"readinessProbe\": map[string]interface{}{\"failureThreshold\": float64(3), \"initialDelaySeconds\": float64(10), \"periodSeconds\": float64(10), \"successThreshold\": float64(3), \"timeoutSeconds\": float64(5)},\n- \t\"replicas\": float64(1),\n+ \t\"replicas\": s\"1\",\n \t\"resources\": map[string]interface{}{\"limits\": map[string]interface{}{\"cpu\": string(\"1000m\"), \"memory\": string(\"2Gi\")}, \"requests\": map[string]interface{}{\"cpu\": string(\"100m\"), \"memory\": string(\"2Gi\")}},\n \t\"roles\": map[string]interface{}{\"data\": string(\"true\"), \"ingest\": string(\"true\"), \"master\": string(\"true\")},\n \t... // 12 identical entries\n }\n"
The important part being:
- "replicas": float64(1), + "replicas": s"1",
This doesn't impact any of the running pods, but it does clog up the release history, meaning rollbacks would be impossible:
$ helm history elasticsearch
REVISION  UPDATED                   STATUS      CHART                APP VERSION  DESCRIPTION
10        Sun Nov 24 11:42:23 2019  superseded  elasticsearch-7.4.1  7.4.1        Upgrade complete
11        Sun Nov 24 11:43:23 2019  superseded  elasticsearch-7.4.1  7.4.1        Upgrade complete
12        Sun Nov 24 11:44:22 2019  superseded  elasticsearch-7.4.1  7.4.1        Upgrade complete
13        Sun Nov 24 11:45:22 2019  superseded  elasticsearch-7.4.1  7.4.1        Upgrade complete
14        Sun Nov 24 11:46:22 2019  superseded  elasticsearch-7.4.1  7.4.1        Upgrade complete
15        Sun Nov 24 11:47:22 2019  superseded  elasticsearch-7.4.1  7.4.1        Upgrade complete
16        Sun Nov 24 11:48:22 2019  superseded  elasticsearch-7.4.1  7.4.1        Upgrade complete
17        Sun Nov 24 11:49:22 2019  superseded  elasticsearch-7.4.1  7.4.1        Upgrade complete
18        Sun Nov 24 11:50:22 2019  superseded  elasticsearch-7.4.1  7.4.1        Upgrade complete
19        Sun Nov 24 11:51:22 2019  deployed    elasticsearch-7.4.1  7.4.1        Upgrade complete
This has happened on a number of charts where I've used integers in the values.
I have also run into this and it's rather annoying: about every 5 minutes it does a revision.
REVISION UPDATED STATUS CHART APP VERSION DESCRIPTION
471 Tue Jan 14 21:19:12 2020 superseded datarepo-0.0.4 Upgrade complete
472 Tue Jan 14 21:22:10 2020 superseded datarepo-0.0.4 Upgrade complete
473 Tue Jan 14 21:25:07 2020 superseded datarepo-0.0.4 Upgrade complete
474 Tue Jan 14 21:28:06 2020 superseded datarepo-0.0.4 Upgrade complete
475 Tue Jan 14 21:31:05 2020 superseded datarepo-0.0.4 Upgrade complete
476 Tue Jan 14 21:34:13 2020 superseded datarepo-0.0.4 Upgrade complete
477 Tue Jan 14 21:37:06 2020 superseded datarepo-0.0.4 Upgrade complete
478 Tue Jan 14 21:40:10 2020 superseded datarepo-0.0.4 Upgrade complete
479 Tue Jan 14 21:43:05 2020 superseded datarepo-0.0.4 Upgrade complete
480 Tue Jan 14 21:46:05 2020 deployed datarepo-0.0.4 Upgrade complete
@smark88 you are running an old version, helm-v3-dev has been merged into master, and the latest version with Helm 3 support (and bug fixes) is 1.0.0-rc7.
Not sure if this has anything to do with 1.0.0-rc7, but I was running a build of helm-v3-dev for a few weeks. I have changed to 1.0.0-rc7 and now most of the helm releases have a status of "failed to compose values for chart release", even though they are installed fine.
For example
❯ helm history velero -n velero
REVISION UPDATED STATUS CHART APP VERSION DESCRIPTION
1 Tue Dec 24 16:08:53 2019 deployed velero-2.7.5 1.2.0 Install complete
❯ kubectl get hr -n velero
NAME RELEASE STATUS MESSAGE AGE
velero velero deployed failed to compose values for chart release 21d
helm-operator:1.0.0-rc5 is the container I have deployed. I will try upgrading to 1.0.0-rc7 and see if that resolves it.
I also have two releases with the status "failed to compose values", even though they are installed fine using the rc7 version.
@runningman84 @REBELinBLUE can you both share a copy of one of the HelmReleases causing trouble?
I can share one of them which is using a public chart:
---
apiVersion: helm.fluxcd.io/v1
kind: HelmRelease
metadata:
name: external-dns
namespace: kube-system
annotations:
fluxcd.io/automated: "true"
filter.fluxcd.io/chart-image: semver:~0.5
spec:
releaseName: external-dns
helmVersion: v3
chart:
repository: https://kubernetes-charts.storage.googleapis.com/
name: external-dns
version: 2.5.3
values:
name: external-dns
image:
tag: 0.5.15-debian-9-r1
rbac:
create: true
provider: aws
#txtOwnerId: k8s-prod-home
aws:
accessKey: xxxxxxxxxxxxxxxxxxxxxxxx
secretKey: yyyyyyyyyyyyyyyyyyyyyyyy
region: us-east-1
This is the kubectl output:
root@cubi001:~# kubectl get hr --all-namespaces
NAMESPACE NAME RELEASE STATUS MESSAGE AGE
kube-system external-auth-server external-auth-server deployed Helm release sync succeeded 19d
kube-system external-dns external-dns deployed failed to compose values for chart release 19d
@runningman84 this works without any issues for me; can you share the Status output from kubectl describe hr?
This is the output:
kubectl describe hr external-dns -n kube-system
Name: external-dns
Namespace: kube-system
Labels: fluxcd.io/sync-gc-mark=sha256.d2hcYp2rAOHFEhWk8hRzjHP_lI9tPeoShKH4OThrveI
Annotations: filter.fluxcd.io/chart-image: semver:~0.5
fluxcd.io/automated: true
fluxcd.io/sync-checksum: 44ada7e95a8eed235e4e0c2bea067e1a9e5fec95
kubectl.kubernetes.io/last-applied-configuration:
{"apiVersion":"helm.fluxcd.io/v1","kind":"HelmRelease","metadata":{"annotations":{"filter.fluxcd.io/chart-image":"semver:~0.5","fluxcd.io/...
API Version: helm.fluxcd.io/v1
Kind: HelmRelease
Metadata:
Creation Timestamp: 2019-12-26T21:55:32Z
Generation: 1
Resource Version: 4630545
Self Link: /apis/helm.fluxcd.io/v1/namespaces/kube-system/helmreleases/external-dns
UID: 56d5f433-05f4-4e17-a422-c38f93d9e43b
Spec:
Chart:
Name: external-dns
Repository: https://kubernetes-charts.storage.googleapis.com/
Version: 2.5.3
Helm Version: v3
Release Name: external-dns
Values:
Aws:
Access Key: xxxxxxxxxxxxxxxxxxxxxxxx
Region: us-east-1
Secret Key: yyyyyyyyyyyyyyyyyyyyyyyy
Image:
Tag: 0.5.15-debian-9-r1
Name: external-dns
Provider: aws
Rbac:
Create: true
Txt Owner Id: k8s-prod-home
Status:
Conditions:
Last Transition Time: 2019-12-30T06:49:28Z
Last Update Time: 2019-12-30T06:49:28Z
Message: failed to compose values for chart release
Reason: HelmUpgradeFailed
Status: False
Type: Released
Last Transition Time: 2019-12-26T22:49:16Z
Last Update Time: 2020-01-13T20:46:27Z
Message: chart fetched: external-dns-2.5.3.tgz
Reason: RepoChartInCache
Status: True
Type: ChartFetched
Observed Generation: 1
Release Name: external-dns
Release Status: deployed
Revision: 2.5.3
Values Checksum: db74f0cd3f490aff01288488cbc3d9ee8484af151db23ae096a015c59ca20124
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal ChartSynced 13m (x75 over 37h) helm-operator Chart managed by HelmRelease processed
I redeployed last night and the 1.0.0-rc5 --> 1.0.0-rc7 upgrade has resolved my issue. Typically by this time in the morning I would have about 100+ revisions.
NAME NAMESPACE REVISION UPDATED STATUS CHART APP VERSION
dev-jade dev 1 2020-01-14 23:08:11.235655799 +0000 UTC deployed datarepo-0.0.4
secrets dev 1 2020-01-14 23:08:13.113370804 +0000 UTC deployed create-secret-manager-secret-0.0.4
@hiddeco https://github.com/REBELinBLUE/k8s-on-hypriot/blob/master/deployments/velero/velero/velero.yaml but most of the charts in the deployments folder are showing the status; the only ones which aren't are the ones I removed with kubectl delete hr <name> and then reapplied.
@REBELinBLUE @runningman84 does one of you still have access to logs from around the time the status message was pushed for the release?
With the release of Helm 3, I would like to track progress for this integration.