fluxcd / helm-operator

Successor: https://github.com/fluxcd/helm-controller — The Flux Helm Operator, once upon a time a solution for declarative Helming.
https://docs.fluxcd.io/projects/helm-operator/
Apache License 2.0

Helm 3 support #8

Closed cdenneen closed 4 years ago

cdenneen commented 5 years ago

With the release of Helm 3, I would like to track progress on this integration.

hiddeco commented 5 years ago

This is definitely something we should track (and work) on.

I am currently unaware of exactly what changes Helm 3 brings us, beyond having read the design proposal a couple of months ago and knowing that it is Tiller-less. I need to schedule some time to look into the alpha release and see how it fits into what we currently have (if it fits at all), so we can determine what the next steps would be.

If someone already has looked into it and has ideas, feel free to comment.

timja commented 5 years ago

I think the main implications are that there is no Tiller, and that release information is stored in Kubernetes objects and not in etcd.

Note that in the current alpha --namespace is broken and requires your active context to be set to the namespace you want to deploy into.
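
Until that is fixed, one workaround is to point the active context at the target namespace before deploying, for example (assuming kubectl 1.14+ for the --current flag; the namespace is a placeholder):

kubectl config set-context --current --namespace=my-target-namespace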

jpds commented 5 years ago

Good session from this week's KubeCon about the changes: https://www.youtube.com/watch?v=lYzrhzLAxUI

hiddeco commented 5 years ago

Finally had a chance to look into the first alpha release and see how this would fit into the Helm operator. Consider the following a list of observations and questions that came to mind; they may be short-sighted, incomplete, subject to change, or even incorrect.


A .kube/config is the only requirement (besides helm)

This means that, for a user with no Helm v2 releases, we no longer have to initialize and/or maintain a connection with Tiller, and only need to run helm commands with a service account that has sufficient permissions.

Release names are now namespaced

I see room here for dropping the .spec.releaseName field and always using the name of a HelmRelease as the release name. This would also make some of the recently introduced logic obsolete, and would ease parallelization.

OCI registries are going to be supported as chart repositories

Although this is still experimental, and a fair amount of work has to be done for alpha 2, I think it will be adopted pretty fast by some enterprise users. The Helm operator will need to understand this new chart source type, and be able to pull (helm chart pull) and save (helm chart save) such charts.
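
As a rough sketch of the experimental workflow (the exact commands may still change between alphas, and the registry reference is a placeholder):

export HELM_EXPERIMENTAL_OCI=1                                          # opt in to the experimental feature
helm chart save ./mychart registry.example.com/charts/mychart:0.1.0     # store a local chart in the cache
helm chart push registry.example.com/charts/mychart:0.1.0               # push it to an OCI registry
helm chart pull registry.example.com/charts/mychart:0.1.0               # pull it back down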

Helm charts with dependencies are not interchangeable between Helm versions as Helm v3 depends on Chart.{yaml,lock} files

Not a direct issue for the operator, as the helm dep commands have not changed; it will, however, be important to users when they switch.
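
To illustrate the difference: a chart built for Helm v3 declares its dependencies in Chart.yaml (apiVersion: v2) instead of requirements.yaml, and helm dep update writes a Chart.lock instead of requirements.lock. A minimal sketch (chart and dependency names are placeholders):

apiVersion: v2
name: mychart
version: 0.1.0
dependencies:
  - name: redis
    version: ~10.0.0
    repository: https://kubernetes-charts.storage.googleapis.com/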

Chart value validation is now supported with JSON Schemas

Not much work for us here (install and upgrade actions will just error faster than before), but I hope this will give us enough tools/context to provide users with better error and validation messages on installation and upgrade failures.
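
For reference, the schema lives in a values.schema.json next to the chart's values.yaml and is enforced on install, upgrade, and lint; a minimal sketch (the replicas constraint is just an example):

{
  "$schema": "https://json-schema.org/draft-07/schema#",
  "type": "object",
  "properties": {
    "replicas": {
      "type": "integer",
      "minimum": 1
    }
  },
  "required": ["replicas"]
}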


The biggest question here for us to answer is: are we able to create a Helm operator which supports both versions? And if we are able to, how is it going to work?

I had a quick discussion with @stefanprodan about this, and we both noticed that there are some major differences between the two Helm versions, such that refactoring what we have to support both carries a risk of introducing bugs into the currently working Helm v2 operator.

There is also the question of how the operator is going to know which helm version it needs to call. Introducing a version field to the HelmRelease spec is one option (defaulting to v2 for backwards compatibility); this would give you the option to migrate your releases one-by-one, by changing the version field to v3 one release at a time.
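
A sketch of what such a field could look like on a HelmRelease (the field name and placement are illustrative at this point, and the chart and names are placeholders):

apiVersion: helm.fluxcd.io/v1
kind: HelmRelease
metadata:
  name: podinfo
  namespace: default
spec:
  helmVersion: v3   # would default to v2 when omitted, for backwards compatibility
  releaseName: podinfo
  chart:
    repository: https://charts.example.com/
    name: podinfo
    version: 1.0.0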

A second option would be to change the API version of the CRD, create new listeners and event handlers, and let those two work independently of each other. The plus side of this approach is that there is a very clear separation between the two, and incorporating version-specific logic would be easier, given the ability to create dedicated update methods for e.g. the queue.

NB: both of the options named above could work within the same Helm operator or in two separate operators.

Lastly, I want to make a note about the current slowness of the operator, which is not directly related to Helm v3, but is something to keep in mind while choosing between separation and incorporation. I think parallel operations will be a lot easier if the operator only has to deal with one Helm version.

squaremo commented 5 years ago

Release names are now namespaced

I see room here for dropping the .spec.releaseName field and always use the name of a HelmRelease as a release name.

When .spec.releaseName is not supplied, the generated release name includes the namespace, so it is effectively namespaced anyway.

The rationale for .spec.releaseName is to account for "taking over" a release -- that is, introducing a HelmRelease resource to manage an existing release, and thereby avoid disruption. That use case hasn't gone away, though we could in principle decide not to care about it.

There is also a question about how the operator is going to know what helm version it needs to call, introducing a version field to the HelmRelease spec is an option (which would default to v2 for backwards compatibility), and this would give you the option to migrate your releases one-by-one (by changing the version field to v3 one release at a time).

I quite like this alternative. I think the deciding factor between this and introducing an entirely new CRD will be whether we need (substantially) different fields. A good exercise might be to speculate on what a HelmRelease for a Helm v3 release would have to look like -- are there fields that become irrelevant? Are there new, mandatory fields?

stealthybox commented 5 years ago

There is also a question about how the operator is going to know what helm version it needs to call, introducing a version field to the HelmRelease spec is an option (which would default to v2 for backwards compatibility), and this would give you the option to migrate your releases one-by-one (by changing the version field to v3 one release at a time).

The Chart.yaml has an apiVersion that indicates use of Helm v3+. We can use that to determine which version of helm the chart requires. https://v3.helm.sh/docs/faq/#chart-yaml-apiversion-bump

This requires less user action and removes edge cases where the wrong client version is set for a chart. For transparency, we should consider logging the helm client version used and adding it to the Status object.
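
Concretely, the marker is the chart's top-level apiVersion, which the operator could read from the fetched chart, e.g. (chart name is a placeholder):

grep '^apiVersion:' mychart/Chart.yaml
apiVersion: v2    # v2 means the chart relies on Helm v3 features; v1 charts can be installed by either client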

I see room here for dropping the .spec.releaseName field and always use the name of a HelmRelease as a release name.

+1, this should be the v3 default, even if we still support HR.Spec.ReleaseName.

The rationale for .spec.releaseName is to account for "taking over" a release -- that is, introducing a HelmRelease resource to manage an existing release, and thereby avoid disruption. That use case hasn't gone away, though we could in principle decide not to care about it.

Release names used to be cluster-global, but they are now fully namespaced in helm v3. This makes this use-case much less useful, because nothing stops you from taking over a release by just adding a HelmRelease of the same name in the same Namespace.

If we still support HR.Spec.TargetNamespace for helm v3, then I can see HR.Spec.ReleaseName being useful. This would be a value-add of helm-operator, since you could manage multiple releases of the same name in different Namespaces from HRs in the same management Namespace. These would otherwise have a name collision unless you overrode HR.Spec.ReleaseName on at least n-1 of them.
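
A sketch of that scenario (all names are placeholders), with both HRs living in one management Namespace while the releases land in different target Namespaces under the same release name:

apiVersion: helm.fluxcd.io/v1
kind: HelmRelease
metadata:
  name: app-staging
  namespace: releases          # management namespace holding both HRs
spec:
  releaseName: app             # same release name for both...
  targetNamespace: staging     # ...but installed into different namespaces
  chart:
    repository: https://charts.example.com/
    name: app
    version: 1.0.0
---
apiVersion: helm.fluxcd.io/v1
kind: HelmRelease
metadata:
  name: app-production
  namespace: releases
spec:
  releaseName: app
  targetNamespace: production
  chart:
    repository: https://charts.example.com/
    name: app
    version: 1.0.0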

stealthybox commented 5 years ago

Thinking about UX in the end: we may want a global fluxd config option to enable/disable the use of helm v2/v3, so users can enforce policies against using v2 Tiller or beta v3 features.

gsf commented 4 years ago

I was wondering whether there was progress going on outside of this issue. Looks like #42 and #55 have both been merged into the ongoing helm-v3-dev branch.

hiddeco commented 4 years ago

@gsf helm-v3-dev is the branch you want to follow for updates on the Helm 3 support.

I have been refactoring the chartsync package into a package dedicated to syncing charts from chart sources to local storage, and a package syncing HelmRelease resources to Helm. This work should surface in the near future and will bring us a lot closer to something I feel comfortable with merging into master.

dminca commented 4 years ago

Helm 3.0.0 stable has been released 🎉

stefanprodan commented 4 years ago

Helm v3 support can be tested with builds of the helm-v3-dev branch.

Install Helm v3 CLI and add it to your path:

OS=darwin-amd64 && \
mkdir -p $HOME/.helm3/bin && \
curl -sSL "https://get.helm.sh/helm-v3.0.0-${OS}.tar.gz" | tar xvz && \
chmod +x ${OS}/helm && mv ${OS}/helm $HOME/.helm3/bin/helmv3

export PATH=$PATH:$HOME/.helm3/bin
export HELM_HOME=$HOME/.helm3

Install Flux:

helmv3 repo add fluxcd https://charts.fluxcd.io

kubectl create ns fluxcd

helmv3 upgrade -i flux fluxcd/flux --wait \
--namespace fluxcd \
--set git.url=git@github.com:${GHUSER}/${GHREPO}

Install the HelmRelease CRD that contains the helm version field:

kubectl apply -f https://raw.githubusercontent.com/fluxcd/helm-operator/helm-v3-dev/deploy/flux-helm-release-crd.yaml

Install Helm Operator with Helm v3 support using the latest build:

helmv3 upgrade -i helm-operator fluxcd/helm-operator --wait \
--namespace fluxcd \
--set git.ssh.secretName=flux-git-deploy \
--set configureRepositories.enable=true \
--set configureRepositories.repositories[0].name=stable \
--set configureRepositories.repositories[0].url=https://kubernetes-charts.storage.googleapis.com \
--set extraEnvs[0].name=HELM_VERSION \
--set extraEnvs[0].value=v3 \
--set image.repository=docker.io/fluxcd/helm-operator-prerelease \
--set image.tag=helm-v3-dev-fb98e2db

Keep an eye on https://hub.docker.com/repository/docker/fluxcd/helm-operator-prerelease/tags?page=1&ordering=last_updated for new builds.

rowecharles commented 4 years ago

I've taken a look at the latest image (helm-v3-dev-7589ee47) and the operator is continually attempting to upgrade some releases despite there being no changes. An example is:

apiVersion: helm.fluxcd.io/v1
kind: HelmRelease
metadata:
  name: elasticsearch
  namespace: monitoring
spec:
  releaseName: elasticsearch
  chart:
    repository: https://helm.elastic.co
    name: elasticsearch
    version: 7.4.1
  values:
    replicas: 1

The logs show that the operator thinks there is a difference between the applied and expected chart due to the type mismatch for the value of replicas:

ts=2019-11-24T11:40:22.774901798Z caller=release.go:340 component=release release=elasticsearch targetNamespace=monitoring resource=monitoring:helmrelease/elasticsearch helmVersion=v3 info="values have diverged" diff="  map[string]interface{}{\n  \t... // 39 identical entries\n  \t\"rbac\":           map[string]interface{}{\"create\": bool(false), \"serviceAccountName\": string(\"\")},\n  \t\"readinessProbe\": map[string]interface{}{\"failureThreshold\": float64(3), \"initialDelaySeconds\": float64(10), \"periodSeconds\": float64(10), \"successThreshold\": float64(3), \"timeoutSeconds\": float64(5)},\n- \t\"replicas\":       float64(1),\n+ \t\"replicas\":       s\"1\",\n  \t\"resources\":      map[string]interface{}{\"limits\": map[string]interface{}{\"cpu\": string(\"1000m\"), \"memory\": string(\"2Gi\")}, \"requests\": map[string]interface{}{\"cpu\": string(\"100m\"), \"memory\": string(\"2Gi\")}},\n  \t\"roles\":          map[string]interface{}{\"data\": string(\"true\"), \"ingest\": string(\"true\"), \"master\": string(\"true\")},\n  \t... // 12 identical entries\n  }\n"

The important part being:

- "replicas":       float64(1),
+ "replicas":       s"1",

This doesn't impact any of the running pods, but it does clog up the release history, meaning rollbacks would be impossible.

$ helm history elasticsearch
REVISION    UPDATED                     STATUS      CHART               APP VERSION DESCRIPTION
10          Sun Nov 24 11:42:23 2019    superseded  elasticsearch-7.4.1 7.4.1       Upgrade complete
11          Sun Nov 24 11:43:23 2019    superseded  elasticsearch-7.4.1 7.4.1       Upgrade complete
12          Sun Nov 24 11:44:22 2019    superseded  elasticsearch-7.4.1 7.4.1       Upgrade complete
13          Sun Nov 24 11:45:22 2019    superseded  elasticsearch-7.4.1 7.4.1       Upgrade complete
14          Sun Nov 24 11:46:22 2019    superseded  elasticsearch-7.4.1 7.4.1       Upgrade complete
15          Sun Nov 24 11:47:22 2019    superseded  elasticsearch-7.4.1 7.4.1       Upgrade complete
16          Sun Nov 24 11:48:22 2019    superseded  elasticsearch-7.4.1 7.4.1       Upgrade complete
17          Sun Nov 24 11:49:22 2019    superseded  elasticsearch-7.4.1 7.4.1       Upgrade complete
18          Sun Nov 24 11:50:22 2019    superseded  elasticsearch-7.4.1 7.4.1       Upgrade complete
19          Sun Nov 24 11:51:22 2019    deployed    elasticsearch-7.4.1 7.4.1       Upgrade complete

This has happened on a number of charts where I've used integers in the values.

eschereisin commented 4 years ago

I am taking a look at helm-v3-dev-7589ee47 and helm-operator creates secrets in the following format:

sealed-secrets.v1

where Helm 3 has a prefix:

sh.helm.release.v1.flux.v1
hiddeco commented 4 years ago

@rowecharles thanks for reporting this. The reason this seems to happen is that the dry-run values generated by Helm (internally) are returned directly from memory and take a shorter route than the values returned from storage, bypassing a parser that casts the float values to strings.

I was able to get rid of the spurious 'upgrades' by always casting the values to a YAML string, and then re-reading this string into a map[string]interface{}, ensuring that the values we compare always represent the values as they would be when returned from storage. ~Expect a PR for this tomorrow.~ PR available #117, image for testing: hiddeco/helm-operator:helm-v3-v3-value-types-334af866.


@eschereisin this is because the Helm operator is still running v3.0.0-beta.3, while this was added in v3.0.0-beta.4. The upgrade to v3.0.0 (stable) will arrive soon; I had to clear the path and get the master changes in before I could start working on this.

stromvirvel commented 4 years ago

Is there an estimate for when helm-operator's Helm v3 support will be production-ready? Can I offer to help with testing/evaluation?

timja commented 4 years ago

Is there an estimate for when helm-operator's Helm v3 support will be production-ready? Can I offer to help with testing/evaluation?

@stromvirvel See https://github.com/fluxcd/helm-operator/issues/8#issuecomment-553791295

Pre-release builds are published on every commit to the helm-v3-dev branch. All testing and feedback would be welcomed I'm sure 😄

stefanprodan commented 4 years ago

Helm Operator using Helm v3 (stable) can now be tested.

Install Helm v3 CLI and add it to your path:

OS=darwin-amd64 && \
mkdir -p $HOME/.helm3/bin && \
curl -sSL "https://get.helm.sh/helm-v3.0.0-${OS}.tar.gz" | tar xvz && \
chmod +x ${OS}/helm && mv ${OS}/helm $HOME/.helm3/bin/helmv3

export PATH=$PATH:$HOME/.helm3/bin
export HELM_HOME=$HOME/.helm3

Install Flux:

helmv3 repo add fluxcd https://charts.fluxcd.io

kubectl create ns fluxcd

helmv3 upgrade -i flux fluxcd/flux --wait \
--namespace fluxcd \
--set git.url=git@github.com:${GHUSER}/${GHREPO}

Install the HelmRelease CRD that contains the helm version field:

kubectl apply -f https://raw.githubusercontent.com/fluxcd/helm-operator/helm-v3-dev/deploy/flux-helm-release-crd.yaml

Install Helm Operator with Helm v3 support using the latest build:

helmv3 upgrade -i helm-operator fluxcd/helm-operator --wait \
--namespace fluxcd \
--set git.ssh.secretName=flux-git-deploy \
--set configureRepositories.enable=true \
--set configureRepositories.repositories[0].name=stable \
--set configureRepositories.repositories[0].url=https://kubernetes-charts.storage.googleapis.com \
--set extraEnvs[0].name=HELM_VERSION \
--set extraEnvs[0].value=v3 \
--set image.repository=docker.io/fluxcd/helm-operator-prerelease \
--set image.tag=helm-v3-dev-0b11d9d0

We've created a GitHub issue template for Helm v3. Please take this for a spin and create issues if you find any problems with it. Thanks!

dragonsmith commented 4 years ago

Hey! I'm testing fluxcd/helm-operator-prerelease:helm-v3-dev-0b11d9d0 on my cluster right now and it mostly works, but I get some strange behaviour.

For example, here is what I get after a fresh install of sealed-secrets-controller:

helm-operator-85f4dc6fbc-h6xjt flux-helm-operator ts=2019-12-03T19:15:28.735809668Z caller=release.go:280 component=release release=sealed-secrets-controller targetNamespace=kube-system resource=kube-system:helmrelease/sealed-secrets-controller helmVersion=v3 info="no existing release; installing"
helm-operator-85f4dc6fbc-h6xjt flux-helm-operator I1203 19:15:29.755719       7 client.go:87] creating 10 resource(s)
helm-operator-85f4dc6fbc-h6xjt flux-helm-operator I1203 19:15:30.058021       7 wait.go:51] beginning wait for 10 resources with timeout of 5m0s
helm-operator-85f4dc6fbc-h6xjt flux-helm-operator I1203 19:15:32.237735       7 wait.go:199] Deployment is not ready: kube-system/sealed-secrets-controller. 0 out of 1 expected pods are ready
helm-operator-85f4dc6fbc-h6xjt flux-helm-operator I1203 19:15:34.085919       7 wait.go:199] Deployment is not ready: kube-system/sealed-secrets-controller. 0 out of 1 expected pods are ready
helm-operator-85f4dc6fbc-h6xjt flux-helm-operator I1203 19:15:36.081182       7 wait.go:199] Deployment is not ready: kube-system/sealed-secrets-controller. 0 out of 1 expected pods are ready
helm-operator-85f4dc6fbc-h6xjt flux-helm-operator I1203 19:15:38.079472       7 wait.go:199] Deployment is not ready: kube-system/sealed-secrets-controller. 0 out of 1 expected pods are ready
helm-operator-85f4dc6fbc-h6xjt flux-helm-operator I1203 19:15:40.088582       7 wait.go:199] Deployment is not ready: kube-system/sealed-secrets-controller. 0 out of 1 expected pods are ready
helm-operator-85f4dc6fbc-h6xjt flux-helm-operator I1203 19:15:42.084390       7 wait.go:199] Deployment is not ready: kube-system/sealed-secrets-controller. 0 out of 1 expected pods are ready
helm-operator-85f4dc6fbc-h6xjt flux-helm-operator ts=2019-12-03T19:15:44.125082595Z caller=release.go:224 component=release release=sealed-secrets-controller targetNamespace=kube-system resource=kube-system:helmrelease/sealed-secrets-controller helmVersion=v3 info="Helm release sync succeeded" revision=1.6.0
helm-operator-85f4dc6fbc-h6xjt flux-helm-operator ts=2019-12-03T19:15:59.17677442Z caller=operator.go:309 component=operator info="enqueuing release" resource=kube-system:helmrelease/sealed-secrets-controller
helm-operator-85f4dc6fbc-h6xjt flux-helm-operator I1203 19:15:59.596729       7 install.go:170] WARNING: This chart or one of its subcharts contains CRDs. Rendering may fail or contain inaccuracies.
helm-operator-85f4dc6fbc-h6xjt flux-helm-operator ts=2019-12-03T19:16:00.907662525Z caller=release.go:182 component=release release=sealed-secrets-controller targetNamespace=kube-system resource=kube-system:helmrelease/sealed-secrets-controller helmVersion=v3 error="failed to determine if the release should be synced" err="failed to upgrade chart for release [3fcfc29e-0eb4-4244-9a29-50544aba024b]: rendered manifests contain a resource that already exists. Unable to continue with install: existing resource conflict: kind: CustomResourceDefinition, namespace: , name: sealedsecrets.bitnami.com"

I also get validation errors from cert-manager release, which is also managed by helm-operator right now:

helm-operator-85f4dc6fbc-h6xjt flux-helm-operator ts=2019-12-03T19:25:00.496478628Z caller=release.go:182 component=release release=cert-manager targetNamespace=ingress resource=ingress:helmrelease/cert-manager helmVersion=v3 error="failed to determine if the release should be synced" err="failed to upgrade chart for release [333a1ef6-b72d-4122-857b-60852322ac72]: unable to build kubernetes objects from release manifest: error validating \"\": error validating data: ValidationError(ValidatingWebhookConfiguration.webhooks[0].namespaceSelector.matchExpressions[1].values): unknown object type \"nil\" in ValidatingWebhookConfiguration.webhooks[0].namespaceSelector.matchExpressions[1].values[0]"

Do you have any ideas how to address that?

Thank you very much!

gsf commented 4 years ago

@dragonsmith Are you able to install these charts with the helm v3 CLI? The way CRDs are handled has changed: https://helm.sh/docs/topics/chart_best_practices/custom_resource_definitions/
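
In short, Helm v3 only installs CRDs that live in a chart's top-level crds/ directory, and it does so on install only (they are never templated, upgraded, or deleted); the layout looks roughly like this (file names illustrative):

mychart/
  Chart.yaml
  crds/
    crd.yaml            # applied on first install only
  templates/
    deployment.yaml

Charts that still render their CRDs from templates/ (or rely on the removed crd-install hook) can therefore run into conflicts like the one above on upgrade.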

dragonsmith commented 4 years ago

@gsf the interesting thing is that these charts install successfully either way (via the Helm 3 CLI or using helm-operator). It's just that the helm operator emits these warnings.

Also, look closer: the second warning is about: ValidationError(ValidatingWebhookConfiguration.webhooks[0].namespaceSelector.matchExpressions[1].values): unknown object type \"nil\"

I'm aware that there are changes to CRD handling, but I thought they only kick in when using chart apiVersion v2 (because we actually need backwards compatibility with current charts); I did not dive into the new code, though.

gsf commented 4 years ago

@dragonsmith That ValidationError looks like this issue: https://github.com/pegasystems/pega-helm-charts/issues/34#issuecomment-558662554.

onedr0p commented 4 years ago

Glad to see this merged into master, can't wait for the next release 👍

hiddeco commented 4 years ago

A release with beta support for Helm v3 was just published; the changelog can be found here: https://github.com/fluxcd/helm-operator/releases/tag/v1.0.0-rc5

I am keeping this issue open till it can be considered stable.

rowecharles commented 4 years ago

The latest release (v1.0.0-rc5) is working great for the most part. I'm getting issues with the prometheus-operator chart, which triggers an upgrade every time a chart sync occurs. This is due to the current and desired charts having diverged. The logs aren't particularly helpful and are a mass of hex characters.

I1223 21:40:53.734241       6 upgrade.go:87] performing update for prometheus
I1223 21:40:54.088285       6 upgrade.go:220] dry run for prometheus
ts=2019-12-23T21:41:04.985070741Z caller=release.go:369 component=release release=prometheus targetNamespace=monitoring resource=monitoring:helmrelease/prometheus helmVersion=v3 info="chart has diverged" diff="  &helm.Chart{\n  \t... // 3 identical fields\n  \tFiles:     []*helm.File{&{Name: \"crds/crd-servicemonitor.yaml\", Data: []uint8{0x23, 0x20, 0x53, 0x6f, 0x75, 0x72, 0x63, 0x65, 0x3a, 0x20, 0x68, 0x74, 0x74, 0x70, 0x73, 0x3a, 0x2f, 0x2f, 0x72, 0x61, 0x77, 0x2e, 0x67, 0x69, 0x74, 0x68, 0x75, 0x62, 0x75 ...

I'm using this as the release:

apiVersion: helm.fluxcd.io/v1
kind: HelmRelease
metadata:
  name: prometheus
  namespace: monitoring
spec:
  releaseName: prometheus
  chart:
    repository: https://kubernetes-charts.storage.googleapis.com/
    name: prometheus-operator
    version: 8.4.0
  values:
    global:
      rbac:
        create: true
        pspEnabled: true
hiddeco commented 4 years ago

@rowecharles thanks for reporting, my guess is that the CRD is included in the dry-run chart result but not included in the chart we compare it with, which is retrieved from storage.

We could technically reduce the comparison for charts from repository sources as the published versions should be immutable, making it possible to rely on just the chart (version) metadata instead of doing file comparisons like we do to detect changes from git chart sources. This would however not solve the problem for charts from git sources that make use of CRDs.

Will play around with the above and work on a patch. Please note however that I am officially off and recharging till Jan 7, so it may take a little longer than usual. :battery::christmas_tree:

runningman84 commented 4 years ago

It would be great if the helm operator would only actually install charts if the version changed (just compare the version number in the case of a normal helm chart) or the git commit id changed (in the case of a chart stored in a git repo). Right now the helm operator consumes a lot of CPU...

Maybe this could also be an option which could be turned off for debugging.

I have also found another issue... if you have a lot of helm releases with things like metallb, storage, and real applications, the order of installation can cause problems. You cannot install an application if the underlying storage or loadbalancer service is not ready. The helm operator was stuck installing applications which did not finish. I had to manually remove these application helm releases until the helm operator decided to install metallb and the storage helm charts... afterwards the applications were also fine...

One solution could be to prioritise namespaces like kube-system, or to postpone a helm release for some time if the initial install failed or timed out?

runningman84 commented 4 years ago

The new Prometheus metrics are confusing me. I would like to build an alert to notify me if a helm release fails. I would expect some metric like helm_status.

The flux_helm_operator_release_duration_seconds_sum success=true/false metric does not correspond with the status of kubectl get hr. I would expect success=true for all releases which are marked as deployed...

runningman84 commented 4 years ago

Installing the prometheus operator using helm operator generates error messages like this:

ts=2019-12-28T17:08:03.472375615Z caller=annotator.go:100 component=release release=prometheus targetNamespace=monitoring resource=monitoring:helmrelease/prometheus-operator helmVersion=v3 err="Object 'Kind' is missing in 'null'"
apenney commented 4 years ago

I stumbled over similar issues to other people (charts constantly redeploying for no reason), so if you need a few other charts to validate that patch with, @hiddeco (and obviously keep enjoying your vacation instead of reading this!), you can check out:

stable/datadog
edge/linkerd2

For me both constantly redeploy over and over, with linkerd actually taking down helm-operator as it can't figure out some tap stuff while the pods relaunch.

matmisie commented 4 years ago

I have an issue with one of my custom charts. It installs correctly when I use helm v3 directly.

I have 2 hooks:

Helm operator fails because, on the next poll interval, it upgrades the release while the first install hasn't finished yet? Maybe because the migrate hook is still running?

When I run the helm install manually (helm install pimcore . -f values.qa2.yaml -n xxx-qa2), and while the install is running I run the upgrade in a loop (helm upgrade pimcore . -f values.qa2.yaml -n xxx-qa2), nothing breaks; I just see "Error: UPGRADE FAILED: "pimcore" has no deployed releases". Right after the install finishes, the upgrade succeeds.

I also tried running the install command in another terminal (in a loop) while the helm release was installing for the first time. I got this: "Error: cannot re-use a name that is still in use", both while installing and after the install finished.

Did anyone have issues like this?

This is the log:

I0101 23:56:00.298892       6 client.go:87] creating 22 resource(s)
I0101 23:56:00.425264       6 wait.go:51] beginning wait for 22 resources with timeout of 5m0s
I0101 23:56:02.454140       6 wait.go:199] Deployment is not ready: xxx-qa2/pimcore-nginx. 0 out of 1 expected pods are ready
I0101 23:56:04.447469       6 wait.go:199] Deployment is not ready: xxx-qa2/pimcore-nginx. 0 out of 1 expected pods are ready
…
I0101 23:56:06.449462       6 wait.go:199] Deployment is not ready: xxx-qa2/pimcore-nginx. 0 out of 1 expected pods are ready
I0101 23:56:08.448750       6 wait.go:199] Deployment is not ready: xxx-qa2/pimcore-nginx. 0 out of 1 expected pods are ready
I0101 23:56:30.504967       6 client.go:87] creating 1 resource(s)
I0101 23:56:30.516273       6 client.go:420] Watching for changes to Job pimcore-install with timeout of 5m0s
I0101 23:56:30.551505       6 client.go:445] Add/Modify event for pimcore-install: MODIFIED
I0101 23:56:30.551678       6 client.go:484] pimcore-install: Jobs active: 1, jobs failed: 0, jobs succeeded: 0
I0101 23:56:43.587963       6 client.go:445] Add/Modify event for pimcore-install: MODIFIED
I0101 23:56:43.593191       6 client.go:220] Starting delete for "pimcore-migrate" Job
I0101 23:56:43.597942       6 client.go:245] jobs.batch "pimcore-migrate" not found
I0101 23:56:43.633058       6 client.go:87] creating 1 resource(s)
I0101 23:56:43.641761       6 client.go:420] Watching for changes to Job pimcore-migrate with timeout of 5m0s
I0101 23:56:43.670773       6 client.go:445] Add/Modify event for pimcore-migrate: MODIFIED
I0101 23:56:43.670799       6 client.go:484] pimcore-migrate: Jobs active: 1, jobs failed: 0, jobs succeeded: 0
I0101 23:56:58.583841       6 client.go:445] Add/Modify event for pimcore-migrate: MODIFIED
I0101 23:56:58.591805       6 client.go:220] Starting delete for "pimcore-install" Job
ts=2020-01-01T23:56:58.659490809Z caller=release.go:254 component=release release=pimcore targetNamespace=xxx-qa2 resource=xxx-qa2:helmrelease/pimcore helmVersion=v3 info="Helm release sync succeeded" revision=4335e21a35d48576151c45008b5698826322b229
I0101 23:57:32.710157       6 upgrade.go:79] preparing upgrade for pimcore
I0101 23:57:33.398012       6 upgrade.go:87] performing update for pimcore
ts=2020-01-01T23:57:33.48955898Z caller=release.go:188 component=release release=pimcore targetNamespace=xxx-qa2 resource=xxx-qa2:helmrelease/pimcore helmVersion=v3 error="failed to determine if the release should be synced" err="failed to upgrade chart for release [pimcore]: rendered manifests contain a new resource that already exists. Unable to continue with update: existing resource conflict: kind: PersistentVolume, namespace: , name: pimcore-sharedfs-qa2"
ts=2020-01-01T23:58:57.128901016Z caller=operator.go:309 component=operator info="enqueuing release" resource=xxx-qa2:helmrelease/pimcore
I0101 23:58:59.559191       6 upgrade.go:79] preparing upgrade for pimcore
I0101 23:59:00.067934       6 upgrade.go:87] performing update for pimcore
ts=2020-01-01T23:59:00.129994283Z caller=release.go:188 component=release release=pimcore targetNamespace=xxx-qa2 resource=xxx-qa2:helmrelease/pimcore helmVersion=v3 error="failed to determine if the release should be synced" err="failed to upgrade chart for release [pimcore]: rendered manifests contain a new resource that already exists. Unable to continue with update: existing resource conflict: kind: PersistentVolume, namespace: , name: pimcore-sharedfs-qa2"

This is from "kubectl describe hr/pimcore":
Status:
  Conditions:
    Last Transition Time:  2020-01-01T23:55:59Z
    Last Update Time:      2020-01-02T21:04:59Z
    Message:               successfully cloned chart revision: 9de0408637a7b863ecb2b0554e668309e4410327
    Reason:                GitRepoCloned
    Status:                True
    Type:                  ChartFetched
    Last Transition Time:  2020-01-02T14:41:00Z
    Last Update Time:      2020-01-02T21:05:00Z
    Message:               failed to compose values for chart release
    Reason:                HelmUpgradeFailed
    Status:                False
    Type:                  Released
slenky commented 4 years ago

@apenney @hiddeco I still see the same behaviour on 1.0.0-rc6 :(

hiddeco commented 4 years ago

@slenky I haven't yet worked on the mentioned problem; the release from yesterday took care of other issues (see release notes). I will get to this one soon.

slenky commented 4 years ago

@hiddeco confirming that the latest 1.0.0-rc7 has fixed the issue :) Thank you and good job!

apenney commented 4 years ago

Also confirming that we're looking good, really appreciate the quick fixes! :)

runningman84 commented 4 years ago

The latest rc seems to just install prometheus-operator during the first run and leave it untouched afterwards. But I still get this error message:

fluxcd/helm-operator-69bc6bd556-hdg7s[flux-helm-operator]: 2020/01/10 21:39:34 info: skipping unknown hook: "crd-install"
fluxcd/helm-operator-69bc6bd556-hdg7s[flux-helm-operator]: 2020/01/10 21:39:34 info: skipping unknown hook: "crd-install"
fluxcd/helm-operator-69bc6bd556-hdg7s[flux-helm-operator]: 2020/01/10 21:39:34 info: skipping unknown hook: "crd-install"
fluxcd/helm-operator-69bc6bd556-hdg7s[flux-helm-operator]: 2020/01/10 21:39:34 info: skipping unknown hook: "crd-install"
fluxcd/helm-operator-69bc6bd556-hdg7s[flux-helm-operator]: 2020/01/10 21:39:34 info: skipping unknown hook: "crd-install"

Can we just ignore this message?

runningman84 commented 4 years ago

I have also found this problem in flux-helm-operator:

fluxcd/helm-operator-69bc6bd556-hdg7s[flux-helm-operator]: ts=2020-01-10T21:47:01.066165659Z caller=release.go:217 component=release release=blackbox-exporter targetNamespace=monitoring resource=monitoring:helmrelease/blackbox-exporter helmVersion=v3 error="Helm release failed" revision=1.6.0 err="failed to upgrade chart for release [blackbox-exporter]: unable to build kubernetes objects from release manifest: error validating \"\": error validating data: ValidationError(PodDisruptionBudget.spec): unknown field \"enabled\" in io.k8s.api.policy.v1beta1.PodDisruptionBudgetSpec"

I do not understand this error; my config looks just fine and used to work before with helm v2:

---
apiVersion: helm.fluxcd.io/v1
kind: HelmRelease
metadata:
  name: blackbox-exporter
  namespace: monitoring
  annotations:
    fluxcd.io/automated: 'true'
    filter.fluxcd.io/chart-image: semver:~0.15
spec:
  releaseName: blackbox-exporter
  helmVersion: v3
  chart:
    repository: https://kubernetes-charts.storage.googleapis.com/
    name: prometheus-blackbox-exporter
    version: 2.0.0
  values:
    name: blackbox-exporter
    image:
      tag: v0.15.1
    resources:
      limits:
        memory: 64Mi
      requests:
        cpu: 50m
        memory: 64Mi

I do not see the use of the keyword enabled anywhere in this helm release.

gsf commented 4 years ago

@runningman84 That "enabled" field in PodDisruptionBudget was a Helm v3 incompatibility in prometheus-blackbox-exporter prior to version 2.0.0. You may need to set the content of PodDisruptionBudget as described in the README: https://github.com/helm/charts/tree/master/stable/prometheus-blackbox-exporter#200
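
The gist of that README note is that, from 2.0.0, the chart passes whatever you put under podDisruptionBudget straight through as the PDB spec, so the old enabled toggle has to be replaced by real spec fields. An illustrative (unverified) values snippet; check the chart's README for the exact shape:

podDisruptionBudget:
  maxUnavailable: 1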

rowecharles commented 4 years ago

I can also confirm that 1.0.0-rc7 has fixed my issues with spurious upgrades. Thanks for the great work!

@runningman84 the crd-install hooks are no longer supported in helm 3. I think it's safe to ignore the errors.

smark88 commented 4 years ago

I've taken a look at the latest image (helm-v3-dev-7589ee47) and the operator is continually attempting to upgrade some releases despite there being no changes. [...] This has happened on a number of charts where I've used integers in the values.

I have also run into this and it's rather annoying; about every 5 minutes it does a new revision.

REVISION        UPDATED                         STATUS          CHART           APP VERSION     DESCRIPTION     
471             Tue Jan 14 21:19:12 2020        superseded      datarepo-0.0.4                  Upgrade complete
472             Tue Jan 14 21:22:10 2020        superseded      datarepo-0.0.4                  Upgrade complete
473             Tue Jan 14 21:25:07 2020        superseded      datarepo-0.0.4                  Upgrade complete
474             Tue Jan 14 21:28:06 2020        superseded      datarepo-0.0.4                  Upgrade complete
475             Tue Jan 14 21:31:05 2020        superseded      datarepo-0.0.4                  Upgrade complete
476             Tue Jan 14 21:34:13 2020        superseded      datarepo-0.0.4                  Upgrade complete
477             Tue Jan 14 21:37:06 2020        superseded      datarepo-0.0.4                  Upgrade complete
478             Tue Jan 14 21:40:10 2020        superseded      datarepo-0.0.4                  Upgrade complete
479             Tue Jan 14 21:43:05 2020        superseded      datarepo-0.0.4                  Upgrade complete
480             Tue Jan 14 21:46:05 2020        deployed        datarepo-0.0.4                  Upgrade complete
hiddeco commented 4 years ago

@smark88 you are running an old version, helm-v3-dev has been merged into master, and the latest version with Helm 3 support (and bug fixes) is 1.0.0-rc7.

REBELinBLUE commented 4 years ago

Not sure if this has anything to do with 1.0.0-rc7, but I was running a build of helm-v3-dev for a few weeks; I have changed to 1.0.0-rc7 and now most of the helm releases have a status of "failed to compose values for chart release", even though they are installed fine.

For example

❯ helm history velero -n velero
REVISION    UPDATED                     STATUS      CHART           APP VERSION DESCRIPTION
1           Tue Dec 24 16:08:53 2019    deployed    velero-2.7.5    1.2.0       Install complete
❯ kubectl get hr -n velero
NAME     RELEASE   STATUS     MESSAGE                                      AGE
velero   velero    deployed   failed to compose values for chart release   21d
smark88 commented 4 years ago

@smark88 you are running an old version, helm-v3-dev has been merged into master, and the latest version with Helm 3 support (and bug fixes) is 1.0.0-rc7.

helm-operator:1.0.0-rc5 is the container I have deployed; I will try upgrading to 1.0.0-rc7 and see if that resolves it.

runningman84 commented 4 years ago

I also have two releases with status "failed to compose values", although they are installed using the rc7 version.

hiddeco commented 4 years ago

@runningman84 @REBELinBLUE can you both share a copy of one of the HelmReleases causing trouble?

runningman84 commented 4 years ago

I can share one of them which is using a public chart:

---
apiVersion: helm.fluxcd.io/v1
kind: HelmRelease
metadata:
  name: external-dns
  namespace: kube-system
  annotations:
    fluxcd.io/automated: "true"
    filter.fluxcd.io/chart-image: semver:~0.5
spec:
  releaseName: external-dns
  helmVersion: v3
  chart:
    repository: https://kubernetes-charts.storage.googleapis.com/
    name: external-dns
    version: 2.5.3
  values:
    name: external-dns
    image:
      tag: 0.5.15-debian-9-r1
    rbac:
      create: true
    provider: aws
    #txtOwnerId: k8s-prod-home
    aws:
      accessKey: xxxxxxxxxxxxxxxxxxxxxxxx
      secretKey: yyyyyyyyyyyyyyyyyyyyyyyy
      region: us-east-1

this is the kubectl output:

root@cubi001:~# kubectl get hr --all-namespaces
NAMESPACE     NAME                    RELEASE                 STATUS     MESSAGE                                                                                                                                                                                              AGE
kube-system   external-auth-server    external-auth-server    deployed   Helm release sync succeeded                                                                                                                                                                          19d
kube-system   external-dns            external-dns            deployed   failed to compose values for chart release                                                                                                                                                           19d
hiddeco commented 4 years ago

@runningman84 this works without any issues for me; can you share the Status output from kubectl describe hr?

runningman84 commented 4 years ago

This is the output:

kubectl describe hr external-dns -n kube-system
Name:         external-dns
Namespace:    kube-system
Labels:       fluxcd.io/sync-gc-mark=sha256.d2hcYp2rAOHFEhWk8hRzjHP_lI9tPeoShKH4OThrveI
Annotations:  filter.fluxcd.io/chart-image: semver:~0.5
              fluxcd.io/automated: true
              fluxcd.io/sync-checksum: 44ada7e95a8eed235e4e0c2bea067e1a9e5fec95
              kubectl.kubernetes.io/last-applied-configuration:
                {"apiVersion":"helm.fluxcd.io/v1","kind":"HelmRelease","metadata":{"annotations":{"filter.fluxcd.io/chart-image":"semver:~0.5","fluxcd.io/...
API Version:  helm.fluxcd.io/v1
Kind:         HelmRelease
Metadata:
  Creation Timestamp:  2019-12-26T21:55:32Z
  Generation:          1
  Resource Version:    4630545
  Self Link:           /apis/helm.fluxcd.io/v1/namespaces/kube-system/helmreleases/external-dns
  UID:                 56d5f433-05f4-4e17-a422-c38f93d9e43b
Spec:
  Chart:
    Name:        external-dns
    Repository:  https://kubernetes-charts.storage.googleapis.com/
    Version:     2.5.3
  Helm Version:  v3
  Release Name:  external-dns
  Values:
    Aws:
      Access Key:  xxxxxxxxxxxxxxxxxxxxxxxx
      Region:      us-east-1
      Secret Key:  yyyyyyyyyyyyyyyyyyyyyyyy
    Image:
      Tag:     0.5.15-debian-9-r1
    Name:      external-dns
    Provider:  aws
    Rbac:
      Create:      true
    Txt Owner Id:  k8s-prod-home
Status:
  Conditions:
    Last Transition Time:  2019-12-30T06:49:28Z
    Last Update Time:      2019-12-30T06:49:28Z
    Message:               failed to compose values for chart release
    Reason:                HelmUpgradeFailed
    Status:                False
    Type:                  Released
    Last Transition Time:  2019-12-26T22:49:16Z
    Last Update Time:      2020-01-13T20:46:27Z
    Message:               chart fetched: external-dns-2.5.3.tgz
    Reason:                RepoChartInCache
    Status:                True
    Type:                  ChartFetched
  Observed Generation:     1
  Release Name:            external-dns
  Release Status:          deployed
  Revision:                2.5.3
  Values Checksum:         db74f0cd3f490aff01288488cbc3d9ee8484af151db23ae096a015c59ca20124
Events:
  Type    Reason       Age                 From           Message
  ----    ------       ----                ----           -------
  Normal  ChartSynced  13m (x75 over 37h)  helm-operator  Chart managed by HelmRelease processed
smark88 commented 4 years ago

@smark88 you are running an old version, helm-v3-dev has been merged into master, and the latest version with Helm 3 support (and bug fixes) is 1.0.0-rc7.

helm-operator:1.0.0-rc5 is the container I have deployed; I will try upgrading to 1.0.0-rc7 and see if that resolves it.

I redeployed last night and the 1.0.0-rc5 --> 1.0.0-rc7 upgrade has resolved my issue. Typically by this time in the morning I would have had about 100+ revisions.

NAME            NAMESPACE       REVISION        UPDATED                                 STATUS          CHART                                   APP VERSION
dev-jade        dev             1               2020-01-14 23:08:11.235655799 +0000 UTC deployed        datarepo-0.0.4                                     
secrets         dev             1               2020-01-14 23:08:13.113370804 +0000 UTC deployed        create-secret-manager-secret-0.0.4 
REBELinBLUE commented 4 years ago

@hiddeco https://github.com/REBELinBLUE/k8s-on-hypriot/blob/master/deployments/velero/velero/velero.yaml but most of the charts in the deployments folder are showing this status; the only ones which aren't are the ones I removed with kubectl delete hr <name> and then reapplied.

hiddeco commented 4 years ago

@REBELinBLUE @runningman84 does one of you still have access to logs from around the time the status message was pushed for the release?