BenjaminSchiborr closed this issue 5 years ago
@BenjaminSchiborr could you provide more details about what happens? There are a couple of cases in which Helmsman decides to delete and reinstall a helm release: a) it sees a change in the chart used (e.g. the chart name was changed, or the name in Chart.yaml does not match the chart directory name), or b) the desired namespace has been changed in the DSF. This results in two operations, delete and install, and the logged decision will clearly say "reinstall" rather than "upgrade" (with the --verbose flag you will also see the two actual commands logged). This is not related to the --force flag used in the helm upgrade command.
If, in your case, you don't find that Helmsman decides to reinstall your release, then it could be some other issue, and more context (your DSF and Helmsman logs) would be useful to identify it.
FYI, the helm upgrade --force flag does not delete helm releases, but it might delete k8s resources if that is necessary to upgrade/update them. Helmsman, on the other hand, deletes the helm release and installs it again in the cases mentioned above.
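For illustration, here is a minimal sketch (with made-up release and chart names) of the two kinds of DSF change that trigger the delete-and-reinstall path:

```yaml
# Hypothetical Helmsman DSF excerpt (YAML form). Either of the edits noted
# below makes Helmsman delete the release and install it again instead of
# upgrading it in place.
apps:
  my-release:
    namespace: "staging"       # (b) moving the app to a different namespace
    chart: "my-repo/my-chart"  # (a) pointing at a different chart (or a chart
                               #     whose Chart.yaml name changed)
    version: "1.0.0"
    enabled: true
```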
@sami-alajrami, thank you for the quick response and sorry for not providing any specifics in the initial message.
A) and B) are definitely not the case here. I haven't switched namespaces or chart names in quite a while.
Below is one example of a STS that was effectively fully stopped and redeployed. The only difference in the DSF was the version of the chart, and the delta between the two chart versions was a change to an init container (please see the +/- lines in the diff below). Some values are passed to helm via "set" or "setString", but they did not change between deploys.
```
2019/07/22 09:32:26 VERBOSE: helm diff upgrade c1-kafka-green helm-charts/kafka --version "0.6.410" --set [...] --tiller-namespace kube-system
ana-kafka, c1-kafka-green, Job (batch) has changed:
# Source: kafka/templates/pre-install-job.yaml
apiVersion: batch/v1
kind: Job
metadata:
name: "c1-kafka-green"
labels:
app: "c1-kafka-green"
app.kubernetes.io/name: c1-kafka-green
app.kubernetes.io/instance: c1-kafka-green
app.kubernetes.io/managed-by: Tiller
- helm.sh/chart: kafka-0.6.402
+ helm.sh/chart: kafka-0.6.410
annotations:
"helm.sh/hook": pre-install
"helm.sh/hook-weight": "-5"
"helm.sh/hook-delete-policy": hook-succeeded
spec:
template:
metadata:
name: "c1-kafka-green"
labels:
app: "c1-kafka-green"
app.kubernetes.io/name: c1-kafka-green
app.kubernetes.io/instance: c1-kafka-green
app.kubernetes.io/managed-by: Tiller
- helm.sh/chart: kafka-0.6.402
+ helm.sh/chart: kafka-0.6.410
spec:
restartPolicy: Never
containers:
- name: pre-install-job-0
image: "byrnedo/alpine-curl"
command:
- [...]
ana-kafka, c1-kafka-green-svc, Service (v1) has changed:
# Source: kafka/templates/headless-service.yaml
apiVersion: v1
kind: Service
metadata:
name: "c1-kafka-green-svc"
annotations:
prometheus.io/port: "9010"
prometheus.io/scrape: "true"
labels:
app: "c1-kafka-green"
app.kubernetes.io/name: "c1-kafka-green"
app.kubernetes.io/instance: c1-kafka-green
app.kubernetes.io/managed-by: Tiller
- helm.sh/chart: kafka-0.6.402
+ helm.sh/chart: kafka-0.6.410
spec:
clusterIP: None
ports:
- port: 9092
name: broker
- port: 9010
name: jmx
selector:
app: "c1-kafka-green"
ana-kafka, c1-kafka-green, StatefulSet (apps) has changed:
# Source: kafka/templates/statefulset.yaml
apiVersion: apps/v1beta1
kind: StatefulSet
metadata:
name: "c1-kafka-green"
spec:
serviceName: "c1-kafka-green-svc"
replicas: 6
updateStrategy:
type: OnDelete
template:
metadata:
labels:
app: "c1-kafka-green"
app.kubernetes.io/name: "c1-kafka-green"
app.kubernetes.io/instance: c1-kafka-green
app.kubernetes.io/managed-by: Tiller
- helm.sh/chart: kafka-0.6.402
+ helm.sh/chart: kafka-0.6.410
annotations:
values_version: b2ad53f211a821e974483f34889eae992602d503c86018de59da40a7a1db72ea
pod.alpha.kubernetes.io/initialized: "true"
spec:
volumes:
- emptyDir: {}
name: [...]
securityContext:
fsGroup: 1000
imagePullSecrets:
- name: awsregistry
nodeSelector:
role: kafka
tolerations:
- effect: NoSchedule
operator: "Equal"
key: "role"
value: kafka
- effect: NoSchedule
key: "dedicated"
operator: "Equal"
value: "storage_optimized"
initContainers:
- command:
- [...]
image: docker.io/appdynamics/java-agent:latest
imagePullPolicy: IfNotPresent
+ securityContext:
+ runAsUser: 1000
name: appd-agent-attach
resources:
limits:
cpu: 200m
memory: 75M
requests:
cpu: 100m
memory: 50M
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /opt/temp
name: [...]
affinity:
nodeAffinity:
# NodeAffinity here is needed to prefer the last node Pod ran on. This
# is needed if PV is through hostPath (which is the case for test envs).
preferredDuringSchedulingIgnoredDuringExecution:
- weight: 1
preference:
matchExpressions:
- key: component
operator: In
values:
- "c1-kafka-green"
podAntiAffinity:
# Antiaffinity is needed to spread out pods. This replaces hostPort
preferredDuringSchedulingIgnoredDuringExecution:
- weight: 1
podAffinityTerm:
topologyKey: kubernetes.io/hostname
labelSelector:
matchLabels:
app: "c1-kafka-green"
containers:
- name: "c1-kafka-green"
image: [...]
command: [...]
volumeMounts:
- name: data-volume
mountPath: /data
- name: [...]
mountPath: [...]
# imagePullPolicy: Always
securityContext:
runAsUser: 1000
args:
- [...]
ports:
- containerPort: 9092
name: broker
- containerPort: 9010
name: jmxport
resources:
requests:
cpu: "7.7"
memory: "28800M"
env:
- name: [...]
valueFrom:
fieldRef:
fieldPath: metadata.name
- name: HOST_IP
valueFrom:
fieldRef:
apiVersion: v1
fieldPath: status.podIP
volumeClaimTemplates:
- metadata:
name: data-volume
spec:
storageClassName: "gp2"
accessModes: [ "ReadWriteOnce" ]
resources:
requests:
storage: 2048Gi
```
As you can see below, the readiness state of the StatefulSet dropped to 0/6 because all pods were terminated in parallel. I was trying to protect against that behavior with the "OnDelete" update strategy, but that does not seem to help here. This is why I suspected the "--force" flag. Any hints as to what the issue could be are greatly appreciated. :)
```
2019/07/22 09:32:50 VERBOSE: helm upgrade c1-kafka-green helm-charts/kafka --version "0.6.410" --force --set kafka.instances="6" --set [...] --tiller-namespace kube-system
2019/07/22 09:32:53 Release "c1-kafka-green" has been upgraded. Happy Helming!
LAST DEPLOYED: Mon Jul 22 09:32:52 2019
NAMESPACE: ana-kafka
STATUS: DEPLOYED
RESOURCES:
==> v1/Service
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
c1-kafka-green-svc ClusterIP None <none> 9092/TCP,9010/TCP 19d
==> v1beta1/StatefulSet
NAME READY AGE
c1-kafka-green 0/6 0s
2019/07/22 09:32:53 INFO: applying Helmsman labels to [ c1-kafka-green ] in namespace [ ana-kafka ]
2019/07/22 09:32:53 VERBOSE: kubectl label configmap -n kube-system -l NAME=c1-kafka-green MANAGED-BY=HELMSMAN NAMESPACE=ana-kafka TILLER_NAMESPACE=kube-system --overwrite
```
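(For reference, the relevant excerpt from the StatefulSet above; note that OnDelete only controls how the controller rolls out template changes to existing pods, so it cannot protect them when the StatefulSet object itself is deleted and recreated.)

```yaml
# Same values as in the diff above. OnDelete defers pod updates until pods
# are deleted manually, but it offers no protection if the StatefulSet
# object itself is removed and recreated by a forced upgrade.
spec:
  updateStrategy:
    type: OnDelete
```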
This might be related to a missing field in the StatefulSet. I'll update this topic in a little bit.
The issue is caused by a missing selector in both Deployments and StatefulSets. This leads to an error message of the following kind:
```
Error: Deployment.apps "jf-green" is invalid: spec.template.metadata.labels: Invalid value: map[string]string{"app.kubernetes.io/name":"jf-green", "helm.sh/chart":"jf-0.6.411", "target":"green", "app":"jf-green", "app.kubernetes.io/instance":"jf-green", "app.kubernetes.io/managed-by":"Tiller"}: `selector` does not match template `labels`
Error: UPGRADE FAILED: Deployment.apps "jf-green" is invalid: spec.template.metadata.labels: Invalid value: map[string]string{"app.kubernetes.io/name":"jf-green", "helm.sh/chart":"jf-0.6.411", "target":"green", "app":"jf-green", "app.kubernetes.io/instance":"jf-green", "app.kubernetes.io/managed-by":"Tiller"}: `selector` does not match template `labels`
```
This error message is only visible when the upgrade is not forced via helm. If the upgrade is forced, the error is swallowed and the resources are deleted and recreated, which can lead to data loss.
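For anyone hitting the same thing: the fix on the chart side is to add an explicit selector that only uses stable labels, so it keeps matching the pod template even when version-dependent labels such as helm.sh/chart change. A minimal sketch based on the StatefulSet above (rest of the spec omitted):

```yaml
apiVersion: apps/v1beta1
kind: StatefulSet
metadata:
  name: "c1-kafka-green"
spec:
  selector:
    matchLabels:
      app: "c1-kafka-green"            # stable, release-independent label only
  template:
    metadata:
      labels:
        app: "c1-kafka-green"
        helm.sh/chart: kafka-0.6.410   # version-dependent labels can stay on the template
```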
This brings me back to my original question:
Shouldn't --force be made optional? If one makes the above mistake, it could lead to serious data loss and would not be caught by any safeguard.
Thanks @BenjaminSchiborr for digging more into this. Yes, it makes sense to make forcing upgrades optional!
@BenjaminSchiborr the helm force option is now optional (and not used by default) in PR #273. Could you please test it with the docker image praqma/helmsman:pr-273?
@sami-alajrami: Thank you for the quick PR. It looks like the pr-273 image does not contain your changes: "--force" is still added by default, and "helmsman --help" does not show the "force-upgrades" option.
@BenjaminSchiborr sorry about that, I must have built the wrong branch locally. Now that the PR is merged, you can test with praqma/helmsman:latest instead.
@sami-alajrami: Looks good to me. One of my recent deploys warned me that there was an immutable service, and I had to use --force-upgrades to resolve it.
I have a question regarding how Helmsman handles the deployment of resources. When updating existing resources (Deployments, but especially StatefulSets), Helmsman sometimes seems to delete and recreate the resources. This can lead to data loss because it bypasses RollingUpdate and OnDelete update strategies.
I have not yet spent time figuring out the exact circumstances under which the deletion and recreation is triggered.
However, this behavior seems to be rooted in the use of the flag "--force" for every helm upgrade call, as seen here: https://github.com/Praqma/helmsman/blob/master/decision_maker.go#L285
According to the helm documentation: "Force resource update through delete/recreate if needed". (https://github.com/helm/helm/blob/master/docs/helm/helm_upgrade.md)
Shouldn't this be an optional flag? By default one would not want an upgrade to be forced; instead one would expect it to fail if an application cannot be upgraded in place.