@nishaad78 please make sure your deployment doesn't have the replicas field in the spec, as this makes Flux fight the HPA controller.
This is my deployment spec (no replicas):
apiVersion: apps/v1
kind: Deployment
metadata:
  name: podinfo
  namespace: nishaad-test
  labels:
    app: podinfo
spec:
  strategy:
    rollingUpdate:
      maxUnavailable: 0
    type: RollingUpdate
  selector:
    matchLabels:
      app: podinfo
  template:
    metadata:
      annotations:
        sidecar.istio.io/inject: "true"
        prometheus.io/scrape: "true"
        prometheus.io/port: "9797"
      labels:
        app: podinfo
    spec:
      initContainers:
        - name: init
          image: alpine:3.10.1
          command:
            - sleep
            - "1"
      containers:
        - name: podinfod
          image: stefanprodan/podinfo:3.1.0
          imagePullPolicy: IfNotPresent
          ports:
            - name: http
              containerPort: 9898
              protocol: TCP
            - name: http-metrics
              containerPort: 9797
              protocol: TCP
            - name: grpc
              containerPort: 9999
              protocol: TCP
          command:
            - ./podinfo
            - --port=9898
            - --port-metrics=9797
            - --grpc-port=9999
            - --grpc-service-name=podinfo
            - --level=info
            - --random-delay=false
            - --random-error=false
          env:
            - name: PODINFO_UI_COLOR
              value: "#34577c"
          livenessProbe:
            httpGet:
              path: /healthz
              port: 9898
          readinessProbe:
            httpGet:
              path: /readyz
              port: 9898
          resources:
            limits:
              cpu: 500m
              memory: 128Mi
            requests:
              cpu: 100m
              memory: 64Mi
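For completeness, this is roughly what the HPA targeting that deployment looks like; a minimal sketch only (the API version, replica bounds, and CPU target below are illustrative, not copied from my cluster):

apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: podinfo
  namespace: nishaad-test
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: podinfo
  minReplicas: 2
  maxReplicas: 4
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          # scale up when average CPU utilization exceeds this percentage
          averageUtilization: 80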
How did you trigger the canary? Did you use kubectl or did you change the deployment in git? If you used kubectl, then Flux reverted the change.
The canary was triggered when Flux detected changes in git. What's the expected scaling behaviour during a canary analysis? Is it supposed to lock the number of replicas?
No, Flagger doesn't take into account changes to the replicas field, so the HPA shouldn't restart the analysis.
Ah, I think I know what's going on: Flagger generates the ClusterIP and Flux overrides it. Can you please remove the service manifest from git and try once more?
You're right. Removing Service and VirtualService from my git solved this issue. However, I think this is just a workaround since I would still want those definitions in git.
Why would you want those in git? Every time Flux applies them, the routing will be broken until Flagger corrects it. The Canary resource contains a service section that defines the desired state of the app routing, including cluster IPs and the Istio services/destination rules.
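For illustration, a minimal sketch of what that service section could look like for the podinfo app above (the gateway, host, and ports here are illustrative assumptions, not values taken from this thread):

apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
  name: podinfo
  namespace: nishaad-test
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: podinfo
  autoscalerRef:
    apiVersion: autoscaling/v2beta2
    kind: HorizontalPodAutoscaler
    name: podinfo
  service:
    # Flagger generates the podinfo, podinfo-canary and podinfo-primary
    # Services and the Istio VirtualService from this section, so applying
    # your own Service/VirtualService from git conflicts with it.
    port: 9898
    targetPort: 9898
    gateways:
      - public-gateway.istio-system.svc.cluster.local
    hosts:
      - app.example.com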
Thanks for your quick responses.
We use Helm templates to generate a manifest for our microservices and rely on CD to deploy all the generated components, including Service and VirtualService.
Our manifests are the source for declarative and reproducible deployments. Therefore, I was expecting the canary analysis not to be disrupted when Flux syncs with git.
Seems like this is not something you plan on supporting?
You can place the canary definition inside the chart and add a toggle to disable the services when canary is enabled.
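For example, a minimal sketch of such a toggle, using a hypothetical canary.enabled value (the names and ports are illustrative):

# values.yaml
canary:
  enabled: true

# templates/service.yaml -- rendered only when the canary is disabled,
# otherwise Flagger owns the Service and VirtualService
{{- if not .Values.canary.enabled }}
apiVersion: v1
kind: Service
metadata:
  name: podinfo
spec:
  type: ClusterIP
  selector:
    app: podinfo
  ports:
    - name: http
      port: 9898
      targetPort: 9898
{{- end }}

The same guard would wrap the VirtualService template, while the Canary template would be rendered only when canary.enabled is true.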
I'm using Istio 1.4 on K8s 1.13 with Flagger v1.0.0 and here's my canary spec:
When the canary weight changes during the rollout, this triggers the HPA to change the number of replicas accordingly, and I see that Flagger restarts the canary analysis. This is from the Flagger logs:
I also noticed that when restarting the canary analysis, it is not taking into account the stepWeightPromotion. This can be a problem if the primary has scaled down during the canary analysis phase. I'll create another ticket for that.
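For context, stepWeightPromotion sits next to stepWeight in the Canary analysis section; a minimal sketch with illustrative values (not my actual settings):

analysis:
  interval: 1m
  threshold: 5
  maxWeight: 50
  stepWeight: 10
  # weight increments used when shifting traffic back to the primary
  # during promotion, instead of switching it all at once
  stepWeightPromotion: 10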