Closed: Boes-man closed this issue 3 years ago
9m16s  Normal   Started                      pod/podinfo-primary-cf54546c6-svqn9          Started container podinfod
9m17s  Normal   SuccessfulCreate             replicaset/podinfo-primary-cf54546c6         Created pod: podinfo-primary-cf54546c6-99v9d
9m17s  Normal   SuccessfulCreate             replicaset/podinfo-primary-cf54546c6         Created pod: podinfo-primary-cf54546c6-svqn9
9m17s  Normal   ScalingReplicaSet            deployment/podinfo-primary                   Scaled up replica set podinfo-primary-cf54546c6 to 2
8m16s  Warning  FailedGetResourceMetric      horizontalpodautoscaler/podinfo-primary      did not receive metrics for any ready pods
8m16s  Warning  FailedGetResourceMetric      horizontalpodautoscaler/podinfo-primary      failed to get cpu utilization: did not receive metrics for any ready pods
8m16s  Warning  FailedComputeMetricsReplicas horizontalpodautoscaler/podinfo-primary      failed to compute desired number of replicas based on listed metrics for Deployment/test/podinfo-primary: invalid metrics (1 invalid out of 1), first error is: failed to get cpu utilization: did not receive metrics for any ready pods
11m    Normal   ScalingReplicaSet            deployment/podinfo                           Scaled up replica set podinfo-99dc84b6f to 1
7m43s  Normal   SuccessfulRescale            horizontalpodautoscaler/podinfo              New size: 2; reason: Current number of replicas below Spec.MinReplicas
11m    Normal   ScalingReplicaSet            deployment/podinfo                           Scaled up replica set podinfo-99dc84b6f to 2
6m55s  Warning  FailedGetResourceMetric      horizontalpodautoscaler/podinfo              did not receive metrics for any ready pods
6m55s  Warning  FailedGetResourceMetric      horizontalpodautoscaler/podinfo              failed to get cpu utilization: did not receive metrics for any ready pods
6m56s  Warning  FailedComputeMetricsReplicas horizontalpodautoscaler/podinfo              failed to compute desired number of replicas based on listed metrics for Deployment/test/podinfo: invalid metrics (1 invalid out of 1), first error is: failed to get cpu utilization: did not receive metrics for any ready pods
8m47s  Normal   Synced                       canary/podinfo                               all the metrics providers are available!
9m17s  Warning  Synced                       canary/podinfo                               podinfo-primary.test not ready: waiting for rollout to finish: observed deployment generation less then desired generation
9m7s   Warning  Synced                       canary/podinfo                               podinfo-primary.test not ready: waiting for rollout to finish: 0 of 2 updated replicas are available
8m57s  Warning  Synced                       canary/podinfo                               podinfo-primary.test not ready: waiting for rollout to finish: 1 of 2 updated replicas are available
8m47s  Normal   ScalingReplicaSet            deployment/podinfo                           Scaled down replica set podinfo-99dc84b6f to 0
I tried updating the e2e test to Gloo v1.8 and indeed routing is broken. Works fine with Gloo v1.6.
Thanks @stefanprodan, but it's still not working. I cloned the flagger repo and kicked off flagger/test/gloo/run.sh on a fresh cluster. It fails at the flagger install step.
NOTES:
Flagger installed
deployment.apps/flagger image updated
Waiting for deployment "flagger" rollout to finish: 0 out of 1 new replicas have been updated...
Waiting for deployment "flagger" rollout to finish: 0 out of 1 new replicas have been updated...
Waiting for deployment "flagger" rollout to finish: 0 out of 1 new replicas have been updated...
Waiting for deployment "flagger" rollout to finish: 0 out of 1 new replicas have been updated...
Waiting for deployment "flagger" rollout to finish: 0 of 1 updated replicas are available...
gloo-system flagger-79ff7c8b8b-fnhpw 0/1 ImagePullBackOff 0 2m2s
❯ kubectl -n gloo-system describe po/flagger-79ff7c8b8b-fnhpw
Name: flagger-79ff7c8b8b-fnhpw
Namespace: gloo-system
Priority: 0
Node: microk8s-vm/192.168.64.2
Start Time: Wed, 25 Aug 2021 21:32:49 +1000
Labels: app.kubernetes.io/instance=flagger
app.kubernetes.io/name=flagger
pod-template-hash=79ff7c8b8b
Annotations: appmesh.k8s.aws/sidecarInjectorWebhook: disabled
cni.projectcalico.org/podIP: 10.1.254.75/32
cni.projectcalico.org/podIPs: 10.1.254.75/32
prometheus.io/port: 8080
prometheus.io/scrape: true
Status: Pending
IP: 10.1.254.75
IPs:
IP: 10.1.254.75
Controlled By: ReplicaSet/flagger-79ff7c8b8b
Containers:
flagger:
Container ID:
Image: test/flagger:latest
Image ID:
Port: 8080/TCP
Host Port: 0/TCP
Command:
./flagger
-log-level=info
-mesh-provider=gloo
-metrics-server=http://flagger-prometheus:9090
-enable-config-tracking=true
-slack-user=flagger
State: Waiting
Reason: ImagePullBackOff
Ready: False
Restart Count: 0
Limits:
cpu: 1
memory: 512Mi
Requests:
cpu: 10m
memory: 32Mi
Liveness: exec [wget --quiet --tries=1 --timeout=4 --spider http://localhost:8080/healthz] delay=0s timeout=5s period=10s #success=1 #failure=3
Readiness: exec [wget --quiet --tries=1 --timeout=4 --spider http://localhost:8080/healthz] delay=0s timeout=5s period=10s #success=1 #failure=3
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-55xlg (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
kube-api-access-55xlg:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 2m52s default-scheduler Successfully assigned gloo-system/flagger-79ff7c8b8b-fnhpw to microk8s-vm
Normal Pulling 49s (x4 over 2m51s) kubelet Pulling image "test/flagger:latest"
Warning Failed 45s (x4 over 2m28s) kubelet Failed to pull image "test/flagger:latest": rpc error: code = Unknown desc = failed to pull and unpack image "docker.io/test/flagger:latest": failed to resolve reference "docker.io/test/flagger:latest": pull access denied, repository does not exist or may require authorization: server message: insufficient_scope: authorization failed
Warning Failed 45s (x4 over 2m28s) kubelet Error: ErrImagePull
Warning Failed 33s (x6 over 2m28s) kubelet Error: ImagePullBackOff
Normal BackOff 18s (x7 over 2m28s) kubelet Back-off pulling image "test/flagger:latest"
❯ glooctl version
Client: {"version":"1.8.8"}
Server: {"type":"Gateway","kubernetes":{"containers":[{"Tag":"1.8.9","Name":"gloo-envoy-wrapper","Registry":"quay.io/solo-io"},{"Tag":"1.8.9","Name":"gloo","Registry":"quay.io/solo-io"},{"Tag":"1.8.9","Name":"gateway","Registry":"quay.io/solo-io"}],"namespace":"gloo-system"}}
Use ghcr.io/fluxcd/flagger:1.13.0; this release fixes the issues with Gloo 1.8.
@stefanprodan, to run the repo e2e test (flagger/test/gloo/run.sh) I commented out the line
kubectl -n gloo-system set image deployment/flagger flagger=test/flagger:latest
in install.sh, and then it works. In the tutorial docs, helm upgrade -i flagger flagger/flagger still installs 1.12.x, which fails. Adding --set image.tag=1.13.0 does fix it.
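For anyone else landing here, this is roughly the working install with the image tag pinned. A sketch, assuming the fluxcd Helm repo is already added; the meshProvider and metricsServer values match the Gloo tutorial:

```shell
# Pin the Flagger image to 1.13.0, since the chart at the time still
# defaulted to 1.12.x, which is broken with Gloo 1.8.
helm upgrade -i flagger flagger/flagger \
  --namespace gloo-system \
  --set meshProvider=gloo \
  --set metricsServer=http://flagger-prometheus:9090 \
  --set image.tag=1.13.0
```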
FYI, I also noticed that kubectl -n test describe canary/podinfo for the "Automated rollback" section doesn't show the events as expected. Tailing the flagger pod logs does show them, however:
kubectl -n gloo-system logs flagger-55f9868585-xq5wg -f
Thanks
Describe the bug
Applying the image update does not cause a progressive rollout.

kubectl get canaries -Aw
NAMESPACE   NAME      STATUS        WEIGHT   LASTTRANSITIONTIME
test        podinfo   Progressing   0        2021-08-13T07:03:14Z
test        podinfo   Progressing   5        2021-08-13T07:03:44Z
test        podinfo   Progressing   5        2021-08-13T07:03:54Z
test        podinfo   Progressing   5        2021-08-13T07:04:04Z
test        podinfo   Progressing   5        2021-08-13T07:04:14Z
test        podinfo   Progressing   5        2021-08-13T07:04:24Z
test        podinfo   Progressing   5        2021-08-13T07:04:34Z
test        podinfo   Failed        0        2021-08-13T07:04:44Z
Events:
Type     Reason  Age                  From     Message
----     ------  ---                  ----     -------
Warning  Synced  6m2s                 flagger  podinfo-primary.test not ready: waiting for rollout to finish: observed deployment generation less then desired generation
Warning  Synced  5m52s                flagger  podinfo-primary.test not ready: waiting for rollout to finish: 0 of 2 updated replicas are available
Normal   Synced  5m42s (x3 over 6m2s) flagger  all the metrics providers are available!
Normal   Synced  5m42s                flagger  Initialization done! podinfo.test
Normal   Synced  3m42s                flagger  New revision detected! Scaling up podinfo.test
Warning  Synced  3m32s                flagger  canary deployment podinfo.test not ready: waiting for rollout to finish: 0 of 2 updated replicas are available
Warning  Synced  3m22s                flagger  canary deployment podinfo.test not ready: waiting for rollout to finish: 1 of 2 updated replicas are available
Normal   Synced  3m12s                flagger  Starting canary analysis for podinfo.test
Normal   Synced  3m12s                flagger  Pre-rollout check acceptance-test passed
Normal   Synced  3m12s                flagger  Advance podinfo.test canary weight 5
Warning  Synced  2m22s (x5 over 3m2s) flagger  Halt advancement no values found for gloo metric request-success-rate probably podinfo.test is not receiving traffic: running query failed: no values found
Warning  Synced  2m12s                flagger  Rolling back podinfo.test failed checks threshold reached 5
Warning  Synced  2m12s                flagger  Canary failed! Scaling down podinfo.test
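The "no values found for gloo metric request-success-rate" halt usually means nothing is hitting the canary, so a first step is to confirm the loadtester is alive and that the canary's webhooks point at it. A hedged sketch, assuming the tutorial's flagger-loadtester deployment in the test namespace:

```shell
# Is the loadtester running and actually firing requests?
kubectl -n test get deploy flagger-loadtester
kubectl -n test logs deploy/flagger-loadtester --tail=20

# Do the canary's webhooks reference the loadtester service?
kubectl -n test get canary podinfo -o jsonpath='{.spec.analysis.webhooks}'
```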
To Reproduce
On macOS:
microk8s install --cpu 4 --mem 8 -y
microk8s enable dns rbac storage metallb:192.168.64.50-192.168.64.100
Then follow the tutorial.
Port-forwarded to flagger-prometheus and it's up.
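Prometheus being up doesn't guarantee the Envoy stats Flagger queries are being scraped. A quick way to check, assuming the port-forward below; the envoy_cluster_upstream_rq_total counter is just a generic Envoy example, not Flagger's exact query:

```shell
kubectl -n gloo-system port-forward svc/flagger-prometheus 9090:9090 &

# List scrape targets; the gloo gateway-proxy pods should appear here
curl -s 'http://localhost:9090/api/v1/targets' | head -c 500

# Sanity-check that some Envoy request metrics exist at all
curl -s 'http://localhost:9090/api/v1/query?query=envoy_cluster_upstream_rq_total' | head -c 500
```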
Expected behavior
kubectl -n test describe canary/podinfo
Status:
  Canary Weight: 0
  Failed Checks: 0
  Phase: Succeeded
Events:
Type     Reason  Age  From     Message
----     ------  ---  ----     -------
Normal   Synced  3m   flagger  New revision detected podinfo.test
Normal   Synced  3m   flagger  Scaling up podinfo.test
Warning  Synced  3m   flagger  Waiting for podinfo.test rollout to finish: 0 of 1 updated replicas are available
Normal   Synced  3m   flagger  Advance podinfo.test canary weight 5
Normal   Synced  3m   flagger  Advance podinfo.test canary weight 10
Normal   Synced  3m   flagger  Advance podinfo.test canary weight 15
Normal   Synced  2m   flagger  Advance podinfo.test canary weight 20
Normal   Synced  2m   flagger  Advance podinfo.test canary weight 25
Normal   Synced  1m   flagger  Advance podinfo.test canary weight 30
Normal   Synced  1m   flagger  Advance podinfo.test canary weight 35
Normal   Synced  55s  flagger  Advance podinfo.test canary weight 40
Normal   Synced  45s  flagger  Advance podinfo.test canary weight 45
Normal   Synced  35s  flagger  Advance podinfo.test canary weight 50
Normal   Synced  25s  flagger  Copying podinfo.test template spec to podinfo-primary.test
Warning  Synced  15s  flagger  Waiting for podinfo-primary.test rollout to finish: 1 of 2 updated replicas are available
Normal   Synced  5s   flagger  Promotion completed! Scaling down podinfo.test
Additional context
I have tried it on KinD and a 3-node GKE cluster too, same result. I suspect the loadtester (traffic generator) isn't working?
helm ls -A
WARNING: Kubernetes configuration file is group-readable. This is insecure. Location: /Users/danwessels/.kube/config
WARNING: Kubernetes configuration file is world-readable. This is insecure. Location: /Users/danwessels/.kube/config
NAME      NAMESPACE     REVISION  UPDATED                                 STATUS    CHART           APP VERSION
flagger   gloo-system   1         2021-08-13 16:52:35.827309 +1000 AEST   deployed  flagger-1.12.1  1.12.1
gloo      gloo-system   1         2021-08-13 16:51:37.555865 +1000 AEST   deployed  gloo-1.8.6