nextrevision closed this issue 3 years ago
We are seeing the same issue:
Please, fix this issue. The webmetric_test.go file only has scenarios where webServerStatus == 200, but we are getting 404 status code responses from the service (webhook) and the rollout is being considered as "Healthy".
@jessesuen Any ideas when this will get worked on? This seems like kind of a non-starter for people that want automatic rollback on deployments
If someone could give me a hint on where to start, I'd be willing to try and contribute!
Following up here. It appears that setting failureLimit to 0 caused the rollout to abort and roll back. Someone had posted a similar question in the Argo Rollouts Slack and Jesse mentioned it. Maybe others here should try explicitly setting that value to 0?
There have been a lot of issues I've been working through with blue-green and analysis, in that the combination was basically broken. I think this issue may be covered by the v0.9.2 work in progress, as I have been focusing a lot on blue-green + analysis issues.
v0.9.2 is released with many fixes to blue-green in conjunction with analysis. I believe the issue is resolved but please reopen if still an issue.
This is to confirm the issue was resolved:
@jessesuen I am facing the same issue. My version is: Image: argoproj/argo-rollouts:v0.10.2
@jessesuen I'm seeing this happen with the release candidate v1.0.0-rc1 with helm chart version 0.5.0.
Same issue with release candidate 1.2.0-rc2. Does anyone have any hint on this problem?
We haven't seen this issue again. A failed rollout always goes into a "Degraded" status if it fails the AnalysisRun. We are also running the same version.
OK, I see. I am currently investigating this. I am most likely doing something wrong, but my job fails and the AnalysisRun stays healthy instead of degrading my rollout. Maybe it has something to do with my Istio sidecar injection container, I guess.
Here is my ArgoCD result; you can see the job/pod in failed status but the AnalysisRun healthy.
No worries. If the AnalysisRun STATUS was "Error", then the rollout STATUS should be "Degraded", but if it was "Successful", I would recommend you look for something else downstream.
I tried disabling Istio sidecar injection and still see the same behaviour:
The AnalysisRun status is successful and I don't understand why.
Your pictures show jobs failing. Is your AnalysisTemplate provider a job, as described here: https://argoproj.github.io/argo-rollouts/analysis/job/ ? I would double check the job command exit code too, just in case.
Indeed, my AnalysisTemplate provider is a job. I will double check the exit code first.
OK, I figured it out. It seems I used the wrong image to execute my job. I was using curlimages/curl:latest at first, and replacing that image with a different one from my personal library worked. Anyway, thanks a lot for your help, dude!
EDIT: Actually I was wrong... The problem was not the image but the options I put into my job. Adding these options:
ttlSecondsAfterFinished: 1000
activeDeadlineSeconds: 120
caused the strange behaviour of a failed job with a successful AnalysisRun. So maybe there is a bug here.
RE EDIT:
OK, sorry for saying bullshit. I finally understood the true reason: my count was equal to 1 and my failureLimit was also equal to 1. You need count > failureLimit to make it work.
Anyway... It's late, I'm tired, and I should have gone to bed instead of talking nonsense. Maybe it will help someone :)
Good night
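The count / failureLimit interaction described above can be sketched as follows. This is a simplified model of the behaviour reported in this thread, not the Argo Rollouts controller's actual code; the names `MetricResult` and `assess_metric` are hypothetical, and the rule that a metric fails only when its failed measurements exceed failureLimit is an assumption drawn from the comments here.

```python
from dataclasses import dataclass


@dataclass
class MetricResult:
    count: int   # measurements taken so far
    failed: int  # measurements that failed


def assess_metric(result: MetricResult, count: int, failure_limit: int) -> str:
    """Hypothetical model: fail only when failures EXCEED failure_limit."""
    if result.failed > failure_limit:
        return "Failed"
    if result.count >= count:
        # All requested measurements were taken without exceeding the limit.
        return "Successful"
    return "Running"


# count == failureLimit == 1: the single failure is tolerated.
print(assess_metric(MetricResult(count=1, failed=1), count=1, failure_limit=1))
# failureLimit == 0: the same single failure fails the metric.
print(assess_metric(MetricResult(count=1, failed=1), count=1, failure_limit=0))
```

Under this model, a metric with count equal to failureLimit can never fail, because the number of failed measurements can never exceed the limit. That is consistent with the "count > failureLimit" observation and with the earlier suggestion to set failureLimit to 0.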
I have the same behaviour:
Name: prometheus-metrics-success-rate
Phase: Successful
Count: 1
Failed: 1
Measurements:
  Finished At: 2022-03-15T14:28:05Z
  Phase: Failed
  Started At: 2022-03-15T14:28:05Z
  Value: [NaN]
My failureLimit is 1 for each AnalysisTemplate and my count is 3 for each AnalysisTemplate too.
I'm using the Job metric provider for pre-promotion validation in a b/g scenario. The job results in failure (expected) but the analysis run still reports Successful. I expect the analysis run to also fail and cause the revision to be ineligible for promotion (automated or manual) unless otherwise ignored. If I set autoPromotionEnabled to true on my Rollout, the revision with the failed Job will be promoted automatically.
Rollout Status
Analysis Template
Rollout