argoproj / argo-rollouts

Progressive Delivery for Kubernetes
https://argo-rollouts.readthedocs.io/
Apache License 2.0

Parameterized count/interval not working for inline analyses #3243

Open rbrendler opened 8 months ago

rbrendler commented 8 months ago

Describe the bug

Overriding a parameterized count/interval defined in a template does not seem to work for inline analyses.

To Reproduce

I have a series of ClusterAnalysisTemplates that allow configuration of count and interval as arguments, as such:

apiVersion: argoproj.io/v1alpha1
kind: ClusterAnalysisTemplate
metadata:
  name: p95-latency
spec:
    args:
      - name: count
        value: "0"
      - name: interval
        value: "30s"
    metrics:
      - name: p95-latency
        count: "{{args.count}}"
        interval: "{{args.interval}}"
    ...

These are primarily intended to run as background analyses, so I set the default value of count to 0 (run indefinitely). They work as expected there, and I am able to override count and interval in the rollout for cases where I do not want continuous analysis.

When I try to use these templates for an inline analysis I cannot use a continuous check, so I need to override the count, as such:

... 
strategy:
    canary:
      steps:
      - setWeight: 20
      - analysis:
          args:
          - name: count
            value: "4"
          templates:
          - clusterScope: true
            templateName: p95-latency
...

When I deploy this rollout, I get the following error in the resource state:

InvalidSpec: The Rollout "rollouts-demo" is invalid: spec.strategy.canary.steps[2].analysis.templates: Invalid value: "p95-latency": AnalysisTemplate p95-latency has metric p95-latency which runs indefinitely. Invalid value for count: 0

Expected behavior

I expect the inline analysis to behave like the background analysis, and override the default count with the argument passed in.
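
To make the expected behavior concrete, here is a minimal Python sketch of the ordering described above: merge the step's args over the template defaults, substitute the placeholders, and only then check for an indefinite count. It is an illustration under stated assumptions, not the controller's actual code; the helper names (merge_args, resolve, validate_inline_metric) are hypothetical and do not come from Argo Rollouts.

```python
import re

# Hypothetical helpers for illustration only; none of these names exist in Argo Rollouts.
ARG_PATTERN = re.compile(r"\{\{\s*args\.([\w-]+)\s*\}\}")


def merge_args(template_args, step_args):
    """Start from the template defaults, then apply the overrides from the rollout step."""
    merged = {a["name"]: a.get("value") for a in template_args}
    merged.update({a["name"]: a.get("value") for a in step_args})
    return merged


def resolve(value, args):
    """Replace every {{args.<name>}} placeholder with its merged value."""
    return ARG_PATTERN.sub(lambda m: args[m.group(1)], value)


def validate_inline_metric(metric, args):
    """Resolve count first, then reject metrics that would run indefinitely."""
    count = int(resolve(metric["count"], args))
    if count == 0:
        raise ValueError(f"metric {metric['name']} runs indefinitely (count: 0)")
    return count


# Values copied from the template and rollout step in this report.
template_args = [{"name": "count", "value": "0"}, {"name": "interval", "value": "30s"}]
metric = {"name": "p95-latency", "count": "{{args.count}}", "interval": "{{args.interval}}"}
step_args = [{"name": "count", "value": "4"}]

# With the step override applied before validation, the inline analysis is accepted.
merged = merge_args(template_args, step_args)
resolved_count = validate_inline_metric(metric, merged)
resolved_interval = resolve(metric["interval"], merged)
print(resolved_count, resolved_interval)
```

Calling validate_inline_metric with only the template defaults (no step override) raises the "runs indefinitely" error in this sketch, which is the result the rollout reports even though an override is supplied.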

Version

Operator: v1.6.0+7eae71e Chart: 2.32.2

Logs

The controller repeats the same four lines over and over:

time="2023-12-11T17:03:47Z" level=info msg="Started syncing rollout" generation=109 namespace=p44-qa-integration resourceVersion=1833103636 rollout=rollouts-demo
time="2023-12-11T17:03:47Z" level=info msg="ComputePodTemplateHash hash changed (expected: 6dffcffb69, actual: 6cf78c66c5)" namespace=p44-qa-integration rollout=rollouts-demo
time="2023-12-11T17:03:47Z" level=error msg="The Rollout \"rollouts-demo\" is invalid: spec.strategy.canary.steps[1].analysis.templates: Invalid value: \"p95-latency\": AnalysisTemplate p95-latency has metric p95-latency which runs indefinitely. Invalid value for count: 0" namespace=p44-qa-integration rollout=rollouts-demo
time="2023-12-11T17:03:47Z" level=info msg="Reconciliation completed" generation=109 namespace=p44-qa-integration resourceVersion=1833103636 rollout=rollouts-demo time_ms=2.5524229999999997

Message from the maintainers:

Impacted by this bug? Give it a 👍. We prioritize the issues with the most 👍.

Ahmed-Elkollaly commented 5 months ago

I'm also impacted by this bug. Are there any updates?