argoproj / argo-workflows

Workflow Engine for Kubernetes
https://argo-workflows.readthedocs.io/
Apache License 2.0

MetricsError: Failed to resolve {{exitCode}} when Workflow submitted by Argo-Events #12080

Open leoleonsio opened 1 year ago

leoleonsio commented 1 year ago

What happened/what you expected to happen?

Problem

{{exitCode}} is not resolved properly in a Workflow metric when the workflow is triggered as a result of an event. The same metric is emitted correctly when the workflow is submitted manually.

Expected

The {{exitCode}} value should be resolved in a Workflow metric when the workflow is triggered as a result of an event.

Some details

From the Workflow resource, after the workflow completes successfully:

  - message: 'unable to substitute parameters for metric ''latest_workflow_status'':
      failed to resolve {{exitCode}}'
    status: "True"
    type: MetricsError

Example setup

Sensor:

apiVersion: argoproj.io/v1alpha1
kind: Sensor
metadata:
  name: test-sensor
spec:
  eventBusName: argo-events-events
  dependencies:
    - name: dependency-name
      eventSourceName: test-eventsource
      eventName: event-name
  triggers:
    - template:
        name: event-trigger
        k8s:
          operation: create
          source:
            resource:
              apiVersion: argoproj.io/v1alpha1
              kind: Workflow
              metadata:
                generateName: test-workflow-
                namespace: test-namespace
              spec:
                workflowTemplateRef:
                  name: test-workflowtemplate

WorkflowTemplate:

apiVersion: argoproj.io/v1alpha1
kind: WorkflowTemplate
metadata:
  name: test-workflowtemplate
spec:
  entrypoint: main
  metrics:
    prometheus:
      - name: latest_workflow_status
        help: "Exit code of the last triggered workflow in the namespace"
        labels:
          - key: workflow_namespace
            value: "{{ workflow.namespace }}"
        gauge:
          value: "{{exitCode}}"
  templates:
    - name: main
      script:
        image: debian:bullseye-slim
        command: [bash]
        source: |
          echo "Sleeping"
          sleep 2
          exit 0

When a Workflow with the same metric is submitted manually, the metric is emitted successfully and no error about it appears in the logs.
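For comparison, here is a minimal sketch of what the manual submission is assumed to look like. It is simply the Workflow resource embedded in the Sensor above, lifted out into a standalone manifest. The exact manifest used for the manual run is not shown in this issue, so treat this as an assumption.

```yaml
# Assumed manual-submission counterpart of the Workflow embedded in the Sensor.
# Names and namespace are copied from the Sensor's trigger resource above.
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: test-workflow-
  namespace: test-namespace
spec:
  workflowTemplateRef:
    name: test-workflowtemplate
```

Because the manifest uses `generateName`, it has to be submitted with `kubectl create -f` or `argo submit` rather than `kubectl apply`.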

Version

Argo Workflows v3.5.0 and Argo Events v1.8.1

Paste a small workflow that reproduces the issue. We must be able to run the workflow; don't enter a workflow that uses private images.

The issue cannot be reproduced by a single Workflow. It only occurs when a Workflow is triggered by a Sensor as a result of an event captured by the EventSource. See example setup in the main issue description.

Logs from the workflow controller

time="2023-10-25T10:20:57.412Z" level=info msg="node test-workflow-5nk9k phase Running -> Succeeded" namespace=*** workflow=test-workflow-5nk9k
time="2023-10-25T10:20:57.412Z" level=info msg="node test-workflow-5nk9k finished: 2023-10-25 10:20:57.412399446 +0000 UTC" namespace=*** workflow=test-workflow-5nk9k
time="2023-10-25T10:20:57.412Z" level=error msg="unable to substitute parameters for metric 'latest_workflow_status': failed to resolve {{exitCode}}" namespace=*** workflow=test-workflow-5nk9k
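The error line above shows that `{{exitCode}}` was not in scope when the workflow-level metric was evaluated. One variant that may be worth testing alongside the original setup: the Argo Workflows metrics documentation lists `exitCode` among the variables available to template-level metrics, so the sketch below moves the same metric from `spec.metrics` onto the `main` template. Whether this avoids the error for event-triggered Workflows is an assumption and has not been verified here.

```yaml
# Sketch only: same WorkflowTemplate as above, with the metric moved from
# spec.metrics (workflow level) to templates[].metrics (template level).
# Assumption: {{exitCode}} resolves in the template-level metric scope.
apiVersion: argoproj.io/v1alpha1
kind: WorkflowTemplate
metadata:
  name: test-workflowtemplate
spec:
  entrypoint: main
  templates:
    - name: main
      metrics:
        prometheus:
          - name: latest_workflow_status
            help: "Exit code of the last triggered workflow in the namespace"
            labels:
              - key: workflow_namespace
                value: "{{workflow.namespace}}"
            gauge:
              value: "{{exitCode}}"
      script:
        image: debian:bullseye-slim
        command: [bash]
        source: |
          echo "Sleeping"
          sleep 2
          exit 0
```

With this layout the gauge reflects the exit code of the `main` template rather than a workflow-wide value, which is equivalent here only because the workflow has a single template.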

Logs from your workflow's wait container

-

michaelkorofiverr commented 1 week ago

Any updates here?

tooptoop4 commented 53 minutes ago

It would be great if you could share the generated Workflow manifest for the case where it works (not from events) and for the case where it doesn't (from events).