argoproj / argo-workflows

Workflow Engine for Kubernetes
https://argo-workflows.readthedocs.io/
Apache License 2.0

3.4+ log propagation issue -- need larger fluentbit buffer size #13314

Closed · vasim-test closed this issue 2 months ago

vasim-test commented 3 months ago

Pre-requisites

What happened/what did you expect to happen?

We are using Argo Workflows v3.4.0 (chart version 0.18.0). Our Kubernetes version is 1.25.1.

For our applications, we have CronWorkflow and WorkflowTemplate objects. In the WorkflowTemplate, we use annotations to send logs to fluentbit, and Vault runs as a sidecar container.

When we upgraded from 3.3.2 to 3.4.0, we observed a strange issue: logs are no longer visible in Elasticsearch/Kibana. Previously this worked, so there must be some change related to emissary that is causing this trouble. When I stop running Vault as a sidecar container, logs do reach Kibana, but stopping Vault is not an option for us.

I have shared my CronWorkflow and WorkflowTemplate for reference.

Please check this with priority, because this bug is present in all versions from 3.4 onward.

Version

v3.4.0, v3.5.8

Paste a small workflow that reproduces the issue. We must be able to run the workflow; don't enter a workflow that uses private images.

# WorkflowTemplate

apiVersion: argoproj.io/v1alpha1
kind: WorkflowTemplate
metadata:
  annotations:
    meta.helm.sh/release-name: test-migration-job  # Updated release-name
    meta.helm.sh/release-namespace: dev
  labels:
    app.kubernetes.io/managed-by: Helm
  name: test-migration-v1-template  # Updated name
  namespace: dev
spec:
  activeDeadlineSeconds: 28800
  arguments:
    parameters:
    - name: script
  templates:
  - container:
      command:
      - /bin/bash
      - '{{`{{inputs.parameters.script}}`}}'
      image: 123456789.dkr.ecr.ap-south-1.amazonaws.com/test-migration:1.0.97886  # Updated image
      name: test-migration-v1  # Updated container name
      resources: {}
      volumeMounts:
      - mountPath: /data
        name: shared-data
    initContainers:
    - env:
      - name: APP_NAME
        value: test-migration  # Updated environment variable
      - name: PROFILE
        value: nonprod-dev
      - name: VERSION
        value: master
      - name: MY_SERVER_URL
        value: http://my-internal-service.dev.svc.cluster.local/  # Updated service URL
      image: 123456789.dkr.ecr.ap-south-1.amazonaws.com/my-internal-client:latest  # Updated image
      name: my-internal-client  # Updated initContainer name
      resources: {}
      volumeMounts:
      - mountPath: /data
        name: shared-data
    inputs:
      parameters:
      - name: script
    metadata:
      annotations:
        co.elastic.logs/enabled: "true"
        co.elastic.logs/json.add_error_key: "true"
        co.elastic.logs/json.keys_under_root: "true"
        co.elastic.logs/json.message_key: log
        fluentbit.job.name: job-test-migration-v1  # Updated fluentbit job name
        vault.hashicorp.com/agent-image: 123456789.dkr.ecr.ap-south-1.amazonaws.com/vault:1.6.2  # Updated image
        vault.hashicorp.com/agent-inject: "true"
        vault.hashicorp.com/agent-inject-secret-secrets.txt: kv/apps/dev/test-migration  # Updated secret path
        vault.hashicorp.com/agent-inject-template-secrets.txt: |
          {{ with secret "kv/apps/dev/test-migration" }}{{ range $k, $v := .Data.data }}
          {{ $k }}={{ $v }}{{ end }}{{ end }}
        vault.hashicorp.com/role: dev
    name: test-migration-v1-template  # Updated template name
    outputs: {}
    securityContext:
      fsGroup: 65534
    serviceAccountName: test-migration-v1  # Updated service account name
    volumes:
    - emptyDir: {}
      name: shared-data
  ttlStrategy:
    secondsAfterCompletion: 86400

---
# CronWorkflow

apiVersion: argoproj.io/v1alpha1
kind: CronWorkflow
metadata:
  annotations:
    meta.helm.sh/release-name: test-migration-job  # Updated release-name
    meta.helm.sh/release-namespace: dev
  labels:
    app.kubernetes.io/instance: test-migration-job  # Updated instance label
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: argo-jobs
    app.kubernetes.io/version: 1.16.0
    helm.sh/chart: argo-jobs-0.1.0
  name: argo-job-test-migration-v1  # Updated job name
  namespace: dev
spec:
  activeDeadlineSeconds: 28800
  concurrencyPolicy: Forbid
  podGC:
    strategy: Never
  schedule: '*/15 0-2 * * *'
  selector:
    matchLabels:
      app.kubernetes.io/instance: test-migration-job  # Updated instance label
      app.kubernetes.io/name: argo-jobs
  ttlStrategy:
    secondsAfterCompletion: 86400
  workflowSpec:
    activeDeadlineSeconds: 43200000
    entrypoint: dag
    synchronization:
      mutex:
        name: test-migration-v1  # Updated mutex name
    templates:
    - dag:
        tasks:
        - arguments:
            parameters:
            - name: script
              value: /run-test.sh  # Updated script value
          name: test-details-store
          templateRef:
            name: test-migration-v1-template  # Updated templateRef name
            template: test-migration-v1-template  # Updated template name
      name: dag
---
# workflow-controller-configmap

apiVersion: v1
data:
  config: |
    artifactRepository:
      s3: {}
    persistence: {}
kind: ConfigMap
metadata:
  annotations:
    meta.helm.sh/release-name: argo
    meta.helm.sh/release-namespace: admin
  creationTimestamp: "2024-07-06T10:04:01Z"
  labels:
    app.kubernetes.io/component: workflow-controller
    app.kubernetes.io/instance: argo
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: argo-workflows-cm
    app.kubernetes.io/part-of: argo-workflows
    helm.sh/chart: argo-workflows-0.18.0
  name: argo-argo-workflows-workflow-controller-configmap

Logs from the workflow controller

kubectl logs -n argo deploy/workflow-controller | grep ${workflow}

No issues in logs

Logs from in your workflow's wait container

kubectl logs -n argo -c wait -l workflows.argoproj.io/workflow=${workflow},workflow.argoproj.io/phase!=Succeeded

No issues observed in logs
akshaypandhare commented 3 months ago

I am also facing the same issue.

agilgur5 commented 3 months ago

We are using argo workflow with v3.4.0(0.18.0 - Chart version). Our kubernetes version is 1.25.1.

This is a very old version of Argo (and an EoL version of k8s). I would try with the latest patch, i.e. 3.4.17

all versions after > 3.4

It's not clear that you tested any newer versions since you did not specify them at all.

Please check on priority

It will be auto-closed without confirmation that it has been tested at least on 3.4.17 and, per the issue template, on :latest

It would still be at most a P3, given how few users have not yet made the upgrade, as 3.3 has been unsupported for some time now.

We are using annotations to send logs to fluentbit and also Vault is running as side car container. When we upgrade from 3.3.2 to 3.4.0, We observed the strange issue that logs are not getting visible on elasticsearch/kibana

I also can't reproduce this, and I and many other users use fluentbit with Argo 3.4+, as it's the de facto standard log forwarder on k8s clusters.

      image: 123456789.dkr.ecr.ap-south-1.amazonaws.com/test-migration:1.0.97886

This image also looks private, which is counter to the issue template, and would not be reproducible. Your other image, my-internal-client, also looks to be a replacement stub/mock as well as private. That would make it doubly non-reproducible.

There's also a lot going on there, much more than necessary for a minimal reproduction. Please provide a minimal reproduction with a public image that reproduces on at least 3.4.17.

Also when I stop vault as side car container then logs are coming

This would also make it lower priority given that it only happens in a certain scenario, affecting an even smaller group of users.

Priority is partly subjective, but largely based on impact to the userbase, which in your case, is quite segmented.

vasim-test commented 3 months ago

@agilgur5 Thanks for responding. We also tested with version 3.5.8; it has the same behaviour, and we are not getting logs. We tried all versions from 2.12.9 onward to check where it stopped sending logs. We observed that Argo works fine and we are able to get logs until v3.4.0. That means something added in 3.4.0 and later versions is causing the issue, and Emissary is the major change I observed starting from v3.4.0. We already tested on the latest version (3.5.8), but due to this strange behaviour we have rolled it back to 3.3.2 in our production environment.

I agree most users are using Elastic. In my scenario, I am using Elastic as well as Vault as a sidecar container; you can see the annotations. The images are private, but you can take any image. The only prerequisite is to have an EFK (fluentbit/Kibana) stack and a Vault setup, so that you can use the annotations like I did.

Let me know if you want more info on the same.

agilgur5 commented 3 months ago

We also tested with version 3.5.8; it has the same behaviour, and we are not getting logs. We tried all versions from 2.12.9 onward to check where it stopped sending logs.

Ok, can you reference that in the "Version" section? Or elsewhere in your description.

It means something has been added from 3.4.0 and later version onwards which is causing issues. And Emissary is the major change I observed which starts from v3.4.0.

Yes, your observation is correct, and Emissary runs an initContainer, a sidecar, and as a parent process to your main container's command as well. So it certainly can affect something like this in theory. Given that this hasn't been reported previously though (despite 3.4 being out for almost 2 years now and Emissary even longer), I imagine it is some specific combination of your services that causes this issue.

To be clear though, Emissary was available since Argo 3.1, was the default in 3.3, and then was the only executor in 3.4+. I assume you were using another executor in your config prior to the 3.4 upgrade? The config you provided here does not list anything, so if you were using that same config in 3.3, you'd be using Emissary in 3.3 as well.
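For reference, this is roughly what pinning the pre-3.4 executor looked like in the workflow-controller ConfigMap (a sketch only; the surrounding metadata is illustrative, and the `containerRuntimeExecutor` key was removed entirely in 3.4, where emissary is the sole executor):

```yaml
# Pre-3.4 sketch: selecting the k8sapi executor explicitly.
# Omitting this key on 3.3 already meant running emissary (the 3.3 default).
apiVersion: v1
kind: ConfigMap
metadata:
  name: workflow-controller-configmap
data:
  config: |
    containerRuntimeExecutor: k8sapi
```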

Images are private but you can take any image.

Can you please provide a minimal reproduction with a public image, as the issue template asks?

Only prerequisite is to have EFK(Fluentbit/kibana) and vault setup. So that you can use that annotations like I did.

I don't think Elastic or Kibana are relevant here, as those are not impacted by Argo and are post-processors after fluentbit. I would imagine that your logs are not getting parsed by fluentbit or aren't even making it to stdout. Can you please debug and confirm which one it is? Again that is important for a minimal reproduction.

You also did not provide the logs of the main or wait containers, although you said there were "no issues" in the logs. That would suggest that Argo passed the logs to stdout fine and the issue is actually somewhere in your other services. For instance, if your config is not handling Argo's initContainer or wait container.

vasim-test commented 3 months ago

On 3.3.2, we were using the k8sapi container runtime executor. I will try to provide a workflow and logs; allow me some time, and I will share evidence that I tried 3.5.8. Still, the following questions come to mind:

  1. In version 3.3.2, I have Elastic and the Vault agent as sidecars. Why does everything work as expected there, with logs coming through?

  2. If I move to v3.5.8 with the same Vault and Elastic setup, Vault does its job without issue, but why do the logs stop?

  3. On the same 3.5.8, if the issue is with fluentbit, why do logs come through when I intentionally stop the Vault injection (not allowing the Vault sidecar to run)?

  4. I have a number of jobs using workflows, but why are only the jobs that use a WorkflowTemplate affected? The jobs that use only a CronWorkflow work fine: logs come through and Vault also does its job.

Something is happening between Emissary, WorkflowTemplates, and sidecars. A mystery!

agilgur5 commented 3 months ago

3. then when I intentionally stop my vault injection (not allowing vault side car to run) then how logs are coming?

Yeah, I don't know; it's odd that it affects it. If I had to guess, maybe the wait container or something didn't get properly added to the Pod, since it is also a sidecar? i.e. the problem may not be with the Emissary process itself, but with how the Pod is created. It might also be timing-related, since Vault uses sidecar injection?

The output of kubectl get pod <my-pod> -o yaml might be helpful in checking that
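As a concrete sketch of that check: save the pod spec and look for the injected containers. The here-doc below is stand-in data for a hypothetical pod where Vault injected its agent but the wait sidecar is absent; it is not output from this cluster.

```shell
# Save the real spec first with: kubectl get pod <my-pod> -o json > pod.json
# Stand-in data for illustration only:
cat > pod.json <<'EOF'
{"spec":{"containers":[{"name":"main"},{"name":"vault-agent"}]}}
EOF

# Emissary injects a sidecar container named "wait"; if it is missing,
# the Pod was likely mutated after Argo created it (e.g. by another webhook).
if grep -q '"name": *"wait"' pod.json; then
  echo "wait sidecar present"
else
  echo "wait sidecar MISSING"   # this branch fires for the stand-in data
fi
```

A real check is more robust with `kubectl get pod <my-pod> -o jsonpath='{.spec.containers[*].name}'`; the grep above merely tolerates the optional space kubectl prints after `"name":`.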

4. mean to say jobs which uses cronworkflow only that is working fine , able to get logs and vault is also doing its jobs.

That's bizarre... Are you sure they're otherwise exactly the same? Using a WorkflowTemplate has no effect on the created Pod itself (it only affects the DAG logic in the Controller, which occurs prior to Pod creation). Maybe your fluentbit config excludes certain labels that your WorkflowTemplate adds?
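To illustrate that last possibility: a fluent-bit grep filter like the hypothetical one below would silently drop records from pods matching a label before they ever reach Elasticsearch. The Match tag and label key here are placeholders, not taken from this setup.

```
# Hypothetical fluent-bit filter: an Exclude rule keyed on a label that only
# the WorkflowTemplate-based pods carry would drop exactly those records.
[FILTER]
    Name    grep
    Match   kube.*
    Exclude $kubernetes['labels']['example.io/some-label'] ^true$
```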

vasim-test commented 3 months ago

If the same Elastic, the same Vault agent, and the same workflow templates work on version 3.3, they should not cause an issue when upgrading to 3.5, yet the issue appears only after the upgrade.

One more thing I observed: if we remove the initContainer, logs are visible in Kibana. If the same set of my services works fine on 3.3 and not on the latest version, then something in the upgraded version is interfering with the sidecar or init container.

Check the document here; it says something about emissary. I'm attaching a snapshot of the same for reference.

https://blog.argoproj.io/sunsetting-docker-executor-support-in-argo-workflows-77d352d601c

agilgur5 commented 2 months ago

it should not cause issue if we are upgrading to 3.5 but it is causing issue after upgrade only.

I'm not sure why you're repeating yourself. Please answer all my previous questions and provide the output, logs, and reproductions asked for, which are necessary to debug. Without those essential debugging pieces, there is nothing actionable we can do. Not answering debugging questions from maintainers is not productive.

Please note that I am an unpaid volunteer maintainer and that reading and responding to all your comments takes time. Please try to make that an effective and efficient use of time.

something is causing issues to side car or init container

Yes, I said that may be the case above as well. I also said twice before that it could very well be your fluentbit config not picking up certain Pods or containers, since you appear to have logs in stdout fine. Given that other fluentbit users have not faced this before, that would be my prime suspicion.

https://blog.argoproj.io/sunsetting-docker-executor-support-in-argo-workflows-77d352d601c

I know how Emissary works, and this repo contains its source code, so I'm not sure why you're linking a 3 year old notice saying to upgrade to it.


Here's a checklist of things needed and questions unanswered:

  1. [ ] "Can you please provide a minimal reproduction with a public image, as the issue template asks?"
  2. [x] Were you "using another executor in your config prior to the 3.4 upgrade?"
    • EDIT: Yes, was using k8sapi executor per below
  3. [ ] "I would imagine that your logs are not getting parsed by fluentbit or aren't even making it to stdout. Can you please debug and confirm which one it is? Again that is important for a minimal reproduction."
  4. [ ] "The output of kubectl get pod <my-pod> -o yaml might be helpful in checking that"
  5. [ ] "Maybe your fluentbit config excludes certain labels that your WorkflowTemplate adds?"

We cannot proceed without these.

vasim-test commented 2 months ago

I appreciate your time on this. Earlier, we were using the k8sapi executor; when we switched to 3.5.8, we removed it from the configmap, since emissary is the default one.

WorkflowTemplate:

apiVersion: argoproj.io/v1alpha1
kind: WorkflowTemplate
metadata:
  name: test-migration-v1-template
spec:
  templates:
  - name: test-migration-v1-template
    container:
      command:
      - /bin/sh
      - -c
      - echo hello  # wrapped in `sh -c` so there is a real executable to exec
      image: nginx
      name: test-migration-v1
    initContainers:
    - name: test-config-client
      image: busybox:latest
    metadata:
      annotations:
        co.elastic.logs/enabled: "true"
        co.elastic.logs/json.add_error_key: "true"
        co.elastic.logs/json.keys_under_root: "true"
        co.elastic.logs/json.message_key: log
        fluentbit.job.name: job-test-migration-v1
        vault.hashicorp.com/agent-image: hashicorp/vault:1.6.2
        vault.hashicorp.com/agent-inject: "true"
        vault.hashicorp.com/agent-inject-secret-secrets.txt: kv/apps/perf/test-migration
        vault.hashicorp.com/agent-inject-template-secrets.txt: |
          {{ with secret "kv/apps/dev/test-migration" }}{{ range $k, $v := .Data.data }} {{ $k }}={{ $v }}{{ end }}{{ end }}
        vault.hashicorp.com/role: perf
    securityContext:
      fsGroup: 65534
    serviceAccountName: test-migration

Create CronWorkflow:

apiVersion: argoproj.io/v1alpha1
kind: CronWorkflow
metadata:
  name: argo-job-test-migration-v1
spec:
  schedule: '* * * * *'
  workflowSpec:
    entrypoint: dag
    templates:
    - name: dag
      dag:
        tasks:
        - name: test-details-store
          templateRef:
            name: test-migration-v1-template
            template: test-migration-v1-template

Create service account:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: test-migration
Controller logs:

time="2024-07-11T08:38:19.146Z" level=debug msg="Resolving the template" base="*v1alpha1.Workflow (namespace=perf,name=argo-job-test-migration-v1-dg5rj)" tmpl="*v1alpha1.NodeStatus (dag)" time="2024-07-11T08:38:19.146Z" level=debug msg="Getting the template" base="*v1alpha1.Workflow (namespace=perf,name=argo-job-test-migration-v1-dg5rj)" tmpl="*v1alpha1.NodeStatus (dag)" time="2024-07-11T08:38:19.146Z" level=debug msg="Getting the template by name: dag" base="*v1alpha1.Workflow (namespace=perf,name=argo-job-test-migration-v1-dg5rj)" tmpl="*v1alpha1.NodeStatus (dag)" time="2024-07-11T08:38:19.146Z" level=debug msg="Initializing node argo-job-test-migration-v1-dg5rj.test-details-store: template: *v1alpha1.DAGTask (test-migration-v1-template/test-migration-v1-template#false), boundaryID: argo-job-test-migration-v1-dg5rj" namespace=perf workflow=argo-job-test-migration-v1-dg5rj time="2024-07-11T08:38:19.146Z" level=info msg="Pod node argo-job-test-migration-v1-dg5rj-3398093028 initialized Pending" namespace=perf workflow=argo-job-test-migration-v1-dg5rj time="2024-07-11T08:38:19.146Z" level=debug msg="Resolving the template" base="*v1alpha1.Workflow (namespace=perf,name=argo-job-test-migration-v1-dg5rj)" tmpl="*v1alpha1.NodeStatus (dag)" time="2024-07-11T08:38:19.146Z" level=debug msg="Getting the template" base="*v1alpha1.Workflow (namespace=perf,name=argo-job-test-migration-v1-dg5rj)" tmpl="*v1alpha1.NodeStatus (dag)" time="2024-07-11T08:38:19.146Z" level=debug msg="Getting the template by name: dag" base="*v1alpha1.Workflow (namespace=perf,name=argo-job-test-migration-v1-dg5rj)" tmpl="*v1alpha1.NodeStatus (dag)" time="2024-07-11T08:38:19.146Z" level=debug msg="Executing node argo-job-test-migration-v1-dg5rj.test-details-store with container template: test-migration-v1-template\n" namespace=perf workflow=argo-job-test-migration-v1-dg5rj time="2024-07-11T08:38:19.146Z" level=debug namespace=perf needLocation=true 
workflow=argo-job-test-migration-v1-dg5rj time="2024-07-11T08:38:19.146Z" level=debug msg="Adding init container test-config-client" time="2024-07-11T08:38:19.147Z" level=debug msg="unresolved is allowed " error=unresolved time="2024-07-11T08:38:19.147Z" level=debug msg="unresolved is allowed " error=unresolved time="2024-07-11T08:38:19.147Z" level=debug msg="unresolved is allowed " error=unresolved time="2024-07-11T08:38:19.147Z" level=debug msg="unresolved is allowed " error=unresolved time="2024-07-11T08:38:19.147Z" level=debug msg="unresolved is allowed " error=unresolved time="2024-07-11T08:38:19.147Z" level=debug msg="unresolved is allowed " error=unresolved time="2024-07-11T08:38:19.147Z" level=debug msg="unresolved is allowed " error=unresolved time="2024-07-11T08:38:19.147Z" level=debug msg="unresolved is allowed " error=unresolved time="2024-07-11T08:38:19.147Z" level=debug msg="unresolved is allowed " error=unresolved time="2024-07-11T08:38:19.147Z" level=debug msg="unresolved is allowed " error=unresolved time="2024-07-11T08:38:19.147Z" level=debug msg="unresolved is allowed " error=unresolved time="2024-07-11T08:38:19.147Z" level=debug msg="unresolved is allowed " error=unresolved time="2024-07-11T08:38:19.147Z" level=debug msg="unresolved is allowed " error=unresolved time="2024-07-11T08:38:19.147Z" level=debug msg="unresolved is allowed " error=unresolved time="2024-07-11T08:38:19.147Z" level=debug msg="unresolved is allowed " error=unresolved time="2024-07-11T08:38:19.147Z" level=debug msg="unresolved is allowed " error=unresolved time="2024-07-11T08:38:19.147Z" level=debug msg="unresolved is allowed " error=unresolved time="2024-07-11T08:38:19.147Z" level=debug msg="unresolved is allowed " error=unresolved time="2024-07-11T08:38:19.147Z" level=debug msg="unresolved is allowed " error=unresolved time="2024-07-11T08:38:19.147Z" level=debug msg="unresolved is allowed " error=unresolved time="2024-07-11T08:38:19.147Z" level=debug msg="unresolved is 
allowed " error=unresolved time="2024-07-11T08:38:19.147Z" level=debug msg="unresolved is allowed " error=unresolved time="2024-07-11T08:38:19.147Z" level=debug msg="unresolved is allowed " error=unresolved time="2024-07-11T08:38:19.147Z" level=debug msg="unresolved is allowed " error=unresolved time="2024-07-11T08:38:19.147Z" level=debug msg="unresolved is allowed " error=unresolved time="2024-07-11T08:38:19.147Z" level=debug msg="unresolved is allowed " error=unresolved time="2024-07-11T08:38:19.147Z" level=debug msg="unresolved is allowed " error=unresolved time="2024-07-11T08:38:19.147Z" level=debug msg="unresolved is allowed " error=unresolved time="2024-07-11T08:38:19.147Z" level=debug msg="unresolved is allowed " error=unresolved time="2024-07-11T08:38:19.147Z" level=debug msg="unresolved is allowed " error=unresolved time="2024-07-11T08:38:19.148Z" level=debug msg="Creating Pod: argo-job-test-migration-v1-dg5rj.test-details-store (argo-job-test-migration-v1-dg5rj-test-migration-v1-template-3398093028)" namespace=perf workflow=argo-job-test-migration-v1-dg5rj time="2024-07-11T08:38:19.151Z" level=debug msg="Create events 201" time="2024-07-11T08:38:19.522Z" level=debug msg="Create pods 409" time="2024-07-11T08:38:19.523Z" level=info msg="Failed pod argo-job-test-migration-v1-dg5rj.test-details-store (argo-job-test-migration-v1-dg5rj-test-migration-v1-template-3398093028) creation: already exists" namespace=perf workflow=argo-job-test-migration-v1-dg5rj time="2024-07-11T08:38:19.523Z" level=info msg="TaskSet Reconciliation" namespace=perf workflow=argo-job-test-migration-v1-dg5rj time="2024-07-11T08:38:19.523Z" level=info msg=reconcileAgentPod namespace=perf workflow=argo-job-test-migration-v1-dg5rj time="2024-07-11T08:38:19.523Z" level=debug msg="Log changes patch: 
{\"metadata\":{\"annotations\":{\"workflows.argoproj.io/pod-name-format\":\"v2\"},\"labels\":{\"workflows.argoproj.io/completed\":\"false\",\"workflows.argoproj.io/phase\":\"Running\"}},\"spec\":{\"ttlStrategy\":{\"secondsAfterCompletion\":84600}},\"status\":{\"artifactGCStatus\":{\"notSpecified\":true},\"artifactRepositoryRef\":{\"artifactRepository\":{\"archiveLogs\":true,\"s3\":{\"accessKeySecret\":{\"key\":\"awsAccessKey\",\"name\":\"argo-artifact-secret\"},\",\"endpoint\":\"s3.amazonaws.com\",\"insecure\":false,\",\"secretKeySecret\":{\"key\":\"awsSecretAccessKey\",\"name\":\"argo-artifact-secret\"},\"useSDKCreds\":true}},\"default\":true},\"nodes\":{\"argo-job-test-migration-v1-dg5rj\":{\"children\":[\"argo-job-test-migration-v1-dg5rj-3398093028\"],\"displayName\":\"argo-job-test-migration-v1-dg5rj\",\"finishedAt\":null,\"id\":\"argo-job-test-migration-v1-dg5rj\",\"name\":\"argo-job-test-migration-v1-dg5rj\",\"phase\":\"Running\",\"startedAt\":\"2024-07-11T08:38:19Z\",\"templateName\":\"dag\",\"templateScope\":\"local/argo-job-test-migration-v1-dg5rj\",\"type\":\"DAG\"},\"argo-job-test-migration-v1-dg5rj-3398093028\":{\"boundaryID\":\"argo-job-test-migration-v1-dg5rj\",\"displayName\":\"test-details-store\",\"finishedAt\":null,\"id\":\"argo-job-test-migration-v1-dg5rj-3398093028\",\"inputs\":{\"parameters\":[{\"name\":\"script\",\"value\":\"echo hi\"}]},\"name\":\"argo-job-test-migration-v1-dg5rj.test-details-store\",\"phase\":\"Pending\",\"startedAt\":\"2024-07-11T08:38:19Z\",\"templateRef\":{\"name\":\"test-migration-v1-template\",\"template\":\"test-migration-v1-template\"},\"templateScope\":\"local/argo-job-test-migration-v1-dg5rj\",\"type\":\"Pod\"}},\"phase\":\"Running\",\"startedAt\":\"2024-07-11T08:38:19Z\",\"synchronization\":{\"mutex\":{\"holding\":[{\"holder\":\"argo-job-test-migration-v1-dg5rj\",\"mutex\":\"perf/Mutex/test-migration-v1\"}]}}}}" time="2024-07-11T08:38:19.523Z" level=info msg="Workflow to be dehydrated" Workflow Size=3194 
time="2024-07-11T08:38:19.531Z" level=debug msg="Update workflows 409" time="2024-07-11T08:38:19.531Z" level=warning msg="Error updating workflow: Operation cannot be fulfilled on workflows.argoproj.io \"argo-job-test-migration-v1-dg5rj\": the object has been modified; please apply your changes to the latest version and try again Conflict" namespace=perf workflow=argo-job-test-migration-v1-dg5rj time="2024-07-11T08:38:19.531Z" level=info msg="Re-applying updates on latest version and retrying update" namespace=perf workflow=argo-job-test-migration-v1-dg5rj time="2024-07-11T08:38:19.534Z" level=debug msg="Get workflows 200" time="2024-07-11T08:38:19.536Z" level=info msg="Workflow to be dehydrated" Workflow Size=3544 time="2024-07-11T08:38:19.546Z" level=debug msg="Update workflows 200" time="2024-07-11T08:38:19.546Z" level=info msg="Update retry attempt 1 successful" namespace=perf workflow=argo-job-test-migration-v1-dg5rj time="2024-07-11T08:38:19.546Z" level=info msg="Workflow update successful" namespace=perf phase=Running resourceVersion=438694458 workflow=argo-job-test-migration-v1-dg5rj time="2024-07-11T08:38:19.546Z" level=debug msg="Event(v1.ObjectReference{Kind:\"Workflow\", Namespace:\"perf\", Name:\"argo-job-test-migration-v1-dg5rj\", UID:\"bc51d987-8a29-42a8-a5e3-9aa68003a7d2\", APIVersion:\"argoproj.io/v1alpha1\", ResourceVersion:\"438694458\", FieldPath:\"\"}): type: 'Normal' reason: 'WorkflowNodeRunning' Running node argo-job-test-migration-v1-dg5rj" time="2024-07-11T08:38:19.552Z" level=debug msg="Create events 201" time="2024-07-11T08:38:19.687Z" level=info msg="Processing perf/argo-job-test-migration-v1" cronWorkflow=perf/argo-job-test-migration-v1 time="2024-07-11T08:38:19.688Z" level=debug msg="Getting the template by name: dag" time="2024-07-11T08:38:19.688Z" level=debug msg="Resolving the template" base="*v1alpha1.Workflow (namespace=,name=)" tmpl="*v1alpha1.WorkflowStep (dag)" time="2024-07-11T08:38:19.688Z" level=debug msg="Getting the 
template" base="*v1alpha1.Workflow (namespace=,name=)" tmpl="*v1alpha1.WorkflowStep (dag)" time="2024-07-11T08:38:19.688Z" level=debug msg="Getting the template by name: dag" base="*v1alpha1.Workflow (namespace=,name=)" tmpl="*v1alpha1.WorkflowStep (dag)" time="2024-07-11T08:38:19.688Z" level=debug msg="Resolving the template" base="*v1alpha1.Workflow (namespace=,name=)" tmpl="*v1alpha1.DAGTask (test-migration-v1-template/test-migration-v1-template#false)" time="2024-07-11T08:38:19.688Z" level=debug msg="Getting the template" base="*v1alpha1.Workflow (namespace=,name=)" tmpl="*v1alpha1.DAGTask (test-migration-v1-template/test-migration-v1-template#false)" time="2024-07-11T08:38:19.688Z" level=debug msg="Getting the template from ref" base="*v1alpha1.Workflow (namespace=,name=)" tmpl="*v1alpha1.DAGTask (test-migration-v1-template/test-migration-v1-template#false)" time="2024-07-11T08:38:19.688Z" level=debug msg="Stored the template" base="*v1alpha1.Workflow (namespace=,name=)" tmpl="*v1alpha1.DAGTask (test-migration-v1-template/test-migration-v1-template#false)" time="2024-07-11T08:38:19.688Z" level=debug msg="unresolved is allowed " error=unresolved time="2024-07-11T08:38:19.688Z" level=debug msg="unresolved is allowed " error=unresolved time="2024-07-11T08:38:19.688Z" level=debug msg="unresolved is allowed " error=unresolved time="2024-07-11T08:38:19.688Z" level=debug msg="unresolved is allowed " error=unresolved time="2024-07-11T08:38:19.688Z" level=debug msg="unresolved is allowed " error=unresolved time="2024-07-11T08:38:19.688Z" level=debug msg="unresolved is allowed " error=unresolved time="2024-07-11T08:38:19.688Z" level=debug msg="unresolved is allowed " error=unresolved time="2024-07-11T08:38:19.689Z" level=debug msg="unresolved is allowed " error=unresolved time="2024-07-11T08:38:19.689Z" level=debug msg="unresolved is allowed " error=unresolved time="2024-07-11T08:38:19.689Z" level=debug msg="unresolved is allowed " error=unresolved 
time="2024-07-11T08:38:19.689Z" level=debug msg="unresolved is allowed " error=unresolved
time="2024-07-11T08:38:19.689Z" level=debug msg="unresolved is allowed " error=unresolved
time="2024-07-11T08:38:19.689Z" level=debug msg="unresolved is allowed " error=unresolved
time="2024-07-11T08:38:19.689Z" level=debug msg="Resolving the template" base="*v1alpha1.Workflow (namespace=,name=)" tmpl="*v1alpha1.DAGTask (test-migration-v1-template/test-migration-v1-template#false)"
time="2024-07-11T08:38:19.689Z" level=debug msg="Found stored template" base="*v1alpha1.Workflow (namespace=,name=)" tmpl="*v1alpha1.DAGTask (test-migration-v1-template/test-migration-v1-template#false)"
time="2024-07-11T08:38:19.689Z" level=debug msg="unresolved is allowed " error=unresolved
time="2024-07-11T08:38:19.689Z" level=debug msg="unresolved is allowed " error=unresolved
time="2024-07-11T08:38:19.689Z" level=debug msg="unresolved is allowed " error=unresolved
time="2024-07-11T08:38:19.689Z" level=debug msg="unresolved is allowed " error=unresolved
time="2024-07-11T08:38:19.689Z" level=debug msg="unresolved is allowed " error=unresolved
time="2024-07-11T08:38:19.689Z" level=debug msg="unresolved is allowed " error=unresolved
time="2024-07-11T08:38:19.689Z" level=debug msg="unresolved is allowed " error=unresolved
time="2024-07-11T08:38:19.689Z" level=debug msg="unresolved is allowed " error=unresolved
time="2024-07-11T08:38:19.689Z" level=debug msg="unresolved is allowed " error=unresolved
time="2024-07-11T08:38:19.689Z" level=debug msg="unresolved is allowed " error=unresolved
time="2024-07-11T08:38:19.689Z" level=debug msg="unresolved is allowed " error=unresolved
time="2024-07-11T08:38:19.689Z" level=debug msg="unresolved is allowed " error=unresolved
time="2024-07-11T08:38:19.689Z" level=debug msg="unresolved is allowed " error=unresolved
time="2024-07-11T08:38:19.689Z" level=debug msg="Getting the template by name: dag" base="*v1alpha1.Workflow (namespace=,name=)" tmpl="*v1alpha1.WorkflowStep (dag)"
time="2024-07-11T08:38:19.689Z" level=debug msg="Resolving the template" base="*v1alpha1.Workflow (namespace=,name=)" tmpl="*v1alpha1.WorkflowStep (dag)"
time="2024-07-11T08:38:19.689Z" level=debug msg="Getting the template" base="*v1alpha1.Workflow (namespace=,name=)" tmpl="*v1alpha1.WorkflowStep (dag)"
time="2024-07-11T08:38:19.689Z" level=debug msg="Getting the template by name: dag" base="*v1alpha1.Workflow (namespace=,name=)" tmpl="*v1alpha1.WorkflowStep (dag)"
time="2024-07-11T08:38:19.689Z" level=info msg="CronWorkflow perf/argo-job-test-migration-v1 added" cronWorkflow=perf/argo-job-test-migration-v1
time="2024-07-11T08:38:19.723Z" level=debug msg="received config map istio-system/istio-namespace-controller-election update"
time="2024-07-11T08:38:19.729Z" level=debug msg="received config map istio-system/istio-gateway-status-leader update"
time="2024-07-11T08:38:19.731Z" level=debug msg="received config map istio-system/istio-leader update"
time="2024-07-11T08:38:20.008Z" level=debug msg="received config map kube-system/aws-load-balancer-controller-leader update"
time="2024-07-11T08:38:20.518Z" level=debug msg="Syncing all CronWorkflows"
time="2024-07-11T08:38:20.519Z" level=debug msg="Enforcing history limit for 'argo-job-test-migration-v1'" namespace=perf workflow=argo-job-test-migration-v1
time="2024-07-11T08:38:22.022Z" level=debug msg="received config map kube-system/aws-load-balancer-controller-leader update"
time="2024-07-11T08:38:24.034Z" level=debug msg="received config map kube-system/aws-load-balancer-controller-leader update"
time="2024-07-11T08:38:26.046Z" level=debug msg="received config map kube-system/aws-load-balancer-controller-leader update"
time="2024-07-11T08:38:27.234Z" level=debug msg="received config map istio-system/istio-namespace-controller-election update"
time="2024-07-11T08:38:27.238Z" level=debug msg="received config map istio-system/istio-gateway-status-leader update"
time="2024-07-11T08:38:27.244Z" level=debug msg="received config map istio-system/istio-leader update"
time="2024-07-11T08:38:28.060Z" level=debug msg="received config map kube-system/aws-load-balancer-controller-leader update"
time="2024-07-11T08:38:29.213Z" level=info msg="Processing workflow" Phase=Running ResourceVersion=438694458 namespace=perf workflow=argo-job-test-migration-v1-dg5rj
time="2024-07-11T08:38:29.214Z" level=info msg="Task-result reconciliation" namespace=perf numObjs=0 workflow=argo-job-test-migration-v1-dg5rj
time="2024-07-11T08:38:29.214Z" level=debug msg="Skipping artifact GC" namespace=perf workflow=argo-job-test-migration-v1-dg5rj
time="2024-07-11T08:38:29.214Z" level=debug msg="Lock is already acquired by perf/argo-job-test-migration-v1-dg5rj" mutex=perf/Mutex/test-migration-v1
time="2024-07-11T08:38:29.214Z" level=debug msg="perf/argo-job-test-migration-v1-dg5rj is already holding a lock" mutex=perf/Mutex/test-migration-v1
time="2024-07-11T08:38:29.214Z" level=debug msg="Getting the template from ref"
time="2024-07-11T08:38:29.215Z" level=info msg="node changed" namespace=perf new.message=PodInitializing new.phase=Pending new.progress=0/1 nodeID=argo-job-test-migration-v1-dg5rj-3398093028 old.message= old.phase=Pending old.progress=0/1 workflow=argo-job-test-migration-v1-dg5rj
time="2024-07-11T08:38:29.215Z" level=debug msg="Evaluating node argo-job-test-migration-v1-dg5rj: template: *v1alpha1.WorkflowStep (dag), boundaryID: " namespace=perf workflow=argo-job-test-migration-v1-dg5rj
time="2024-07-11T08:38:29.215Z" level=debug msg="Resolving the template" base="*v1alpha1.Workflow (namespace=perf,name=argo-job-test-migration-v1-dg5rj)" tmpl="*v1alpha1.WorkflowStep (dag)"
time="2024-07-11T08:38:29.215Z" level=debug msg="Getting the template" base="*v1alpha1.Workflow (namespace=perf,name=argo-job-test-migration-v1-dg5rj)" tmpl="*v1alpha1.WorkflowStep (dag)"
time="2024-07-11T08:38:29.215Z" level=debug msg="Getting the template by name: dag" base="*v1alpha1.Workflow (namespace=perf,name=argo-job-test-migration-v1-dg5rj)" tmpl="*v1alpha1.WorkflowStep (dag)"
time="2024-07-11T08:38:29.215Z" level=debug msg="Executing node argo-job-test-migration-v1-dg5rj of DAG is Running" namespace=perf workflow=argo-job-test-migration-v1-dg5rj
time="2024-07-11T08:38:29.215Z" level=debug msg="Evaluating node argo-job-test-migration-v1-dg5rj.test-details-store: template: *v1alpha1.DAGTask (test-migration-v1-template/test-migration-v1-template#false), boundaryID: argo-job-test-migration-v1-dg5rj" namespace=perf workflow=argo-job-test-migration-v1-dg5rj
time="2024-07-11T08:38:29.215Z" level=debug msg="Resolving the template" base="*v1alpha1.Workflow (namespace=perf,name=argo-job-test-migration-v1-dg5rj)" tmpl="*v1alpha1.DAGTask (test-migration-v1-template/test-migration-v1-template#false)"
time="2024-07-11T08:38:29.215Z" level=debug msg="Found stored template" base="*v1alpha1.Workflow (namespace=perf,name=argo-job-test-migration-v1-dg5rj)" tmpl="*v1alpha1.DAGTask (test-migration-v1-template/test-migration-v1-template#false)"
time="2024-07-11T08:38:29.215Z" level=debug msg="unresolved is allowed " error=unresolved
time="2024-07-11T08:38:29.215Z" level=debug msg="unresolved is allowed " error=unresolved
time="2024-07-11T08:38:29.215Z" level=debug msg="unresolved is allowed " error=unresolved
time="2024-07-11T08:38:29.215Z" level=debug msg="unresolved is allowed " error=unresolved
time="2024-07-11T08:38:29.215Z" level=debug msg="unresolved is allowed " error=unresolved
time="2024-07-11T08:38:29.215Z" level=debug msg="unresolved is allowed " error=unresolved
time="2024-07-11T08:38:29.215Z" level=debug msg="unresolved is allowed " error=unresolved
time="2024-07-11T08:38:29.215Z" level=debug msg="unresolved is allowed " error=unresolved
time="2024-07-11T08:38:29.215Z" level=debug msg="unresolved is allowed " error=unresolved
time="2024-07-11T08:38:29.215Z" level=debug msg="unresolved is allowed " error=unresolved
time="2024-07-11T08:38:29.216Z" level=debug msg="unresolved is allowed " error=unresolved
time="2024-07-11T08:38:29.216Z" level=debug msg="unresolved is allowed " error=unresolved
time="2024-07-11T08:38:29.216Z" level=debug msg="unresolved is allowed " error=unresolved
time="2024-07-11T08:38:29.216Z" level=debug msg="Executing node argo-job-test-migration-v1-dg5rj.test-details-store of Pod is Pending" namespace=perf workflow=argo-job-test-migration-v1-dg5rj
time="2024-07-11T08:38:29.216Z" level=debug msg="Resolving the template" base="*v1alpha1.Workflow (namespace=perf,name=argo-job-test-migration-v1-dg5rj)" tmpl="*v1alpha1.NodeStatus (dag)"
time="2024-07-11T08:38:29.216Z" level=debug msg="Getting the template" base="*v1alpha1.Workflow (namespace=perf,name=argo-job-test-migration-v1-dg5rj)" tmpl="*v1alpha1.NodeStatus (dag)"
time="2024-07-11T08:38:29.216Z" level=debug msg="Getting the template by name: dag" base="*v1alpha1.Workflow (namespace=perf,name=argo-job-test-migration-v1-dg5rj)" tmpl="*v1alpha1.NodeStatus (dag)"
time="2024-07-11T08:38:29.216Z" level=debug msg="Executing node argo-job-test-migration-v1-dg5rj.test-details-store with container template: test-migration-v1-template\n" namespace=perf workflow=argo-job-test-migration-v1-dg5rj
time="2024-07-11T08:38:29.216Z" level=debug msg="Skipped pod argo-job-test-migration-v1-dg5rj.test-details-store (argo-job-test-migration-v1-dg5rj-3398093028) creation: already exists" namespace=perf podPhase=Pending workflow=argo-job-test-migration-v1-dg5rj
time="2024-07-11T08:38:29.216Z" level=info msg="TaskSet Reconciliation" namespace=perf workflow=argo-job-test-migration-v1-dg5rj
time="2024-07-11T08:38:29.216Z" level=info msg=reconcileAgentPod namespace=perf workflow=argo-job-test-migration-v1-dg5rj
time="2024-07-11T08:38:29.217Z" level=debug msg="Log changes patch: {\"status\":{\"conditions\":[{\"status\":\"False\",\"type\":\"PodRunning\"}],\"nodes\":{\"argo-job-test-migration-v1-dg5rj-3398093028\":{\"hostNodeName\":\"ip-10-10-1.10\",\"message\":\"PodInitializing\"}}}}"
time="2024-07-11T08:38:29.217Z" level=info msg="Workflow to be dehydrated" Workflow Size=3664
time="2024-07-11T08:38:29.232Z" level=debug msg="Update workflows 200"
time="2024-07-11T08:38:29.232Z" level=info msg="Workflow update successful" namespace=perf phase=Running resourceVersion=438694551 workflow=argo-job-test-migration-v1-dg5rj
time="2024-07-11T08:38:30.123Z" level=debug msg="received config map kube-system/aws-load-balancer-controller-leader update"
time="2024-07-11T08:38:30.519Z" level=debug msg="Syncing all CronWorkflows"
time="2024-07-11T08:38:30.519Z" level=debug msg="Enforcing history limit for 'argo-job-test-migration-v1'" namespace=perf workflow=argo-job-test-migration-v1
time="2024-07-11T08:38:32.136Z" level=debug msg="received config map kube-system/aws-load-balancer-controller-leader update"
time="2024-07-11T08:38:34.149Z" level=debug msg="received config map kube-system/aws-load-balancer-controller-leader update"
time="2024-07-11T08:38:34.743Z" level=debug msg="received config map istio-system/istio-namespace-controller-election update"
time="2024-07-11T08:38:34.747Z" level=debug msg="received config map istio-system/istio-gateway-status-leader update"
time="2024-07-11T08:38:34.753Z" level=debug msg="received config map istio-system/istio-leader update"
time="2024-07-11T08:38:36.163Z" level=debug msg="received config map kube-system/aws-load-balancer-controller-leader update"
time="2024-07-11T08:38:38.176Z" level=debug msg="received config map kube-system/aws-load-balancer-controller-leader update"
time="2024-07-11T08:38:40.188Z" level=debug msg="received config map kube-system/aws-load-balancer-controller-leader update"
time="2024-07-11T08:38:40.511Z" level=debug msg="Check the workflow existence"
time="2024-07-11T08:38:40.520Z" level=debug msg="Syncing all CronWorkflows"
time="2024-07-11T08:38:40.520Z" level=debug msg="Enforcing history limit for 'argo-job-test-migration-v1'" namespace=perf workflow=argo-job-test-migration-v1
time="2024-07-11T08:38:42.208Z" level=debug msg="received config map kube-system/aws-load-balancer-controller-leader update"
time="2024-07-11T08:38:42.273Z" level=debug msg="received config map istio-system/istio-namespace-controller-election update"
time="2024-07-11T08:38:42.276Z" level=debug msg="received config map istio-system/istio-leader update"
time="2024-07-11T08:38:42.276Z" level=debug msg="received config map istio-system/istio-gateway-status-leader update"
time="2024-07-11T08:38:44.218Z" level=debug msg="received config map kube-system/aws-load-balancer-controller-leader update"
time="2024-07-11T08:38:46.231Z" level=debug msg="received config map kube-system/aws-load-balancer-controller-leader update"
time="2024-07-11T08:38:48.248Z" level=debug msg="received config map kube-system/aws-load-balancer-controller-leader update"
time="2024-07-11T08:38:49.785Z" level=debug msg="received config map istio-system/istio-namespace-controller-election update"
time="2024-07-11T08:38:49.787Z" level=debug msg="received config map istio-system/istio-gateway-status-leader update"
time="2024-07-11T08:38:49.788Z" level=debug msg="received config map istio-system/istio-leader update"
time="2024-07-11T08:38:50.262Z" level=debug msg="received config map kube-system/aws-load-balancer-controller-leader update"
time="2024-07-11T08:38:50.521Z" level=debug msg="Syncing all CronWorkflows"
time="2024-07-11T08:38:50.521Z" level=debug msg="Enforcing history limit for 'argo-job-test-migration-v1'" namespace=perf workflow=argo-job-test-migration-v1
time="2024-07-11T08:38:52.274Z" level=debug msg="received config map kube-system/aws-load-balancer-controller-leader update"
time="2024-07-11T08:38:54.286Z" level=debug msg="received config map kube-system/aws-load-balancer-controller-leader update"
time="2024-07-11T08:38:56.300Z" level=debug msg="received config map kube-system/aws-load-balancer-controller-leader update"
time="2024-07-11T08:38:57.298Z" level=debug msg="received config map istio-system/istio-namespace-controller-election update"
time="2024-07-11T08:38:57.302Z" level=debug msg="received config map istio-system/istio-gateway-status-leader update"
time="2024-07-11T08:38:57.302Z" level=debug msg="received config map istio-system/istio-leader update"
time="2024-07-11T08:38:58.312Z" level=debug msg="received config map kube-system/aws-load-balancer-controller-leader update"
time="2024-07-11T08:39:00.358Z" level=debug msg="received config map kube-system/aws-load-balancer-controller-leader update"
time="2024-07-11T08:39:00.522Z" level=debug msg="Syncing all CronWorkflows"
time="2024-07-11T08:39:00.522Z" level=debug msg="Enforcing history limit for 'argo-job-test-migration-v1'" namespace=perf workflow=argo-job-test-migration-v1
time="2024-07-11T08:39:02.373Z" level=debug msg="received config map kube-system/aws-load-balancer-controller-leader update"
time="2024-07-11T08:39:04.387Z" level=debug msg="received config map kube-system/aws-load-balancer-controller-leader update"
time="2024-07-11T08:39:04.810Z" level=debug msg="received config map istio-system/istio-namespace-controller-election update"
time="2024-07-11T08:39:04.813Z" level=debug msg="received config map istio-system/istio-gateway-status-leader update"
time="2024-07-11T08:39:04.813Z" level=debug msg="received config map istio-system/istio-leader update"
time="2024-07-11T08:39:06.405Z" level=debug msg="received config map kube-system/aws-load-balancer-controller-leader update"
time="2024-07-11T08:39:08.419Z" level=debug msg="received config map kube-system/aws-load-balancer-controller-leader update"
time="2024-07-11T08:39:10.434Z" level=debug msg="received config map kube-system/aws-load-balancer-controller-leader update"
time="2024-07-11T08:39:10.523Z" level=debug msg="Syncing all CronWorkflows"
time="2024-07-11T08:39:10.523Z" level=debug msg="Enforcing history limit for 'argo-job-test-migration-v1'" namespace=perf workflow=argo-job-test-migration-v1
time="2024-07-11T08:39:12.320Z" level=debug msg="received config map istio-system/istio-namespace-controller-election update"
time="2024-07-11T08:39:12.320Z" level=debug msg="received config map istio-system/istio-gateway-status-leader update"
time="2024-07-11T08:39:12.325Z" level=debug msg="received config map istio-system/istio-leader update"
time="2024-07-11T08:39:12.448Z" level=debug msg="received config map kube-system/aws-load-balancer-controller-leader update"
time="2024-07-11T08:39:14.459Z" level=debug msg="received config map kube-system/aws-load-balancer-controller-leader update"
time="2024-07-11T08:39:16.471Z" level=debug msg="received config map kube-system/aws-load-balancer-controller-leader update"
time="2024-07-11T08:39:18.488Z" level=debug msg="received config map kube-system/aws-load-balancer-controller-leader update"
time="2024-07-11T08:39:19.833Z" level=debug msg="received config map istio-system/istio-gateway-status-leader update"
time="2024-07-11T08:39:19.838Z" level=debug msg="received config map istio-system/istio-namespace-controller-election update"
time="2024-07-11T08:39:19.844Z" level=debug msg="received config map istio-system/istio-leader update"
time="2024-07-11T08:39:20.500Z" level=debug msg="received config map kube-system/aws-load-balancer-controller-leader update"
time="2024-07-11T08:39:20.524Z" level=debug msg="Syncing all CronWorkflows"
time="2024-07-11T08:39:20.524Z" level=debug msg="Enforcing history limit for 'argo-job-test-migration-v1'" namespace=perf workflow=argo-job-test-migration-v1
time="2024-07-11T08:39:22.515Z" level=debug msg="received config map kube-system/aws-load-balancer-controller-leader update"
time="2024-07-11T08:39:24.528Z" level=debug msg="received config map kube-system/aws-load-balancer-controller-leader update"
time="2024-07-11T08:39:26.539Z" level=debug msg="received config map kube-system/aws-load-balancer-controller-leader update"
time="2024-07-11T08:39:27.356Z" level=debug msg="received config map istio-system/istio-gateway-status-leader update"
time="2024-07-11T08:39:27.357Z" level=debug msg="received config map istio-system/istio-namespace-controller-election update"
time="2024-07-11T08:39:27.357Z" level=debug msg="received config map istio-system/istio-leader update"
time="2024-07-11T08:39:28.552Z" level=debug msg="received config map kube-system/aws-load-balancer-controller-leader update"
time="2024-07-11T08:39:30.525Z" level=debug msg="Syncing all CronWorkflows"
time="2024-07-11T08:39:30.525Z" level=debug msg="Enforcing history limit for 'argo-job-test-migration-v1'" namespace=perf workflow=argo-job-test-migration-v1
time="2024-07-11T08:39:30.566Z" level=debug msg="received config map kube-system/aws-load-balancer-controller-leader update"
time="2024-07-11T08:39:32.578Z" level=debug msg="received config map kube-system/aws-load-balancer-controller-leader update"
```
Argo Server logs:

```
time="2024-07-11T08:36:44.302Z" level=debug msg="CLI version" version="{v3.5.8 2024-06-18T03:41:48Z 3bb637c0261f8c08d4346175bb8b1024719a1f11 v3.5.8 clean go1.21.10 gc linux/amd64}"
time="2024-07-11T08:36:44.302Z" level=info msg="not enabling pprof debug endpoints"
time="2024-07-11T08:36:44.303Z" level=info authModes="[server]" baseHRef=/ managedNamespace= namespace=admin secure=false ssoNamespace=admin
time="2024-07-11T08:36:44.303Z" level=warning msg="You are running in insecure mode. Learn how to enable transport layer security: https://argo-workflows.readthedocs.io/en/release-3.5/tls/"
time="2024-07-11T08:36:44.303Z" level=warning msg="You are running without client authentication. Learn how to enable client authentication: https://argo-workflows.readthedocs.io/en/release-3.5/argo-server-auth-mode/"
time="2024-07-11T08:36:44.303Z" level=info msg="SSO disabled"
time="2024-07-11T08:36:44.315Z" level=info msg="Starting Argo Server" instanceID= version=v3.5.8
time="2024-07-11T08:36:44.315Z" level=info msg="Creating event controller" asyncDispatch=false operationQueueSize=16 workerCount=4
time="2024-07-11T08:36:44.321Z" level=info msg="GRPC Server Max Message Size, MaxGRPCMessageSize, is set" GRPC_MESSAGE_SIZE=104857600
time="2024-07-11T08:36:44.321Z" level=info msg="Argo Server started successfully on http://localhost:2746" url="http://localhost:2746"
time="2024-07-11T08:37:04.061Z" level=info duration="129.005µs" method=GET path=index.html size=487 status=0
time="2024-07-11T08:37:10.759Z" level=debug msg="Piping events to channel"
time="2024-07-11T08:37:24.061Z" level=info duration="103.608µs" method=GET path=index.html size=487 status=0
time="2024-07-11T08:37:44.061Z" level=info duration="99.201µs" method=GET path=index.html size=487 status=0
time="2024-07-11T08:38:04.061Z" level=info duration="65.947µs" method=GET path=index.html size=487 status=0
time="2024-07-11T08:38:13.656Z" level=debug msg="Result channel done"
time="2024-07-11T08:38:13.656Z" level=info msg="finished streaming call with code OK" grpc.code=OK grpc.method=WatchWorkflows grpc.service=workflow.WorkflowService grpc.start_time="2024-07-11T08:37:10Z" grpc.time_ms=62900.21 span.kind=server system=grpc
2024/07/11 08:38:13 http: superfluous response.WriteHeader call from github.com/argoproj/argo-workflows/v3/server/apiserver/accesslog.(*resultCapturingWriter).WriteHeader (result_capturing_writer.go:23)
time="2024-07-11T08:38:13.656Z" level=info duration=1m2.900850898s method=GET path=/api/v1/workflow-events/perf size=19 status=408
time="2024-07-11T08:38:13.660Z" level=info msg="finished unary call with code OK" grpc.code=OK grpc.method=ListCronWorkflows grpc.service=cronworkflow.CronWorkflowService grpc.start_time="2024-07-11T08:38:13Z" grpc.time_ms=6.969 span.kind=server system=grpc
time="2024-07-11T08:38:13.664Z" level=info duration=10.67679ms method=GET path=/api/v1/cron-workflows/perf size=1720 status=0
time="2024-07-11T08:38:16.365Z" level=debug msg=CanI name= namespace=perf resource=workflows verb=list
time="2024-07-11T08:38:16.367Z" level=debug msg=CanI name= namespace=perf resource=workflows status="{true false RBAC: allowed by ClusterRoleBinding \"argo-server-binding\" of ClusterRole \"argo-server-cluster-role\" to ServiceAccount \"argo-server/admin\" }" verb=list
time="2024-07-11T08:38:16.367Z" level=info msg="finished unary call with code OK" grpc.code=OK grpc.method=ListWorkflows grpc.service=workflow.WorkflowService grpc.start_time="2024-07-11T08:38:16Z" grpc.time_ms=2.559 span.kind=server system=grpc
time="2024-07-11T08:38:16.368Z" level=info duration=3.159236ms method=GET path=/api/v1/workflows/perf size=57 status=0
time="2024-07-11T08:38:16.369Z" level=info msg="finished unary call with code OK" grpc.code=OK grpc.method=GetCronWorkflow grpc.service=cronworkflow.CronWorkflowService grpc.start_time="2024-07-11T08:38:16Z" grpc.time_ms=5.046 span.kind=server system=grpc
time="2024-07-11T08:38:16.369Z" level=info duration=5.626759ms method=GET path=/api/v1/cron-workflows/perf/argo-job-test-migration-v1 size=1665 status=0
time="2024-07-11T08:38:19.097Z" level=debug msg="Getting the template by name: dag"
time="2024-07-11T08:38:19.097Z" level=debug msg="Resolving the template" base="*v1alpha1.Workflow (namespace=,name=)" tmpl="*v1alpha1.WorkflowStep (dag)"
time="2024-07-11T08:38:19.097Z" level=debug msg="Getting the template" base="*v1alpha1.Workflow (namespace=,name=)" tmpl="*v1alpha1.WorkflowStep (dag)"
time="2024-07-11T08:38:19.097Z" level=debug msg="Getting the template by name: dag" base="*v1alpha1.Workflow (namespace=,name=)" tmpl="*v1alpha1.WorkflowStep (dag)"
time="2024-07-11T08:38:19.098Z" level=debug msg="Resolving the template" base="*v1alpha1.Workflow (namespace=,name=)" tmpl="*v1alpha1.DAGTask (test-migration-v1-template/test-migration-v1-template#false)"
time="2024-07-11T08:38:19.098Z" level=debug msg="Getting the template" base="*v1alpha1.Workflow (namespace=,name=)" tmpl="*v1alpha1.DAGTask (test-migration-v1-template/test-migration-v1-template#false)"
time="2024-07-11T08:38:19.098Z" level=debug msg="Getting the template from ref" base="*v1alpha1.Workflow (namespace=,name=)" tmpl="*v1alpha1.DAGTask (test-migration-v1-template/test-migration-v1-template#false)"
time="2024-07-11T08:38:19.105Z" level=debug msg="Stored the template" base="*v1alpha1.Workflow (namespace=,name=)" tmpl="*v1alpha1.DAGTask (test-migration-v1-template/test-migration-v1-template#false)"
time="2024-07-11T08:38:19.108Z" level=debug msg="unresolved is allowed " error=unresolved
time="2024-07-11T08:38:19.108Z" level=debug msg="unresolved is allowed " error=unresolved
time="2024-07-11T08:38:19.108Z" level=debug msg="unresolved is allowed " error=unresolved
time="2024-07-11T08:38:19.108Z" level=debug msg="unresolved is allowed " error=unresolved
time="2024-07-11T08:38:19.108Z" level=debug msg="unresolved is allowed " error=unresolved
time="2024-07-11T08:38:19.108Z" level=debug msg="unresolved is allowed " error=unresolved
time="2024-07-11T08:38:19.108Z" level=debug msg="unresolved is allowed " error=unresolved
time="2024-07-11T08:38:19.108Z" level=debug msg="unresolved is allowed " error=unresolved
time="2024-07-11T08:38:19.108Z" level=debug msg="unresolved is allowed " error=unresolved
time="2024-07-11T08:38:19.108Z" level=debug msg="unresolved is allowed " error=unresolved
time="2024-07-11T08:38:19.108Z" level=debug msg="unresolved is allowed " error=unresolved
time="2024-07-11T08:38:19.108Z" level=debug msg="unresolved is allowed " error=unresolved
time="2024-07-11T08:38:19.108Z" level=debug msg="unresolved is allowed " error=unresolved
time="2024-07-11T08:38:19.108Z" level=debug msg="Resolving the template" base="*v1alpha1.Workflow (namespace=,name=)" tmpl="*v1alpha1.DAGTask (test-migration-v1-template/test-migration-v1-template#false)"
time="2024-07-11T08:38:19.109Z" level=debug msg="Found stored template" base="*v1alpha1.Workflow (namespace=,name=)" tmpl="*v1alpha1.DAGTask (test-migration-v1-template/test-migration-v1-template#false)"
time="2024-07-11T08:38:19.113Z" level=debug msg="unresolved is allowed " error=unresolved
time="2024-07-11T08:38:19.113Z" level=debug msg="unresolved is allowed " error=unresolved
time="2024-07-11T08:38:19.113Z" level=debug msg="unresolved is allowed " error=unresolved
time="2024-07-11T08:38:19.113Z" level=debug msg="unresolved is allowed " error=unresolved
time="2024-07-11T08:38:19.113Z" level=debug msg="unresolved is allowed " error=unresolved
time="2024-07-11T08:38:19.113Z" level=debug msg="unresolved is allowed " error=unresolved
time="2024-07-11T08:38:19.113Z" level=debug msg="unresolved is allowed " error=unresolved
time="2024-07-11T08:38:19.113Z" level=debug msg="unresolved is allowed " error=unresolved
time="2024-07-11T08:38:19.113Z" level=debug msg="unresolved is allowed " error=unresolved
time="2024-07-11T08:38:19.113Z" level=debug msg="unresolved is allowed " error=unresolved
time="2024-07-11T08:38:19.113Z" level=debug msg="unresolved is allowed " error=unresolved
time="2024-07-11T08:38:19.113Z" level=debug msg="unresolved is allowed " error=unresolved
time="2024-07-11T08:38:19.113Z" level=debug msg="unresolved is allowed " error=unresolved
time="2024-07-11T08:38:19.113Z" level=debug msg="Getting the template by name: dag" base="*v1alpha1.Workflow (namespace=,name=)" tmpl="*v1alpha1.WorkflowStep (dag)"
time="2024-07-11T08:38:19.113Z" level=debug msg="Resolving the template" base="*v1alpha1.Workflow (namespace=,name=)" tmpl="*v1alpha1.WorkflowStep (dag)"
time="2024-07-11T08:38:19.113Z" level=debug msg="Getting the template" base="*v1alpha1.Workflow (namespace=,name=)" tmpl="*v1alpha1.WorkflowStep (dag)"
time="2024-07-11T08:38:19.113Z" level=debug msg="Getting the template by name: dag" base="*v1alpha1.Workflow (namespace=,name=)" tmpl="*v1alpha1.WorkflowStep (dag)"
time="2024-07-11T08:38:19.124Z" level=info msg="finished unary call with code OK" grpc.code=OK grpc.method=SubmitWorkflow grpc.service=workflow.WorkflowService grpc.start_time="2024-07-11T08:38:19Z" grpc.time_ms=29.911 span.kind=server system=grpc
time="2024-07-11T08:38:19.124Z" level=info duration=30.737748ms method=POST path=/api/v1/workflows/perf/submit size=2969 status=0
time="2024-07-11T08:38:19.179Z" level=info msg="finished unary call with code OK" grpc.code=OK grpc.method=GetWorkflow grpc.service=workflow.WorkflowService grpc.start_time="2024-07-11T08:38:19Z" grpc.time_ms=7.935 span.kind=server system=grpc
time="2024-07-11T08:38:19.179Z" level=info duration=8.452007ms method=GET path=/api/v1/workflows/perf/argo-job-test-migration-v1-dg5rj size=2969 status=0
time="2024-07-11T08:38:19.226Z" level=info msg="finished unary call with code OK" grpc.code=OK grpc.method=GetWorkflow grpc.service=workflow.WorkflowService grpc.start_time="2024-07-11T08:38:19Z" grpc.time_ms=11.466 span.kind=server system=grpc
time="2024-07-11T08:38:19.226Z" level=info duration=11.974651ms method=GET path=/api/v1/workflows/perf/argo-job-test-migration-v1-dg5rj size=2969 status=0
time="2024-07-11T08:38:19.236Z" level=debug msg="Piping events to channel"
time="2024-07-11T08:38:19.237Z" level=debug msg="Received workflow event"
time="2024-07-11T08:38:19.237Z" level=debug msg="Sending workflow event" phase=Running type=ADDED workflow=argo-job-test-migration-v1-dg5rj
time="2024-07-11T08:38:19.272Z" level=info msg="finished unary call with code OK" grpc.code=OK grpc.method=GetWorkflow grpc.service=workflow.WorkflowService grpc.start_time="2024-07-11T08:38:19Z" grpc.time_ms=12.397 span.kind=server system=grpc
time="2024-07-11T08:38:19.273Z" level=info duration=13.065806ms method=GET path=/api/v1/workflows/perf/argo-job-test-migration-v1-dg5rj size=5072 status=0
time="2024-07-11T08:38:19.545Z" level=debug msg="Received workflow event"
time="2024-07-11T08:38:19.545Z" level=debug msg="Sending workflow event" phase=Running type=MODIFIED workflow=argo-job-test-migration-v1-dg5rj
time="2024-07-11T08:38:24.061Z" level=info duration="73.656µs" method=GET path=index.html size=487 status=0
time="2024-07-11T08:38:29.232Z" level=debug msg="Received workflow event"
time="2024-07-11T08:38:29.233Z" level=debug msg="Sending workflow event" phase=Running type=MODIFIED workflow=argo-job-test-migration-v1-dg5rj
time="2024-07-11T08:38:44.061Z" level=info duration="106.593µs" method=GET path=index.html size=487 status=0
time="2024-07-11T08:39:04.061Z" level=info duration="107.957µs" method=GET path=index.html size=487 status=0
time="2024-07-11T08:39:24.061Z" level=info duration="109.535µs" method=GET path=index.html size=487 status=0
time="2024-07-11T08:39:44.061Z" level=info duration="82.925µs" method=GET path=index.html size=487 status=0
time="2024-07-11T08:40:02.693Z" level=info duration=1m43.468569554s method=GET path=/api/v1/workflow-events/perf size=15971 status=0
time="2024-07-11T08:40:02.693Z" level=debug msg="Result channel done"
time="2024-07-11T08:40:02.693Z" level=info msg="finished streaming call with code OK" grpc.code=OK grpc.method=WatchWorkflows grpc.service=workflow.WorkflowService grpc.start_time="2024-07-11T08:38:19Z" grpc.time_ms=103468.36 span.kind=server system=grpc
time="2024-07-11T08:40:02.695Z" level=info msg="finished unary call with code OK" grpc.code=OK grpc.method=ListWorkflowTemplates grpc.service=workflowtemplate.WorkflowTemplateService grpc.start_time="2024-07-11T08:40:02Z" grpc.time_ms=7.043 span.kind=server system=grpc
time="2024-07-11T08:40:02.696Z" level=info duration=7.903142ms method=GET path=/api/v1/workflow-templates/perf size=2326 status=0
time="2024-07-11T08:40:04.061Z" level=info duration="102.507µs" method=GET path=index.html size=487 status=0
time="2024-07-11T08:40:04.590Z" level=info msg="finished unary call with code OK" grpc.code=OK grpc.method=GetWorkflowTemplate grpc.service=workflowtemplate.WorkflowTemplateService grpc.start_time="2024-07-11T08:40:04Z" grpc.time_ms=4.445 span.kind=server system=grpc
time="2024-07-11T08:40:04.590Z" level=info duration=5.129671ms method=GET path=/api/v1/workflow-templates/perf/test-migration-v1-template size=2271 status=0
time="2024-07-11T08:40:24.061Z" level=info duration="103.768µs" method=GET path=index.html size=487 status=0
time="2024-07-11T08:40:44.061Z" level=info duration="86.675µs" method=GET path=index.html size=487 status=0
time="2024-07-11T08:41:04.061Z" level=info duration="95.86µs" method=GET path=index.html size=487 status=0
time="2024-07-11T08:41:24.060Z" level=info duration="81.517µs" method=GET path=index.html size=487 status=0
time="2024-07-11T08:41:44.061Z" level=info duration="89.283µs" method=GET path=index.html size=487 status=0
time="2024-07-11T08:41:44.302Z" level=info msg="Alloc=10816 TotalAlloc=18987 Sys=34149 NumGC=6 Goroutines=100"
time="2024-07-11T08:42:04.061Z" level=info duration="100.999µs" method=GET path=index.html size=487 status=0
time="2024-07-11T08:42:24.061Z" level=info duration="85.946µs" method=GET path=index.html size=487 status=0
```
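Since the issue title points at Fluent Bit buffer sizing as the eventual workaround, here is a minimal sketch of the relevant tail-input settings. The parameter names (`Buffer_Chunk_Size`, `Buffer_Max_Size`, `Skip_Long_Lines`) come from Fluent Bit's tail input plugin; the path, tag, and the specific sizes shown are assumptions for illustration, not values taken from this cluster:

```ini
# Hypothetical fluent-bit.conf fragment: enlarge the per-file read buffers so
# that long single-line JSON log records from workflow pods are not dropped.
[INPUT]
    Name              tail
    Path              /var/log/containers/*.log
    Tag               kube.*
    # Defaults are 32k; emissary-wrapped JSON lines can exceed this.
    Buffer_Chunk_Size 256k
    Buffer_Max_Size   1M
    # With Skip_Long_Lines On, over-long lines are silently discarded.
    Skip_Long_Lines   Off
```

If a line exceeds `Buffer_Max_Size` while `Skip_Long_Lines` is `On`, Fluent Bit skips it silently, which would match the symptom of logs never reaching Elasticsearch/Kibana.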
argo job pod - manifest (from kubectl describe pod)

```
Name:             argo-job-test-migration-v1-zgpnz-test-migration-v1-template-2007578379
Namespace:        perf
Priority:         0
Node:             ip-10-10-9-4-compute.internal
Start Time:       Thu, 11 Jul 2024 09:30:13 +0000
Labels:           workflows.argoproj.io/completed=false
                  workflows.argoproj.io/workflow=argo-job-test-migration-v1-zgpnz
Annotations:      co.elastic.logs/enabled: true
                  co.elastic.logs/json.add_error_key: true
                  co.elastic.logs/json.keys_under_root: true
                  co.elastic.logs/json.message_key: log
                  fluentbit.job.name: job-test-migration-v1
                  kubectl.kubernetes.io/default-container: main
                  vault.hashicorp.com/agent-image: hashicorp/vault:1.6.2
                  vault.hashicorp.com/agent-inject: true
                  vault.hashicorp.com/agent-inject-secret-secrets.txt: kv/apps/perf/test-migration
                  vault.hashicorp.com/agent-inject-status: injected
                  vault.hashicorp.com/agent-inject-template-secrets.txt: {{ with secret "kv/apps/dev/test-migration" }}{{ range $k, $v := .Data.data }} {{ $k }}={{ $v }}{{ end }}{{ end }}
                  vault.hashicorp.com/role: perf
                  workflows.argoproj.io/node-id: argo-job-test-migration-v1-zgpnz-2007578379
                  workflows.argoproj.io/node-name: argo-job-test-migration-v1-zgpnz.test-details-store
Controlled By:    Workflow/argo-job-test-migration-v1-zgpnz
Init Containers:
  init:
    Container ID:  containerd://35b1df68e7c4c62206afa067ab86ca24641f8f07d1c639652213fcf7d9c16ae0
    Image:         quay.io/argoproj/argoexec:v3.5.8
    Image ID:      quay.io/argoproj/argoexec@sha256:a9e8906183c3c738573b8e5d8a2a67eee428dd7a52eb8c7b77c9916ba8f4685b
    Port:
    Host Port:
    Command:
      argoexec init --loglevel debug --log-format text
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Thu, 11 Jul 2024 09:30:21 +0000
      Finished:     Thu, 11 Jul 2024 09:30:21 +0000
    Ready:          True
    Restart Count:  0
    Environment:
      ARGO_POD_NAME:                      argo-job-test-migration-v1-zgpnz-test-migration-v1-template-2007578379 (v1:metadata.name)
      ARGO_POD_UID:                       (v1:metadata.uid)
      GODEBUG:                            x509ignoreCN=0
      ARGO_WORKFLOW_NAME:                 argo-job-test-migration-v1-zgpnz
      ARGO_WORKFLOW_UID:                  df1aaf50-a11b-45db-9c8e-20b3fd69ca9e
      ARGO_CONTAINER_NAME:                init
      ARGO_TEMPLATE:                      {"name":"test-migration-v1-template","inputs":{"parameters":[{"name":"script","value":"echo hi"}]},"outputs":{},"metadata":{"annotations":{"co.elastic.logs/enabled":"true","co.elastic.logs/json.add_error_key":"true","co.elastic.logs/json.keys_under_root":"true","co.elastic.logs/json.message_key":"log","fluentbit.job.name":"job-test-migration-v1","vault.hashicorp.com/agent-image":"hashicorp/vault:1.6.2","vault.hashicorp.com/agent-inject":"true","vault.hashicorp.com/agent-inject-secret-secrets.txt":"kv/apps/perf/test-migration","vault.hashicorp.com/agent-inject-template-secrets.txt":"{{ with secret \"kv/apps/dev/test-migration\" }}{{ range $k, $v := .Data.data }} {{ $k }}={{ $v }}{{ end }}{{ end }}\n","vault.hashicorp.com/role":"perf"}},"container":{"name":"test-migration-v1","image":"nginx","command":["echo hi"],"resources":{},"volumeMounts":[{"name":"shared-data","mountPath":"/data"}]},"volumes":[{"name":"shared-data","emptyDir":{}}],"initContainers":[{"name":"test-config-client","image":"busybox:latest","env":[{"name":"APP_NAME","value":"test-migration"},{"name":"VERSION","value":"master"}],"resources":{},"volumeMounts":[{"name":"shared-data","mountPath":"/data"}]}],"insecure":false,"accessKeySecret":{"name":"argo-artifact-secret","key":"awsAccessKey"},"secretKeySecret":{"name":"argo-artifact-secret","key":"awsSecretAccessKey"},"useSDKCreds":true,"key":"argo-job-test-migration-v1-zgpnz/argo-job-test-migration-v1-zgpnz-test-migration-v1-template-2007578379"}},"serviceAccountName":"test-migration","securityContext":{"fsGroup":65534}}
      ARGO_NODE_ID:                       argo-job-test-migration-v1-zgpnz-2007578379
      ARGO_INCLUDE_SCRIPT_OUTPUT:         false
      ARGO_DEADLINE:                      2025-11-23T09:30:13Z
      ARGO_PROGRESS_FILE:                 /var/run/argo/progress
      ARGO_PROGRESS_PATCH_TICK_DURATION:  1m0s
      ARGO_PROGRESS_FILE_TICK_DURATION:   3s
    Mounts:
      /argo/secret/argo-artifact-secret from argo-artifact-secret (ro)
      /var/run/argo from var-run-argo (rw)
      /var/run/secrets/eks.amazonaws.com/serviceaccount from aws-iam-token (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-6xgxr (ro)
      /vault/secrets from vault-secrets (rw)
  test-config-client:
    Container ID:  containerd://119245520cdd21aa00c6b87d73bdd5210677b6ae71f6336bab1c42f003613ea7
    Image:         busybox:latest
    Image ID:      docker.io/library/busybox@sha256:9ae97d36d26566ff84e8893c64a6dc4fe8ca6d1144bf5b87b2b85a32def253c7
    Port:
    Host Port:
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Thu, 11 Jul 2024 09:30:26 +0000
      Finished:     Thu, 11 Jul 2024 09:30:26 +0000
    Ready:          True
    Restart Count:  0
    Environment:
      APP_NAME:                           test-migration
      VERSION:                            master
      ARGO_CONTAINER_NAME:                test-config-client
      ARGO_TEMPLATE:                      {"name":"test-migration-v1-template","inputs":{"parameters":[{"name":"script","value":"echo hi"}]},"outputs":{},"metadata":{"annotations":{"co.elastic.logs/enabled":"true","co.elastic.logs/json.add_error_key":"true","co.elastic.logs/json.keys_under_root":"true","co.elastic.logs/json.message_key":"log","fluentbit.job.name":"job-test-migration-v1","vault.hashicorp.com/agent-image":"hashicorp/vault:1.6.2","vault.hashicorp.com/agent-inject":"true","vault.hashicorp.com/agent-inject-secret-secrets.txt":"kv/apps/perf/test-migration","vault.hashicorp.com/agent-inject-template-secrets.txt":"{{ with secret \"kv/apps/dev/test-migration\" }}{{ range $k, $v := .Data.data }} {{ $k }}={{ $v }}{{ end }}{{ end }}\n","vault.hashicorp.com/role":"perf"}},"container":{"name":"test-migration-v1","image":"nginx","command":["echo hi"],"resources":{},"volumeMounts":[{"name":"shared-data","mountPath":"/data"}]},"volumes":[{"name":"shared-data","emptyDir":{}}],"initContainers":[{"name":"test-config-client","image":"busybox:latest","env":[{"name":"APP_NAME","value":"test-migration"},{"name":"VERSION","value":"master"}],"resources":{},"volumeMounts":[{"name":"shared-data","mountPath":"/data"}]}],"insecure":false,"accessKeySecret":{"name":"argo-artifact-secret","key":"awsAccessKey"},"secretKeySecret":{"name":"argo-artifact-secret","key":"awsSecretAccessKey"},"useSDKCreds":true,"key":"argo-job-test-migration-v1-zgpnz/argo-job-test-migration-v1-zgpnz-test-migration-v1-template-2007578379"}},"serviceAccountName":"test-migration","securityContext":{"fsGroup":65534}}
      ARGO_NODE_ID:                       argo-job-test-migration-v1-zgpnz-2007578379
      ARGO_INCLUDE_SCRIPT_OUTPUT:         false
      ARGO_DEADLINE:                      2025-11-23T09:30:13Z
      ARGO_PROGRESS_FILE:                 /var/run/argo/progress
      ARGO_PROGRESS_PATCH_TICK_DURATION:  1m0s
      ARGO_PROGRESS_FILE_TICK_DURATION:   3s
    Mounts:
      /data from shared-data (rw)
      /var/run/argo from var-run-argo (rw)
      /var/run/secrets/eks.amazonaws.com/serviceaccount from aws-iam-token (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-6xgxr (ro)
      /vault/secrets from vault-secrets (rw)
  vault-agent-init:
    Container ID:  containerd://42b7df5472e7c869cf0d572491163156499739479617d8262e1843f63d1db6a3
    Image:         hashicorp/vault:1.6.2
    Image ID:      docker.io/hashicorp/vault@sha256:0d9048f209d2b754a5a92512e3d96fd010cc2888f162e68df515f26e93347156
    Port:
    Host Port:
    Command:
      /bin/sh
      -ec
    Args:
      echo ${VAULT_CONFIG?} | base64 -d > /home/vault/config.json && vault agent -config=/home/vault/config.json
    State:          Running
      Started:      Thu, 11 Jul 2024 09:30:27 +0000
    Ready:          False
    Restart Count:  0
    Limits:
      cpu:     500m
      memory:  128Mi
    Requests:
      cpu:     250m
      memory:  64Mi
    Environment:
      NAMESPACE:         perf (v1:metadata.namespace)
      HOST_IP:           (v1:status.hostIP)
      POD_IP:            (v1:status.podIP)
      VAULT_LOG_LEVEL:   info
      VAULT_LOG_FORMAT:  standard
      VAULT_CONFIG:
eyJhdXRvX2F1dGgiOnsibWV0aG9kIjp7InR5cGUiOiJrdWJlcm5ldGVzIiwibW91bnRfcGF0aCI6ImF1dGgva3ViZXJuZXRlcyIsImNvbmZpZyI6eyJyb2xlIjoicGVyZiIsInRva2VuX3BhdGgiOiIvdmFyL3J1bi9zZWNyZXRzL2t1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvdG9rZW4ifX0sInNpbmsiOlt7InR5cGUiOiJmaWxlIiwiY29uZmlnIjp7InBhdGgiOiIvaG9tZS92YXVsdC8udmF1bHQtdG9rZW4ifX1dfSwiZXhpdF9hZnRlcl9hdXRoIjp0cnVlLCJwaWRfZmlsZSI6Ii9ob21lL3ZhdWx0Ly5waWQiLCJ2YXVsdCI6eyJhZGRyZXNzIjoiaHR0cDovL3ZhdWx0LmFkbWluLnN2Yzo4MjAwIn0sInRlbXBsYXRlIjpbeyJkZXN0aW5hdGlvbiI6Ii92YXVsdC9zZWNyZXRzL3NlY3JldHMudHh0IiwiY29udGVudHMiOiJ7eyB3aXRoIHNlY3JldCBcImt2L2FwcHMvZGV2L3Rlc3QtbWlncmF0aW9uXCIgfX17eyByYW5nZSAkaywgJHYgOj0gLkRhdGEuZGF0YSB9fSB7eyAkayB9fT17eyAkdiB9fXt7IGVuZCB9fXt7IGVuZCB9fVxuIiwibGVmdF9kZWxpbWl0ZXIiOiJ7eyIsInJpZ2h0X2RlbGltaXRlciI6In19In1dLCJ0ZW1wbGF0ZV9jb25maWciOnsiZXhpdF9vbl9yZXRyeV9mYWlsdXJlIjp0cnVlfX0= Mounts: /home/vault from home-init (rw) /var/run/secrets/eks.amazonaws.com/serviceaccount from aws-iam-token (ro) /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-6xgxr (ro) /vault/secrets from vault-secrets (rw) Containers: wait: Container ID: Image: quay.io/argoproj/argoexec:v3.5.8 Image ID: Port: Host Port: Command: argoexec wait --loglevel debug --log-format text State: Waiting Reason: PodInitializing Ready: False Restart Count: 0 Environment: ARGO_POD_NAME: argo-job-test-migration-v1-zgpnz-test-migration-v1-template-2007578379 (v1:metadata.name) ARGO_POD_UID: (v1:metadata.uid) GODEBUG: x509ignoreCN=0 ARGO_WORKFLOW_NAME: argo-job-test-migration-v1-zgpnz ARGO_WORKFLOW_UID: df1aaf50-a11b-45db-9c8e-20b3fd69ca9e ARGO_CONTAINER_NAME: wait ARGO_TEMPLATE: {"name":"test-migration-v1-template","inputs":{"parameters":[{"name":"script","value":"echo 
hi"}]},"outputs":{},"metadata":{"annotations":{"co.elastic.logs/enabled":"true","co.elastic.logs/json.add_error_key":"true","co.elastic.logs/json.keys_under_root":"true","co.elastic.logs/json.message_key":"log","fluentbit.job.name":"job-test-migration-v1","vault.hashicorp.com/agent-image":"hashicorp/vault:1.6.2","vault.hashicorp.com/agent-inject":"true","vault.hashicorp.com/agent-inject-secret-secrets.txt":"kv/apps/perf/test-migration","vault.hashicorp.com/agent-inject-template-secrets.txt":"{{ with secret \"kv/apps/dev/test-migration\" }}{{ range $k, $v := .Data.data }} {{ $k }}={{ $v }}{{ end }}{{ end }}\n","vault.hashicorp.com/role":"perf"}},"container":{"name":"test-migration-v1","image":"nginx","command":["echo hi"],"resources":{},"volumeMounts":[{"name":"shared-data","mountPath":"/data"}]},"volumes":[{"name":"shared-data","emptyDir":{}}],"initContainers":[{"name":"test-config-client","image":"busybox:latest","env":[{"name":"APP_NAME","value":"test-migration"},{"name":"VERSION","value":"master"}],"resources":{},"volumeMounts":[{"name":"shared-data","mountPath":"/data"}]}],:false,"accessKeySecret":{"name":"argo-artifact-secret","key":"awsAccessKey"},"secretKeySecret":{"name":"argo-artifact-secret","key":"awsSecretAccessKey"},,"useSDKCreds":true,"key":"argo-job-test-migration-v1-zgpnz/argo-job-test-migration-v1-zgpnz-test-migration-v1-template-2007578379"}},"serviceAccountName":"test-migration","securityContext":{"fsGroup":65534}} ARGO_NODE_ID: argo-job-test-migration-v1-zgpnz-2007578379 ARGO_INCLUDE_SCRIPT_OUTPUT: false ARGO_DEADLINE: 2025-11-23T09:30:13Z ARGO_PROGRESS_FILE: /var/run/argo/progress ARGO_PROGRESS_PATCH_TICK_DURATION: 1m0s ARGO_PROGRESS_FILE_TICK_DURATION: 3s Mounts: /argo/secret/argo-artifact-secret from argo-artifact-secret (ro) /mainctrfs/data from shared-data (rw) /tmp from tmp-dir-argo (rw,path="0") /var/run/argo from var-run-argo (rw) /var/run/secrets/eks.amazonaws.com/serviceaccount from aws-iam-token (ro) 
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-6xgxr (ro) /vault/secrets from vault-secrets (rw) main: Container ID: Image: nginx Image ID: Port: Host Port: Command: /var/run/argo/argoexec emissary --loglevel debug --log-format text -- echo hi State: Waiting Reason: PodInitializing Ready: False Restart Count: 0 Environment: ARGO_CONTAINER_NAME: main ARGO_TEMPLATE: {"name":"test-migration-v1-template","inputs":{"parameters":[{"name":"script","value":"echo hi"}]},"outputs":{},"metadata":{"annotations":{"co.elastic.logs/enabled":"true","co.elastic.logs/json.add_error_key":"true","co.elastic.logs/json.keys_under_root":"true","co.elastic.logs/json.message_key":"log","fluentbit.job.name":"job-test-migration-v1","vault.hashicorp.com/agent-image":"hashicorp/vault:1.6.2","vault.hashicorp.com/agent-inject":"true","vault.hashicorp.com/agent-inject-secret-secrets.txt":"kv/apps/perf/test-migration","vault.hashicorp.com/agent-inject-template-secrets.txt":"{{ with secret \"kv/apps/dev/test-migration\" }}{{ range $k, $v := .Data.data }} {{ $k }}={{ $v }}{{ end }}{{ end }}\n","vault.hashicorp.com/role":"perf"}},"container":{"name":"test-migration-v1","image":"nginx","command":["echo hi"],"resources":{},"volumeMounts":[{"name":"shared-data","mountPath":"/data"}]},"volumes":[{"name":"shared-data","emptyDir":{}}],"initContainers":[{"name":"test-config-client","image":"busybox:latest","env":[{"name":"APP_NAME","value":"test-migration"},{"name":"VERSION","value":"master"}],"resources":{},"volumeMounts":[{"name":"shared-data","mountPath":"/data"}]}],:false,"accessKeySecret":{"name":"argo-artifact-secret","key":"awsAccessKey"},"secretKeySecret":{"name":"argo-artifact-secret","key":"awsSecretAccessKey"},,"useSDKCreds":true,"key":"argo-job-test-migration-v1-zgpnz/argo-job-test-migration-v1-zgpnz-test-migration-v1-template-2007578379"}},"serviceAccountName":"test-migration","securityContext":{"fsGroup":65534}} ARGO_NODE_ID: argo-job-test-migration-v1-zgpnz-2007578379 
ARGO_INCLUDE_SCRIPT_OUTPUT: false ARGO_DEADLINE: 2025-11-23T09:30:13Z ARGO_PROGRESS_FILE: /var/run/argo/progress ARGO_PROGRESS_PATCH_TICK_DURATION: 1m0s ARGO_PROGRESS_FILE_TICK_DURATION: 3s Mounts: /data from shared-data (rw) /var/run/argo from var-run-argo (rw) /var/run/secrets/eks.amazonaws.com/serviceaccount from aws-iam-token (ro) /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-6xgxr (ro) /vault/secrets from vault-secrets (rw) vault-agent: Container ID: Image: hashicorp/vault:1.6.2 Image ID: Port: Host Port: Command: /bin/sh -ec Args: echo ${VAULT_CONFIG?} | base64 -d > /home/vault/config.json && vault agent -config=/home/vault/config.json State: Waiting Reason: PodInitializing Ready: False Restart Count: 0 Limits: cpu: 500m memory: 128Mi Requests: cpu: 250m memory: 64Mi Environment: NAMESPACE: perf (v1:metadata.namespace) HOST_IP: (v1:status.hostIP) POD_IP: (v1:status.podIP) VAULT_LOG_LEVEL: info VAULT_LOG_FORMAT: standard AWS_STS_REGIONAL_ENDPOINTS: regional Mounts: /home/vault from home-sidecar (rw) /var/run/secrets/eks.amazonaws.com/serviceaccount from aws-iam-token (ro) /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-6xgxr (ro) /vault/secrets from vault-secrets (rw) Conditions: Type Status Initialized False Ready False ContainersReady False PodScheduled True Volumes: aws-iam-token: Type: Projected (a volume that contains injected data from multiple sources) TokenExpirationSeconds: 86400 var-run-argo: Type: EmptyDir (a temporary directory that shares a pod's lifetime) Medium: SizeLimit: tmp-dir-argo: Type: EmptyDir (a temporary directory that shares a pod's lifetime) Medium: SizeLimit: shared-data: Type: EmptyDir (a temporary directory that shares a pod's lifetime) Medium: SizeLimit: argo-artifact-secret: Type: Secret (a volume populated by a Secret) SecretName: argo-artifact-secret Optional: false kube-api-access-6xgxr: Type: Projected (a volume that contains injected data from multiple sources) TokenExpirationSeconds: 
3607 ConfigMapName: kube-root-ca.crt ConfigMapOptional: DownwardAPI: true home-init: Type: EmptyDir (a temporary directory that shares a pod's lifetime) Medium: Memory SizeLimit: home-sidecar: Type: EmptyDir (a temporary directory that shares a pod's lifetime) Medium: Memory SizeLimit: vault-secrets: Type: EmptyDir (a temporary directory that shares a pod's lifetime) Medium: Memory SizeLimit: QoS Class: Burstable Node-Selectors: Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s node.kubernetes.io/unreachable:NoExecute op=Exists for 300s Events: Type Reason Age From Message ---- ------ ---- ---- ------- Normal Scheduled 2m3s default-scheduler Successfully assigned perf/argo-job-test-migration-v1-zgpnz-test-migration-v1-template-2007578379 to ip-10.10.1.9 Normal Pulling 2m2s kubelet Pulling image "quay.io/argoproj/argoexec:v3.5.8" Normal Pulled 115s kubelet Successfully pulled image "quay.io/argoproj/argoexec:v3.5.8" in 6.787386635s (6.787401053s including waiting) Normal Created 115s kubelet Created container init Normal Started 115s kubelet Started container init Normal Pulling 114s kubelet Pulling image "busybox:latest" Normal Pulled 110s kubelet Successfully pulled image "busybox:latest" in 4.327784116s (4.327797931s including waiting) Normal Created 110s kubelet Created container test-config-client Normal Started 110s kubelet Started container test-config-client Normal Pulled 109s kubelet Container image "hashicorp/vault:1.6.2" already present on machine Normal Created 109s kubelet Created container vault-agent-init Normal Started 109s kubelet Started container vault-agent-init ```

Note: It doesn't matter whether the pod is running or not; I expect logs to appear even when the pod fails. Also, when I stop the init container, I start getting logs for the pod even when it has failed. Reproducing the issue is simple: you need an init container and a sidecar container, and the rest of the template can be built around them. I already shared the sample templates I used to reproduce this.

agilgur5 commented 2 months ago

That is better, but you only completed a single one of the checkboxes I listed out for you above.

This reproduction is better, but still not minimal:

  1. you have generated fields in your manifests such as creationTimestamp, generation, resourceVersion, uid, labels, annotations, and status. Some of these will make it impossible to apply directly.
  2. you have many other unused fields, including fields with values of {}, unused volumes + volumeMounts (they do nothing in your manifest), unused env
  3. you have other fields that are unnecessary to the reproduction, such as synchronization, concurrencyPolicy, activeDeadlineSeconds, ttlStrategy, and an input that can just be hard-coded
  4. you provided an SA but none of its RBAC. As far as I can tell, you would only need RBAC for Argo (workflowtaskresults) and Vault (depending on your set-up)
  5. I also don't have any of your Vault config -- ideally you can make a reproduction where you don't need any secrets and inject a Vault agent for the sole purpose of reproduction. Alternatively you can make simple, fake secrets, but the former would be simpler and therefore preferable.

Minimal means as small as possible. The minimum necessary to reproduce, with nothing extra.
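
For illustration only (this is not one of the user's actual manifests; the images and names are placeholders, and Vault is omitted per point 5), a reproduction stripped down to that minimum might look something like:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: log-propagation-repro-
spec:
  entrypoint: main
  templates:
  - name: main
    metadata:
      annotations:
        # Elastic autodiscover hints from the original report
        co.elastic.logs/enabled: "true"
        co.elastic.logs/json.keys_under_root: "true"
    # An init container, since the report says removing it restores logs
    initContainers:
    - name: init
      image: busybox:latest
      command: [sh, -c, "echo init done"]
    container:
      image: busybox:latest
      command: [sh, -c, "echo hi"]
```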

I made several edits to your comment, please read them and note the differences. Please note how they substantially shrink the reproduction. You also had some poor formatting, please see the multiple significant edits I made to your comment. Those edits take a lot of time and are things you could have done yourself initially. In the future, when asking for assistance or reporting an issue to any OSS repo, please make those yourself.

You also did not answer my two fluentbit-related questions, which I suspect are critical to the root cause of this issue. If your output makes it to stdout, then it is likely not Argo-related, but specific to your fluentbit config.

argo job pod - manifest

This isn't a manifest, this is the output of kubectl describe pod, whereas I had asked for kubectl get pod -o yaml. This may suffice though. The init, main, and wait containers are all there, as is your additional initContainer and your vault-agent-init and vault-agent containers.

Although this seems to be a running Pod, not a completed one.

Also, when I stop the init container, I start getting logs for the pod even when it has failed.

Does the init container not run to completion on its own? Is it always stuck? That would be a very important, critical detail that you did not previously mention. From the events, it looks like it does not finish, since it started 109s ago.

That would be the issue if so (not log propagation); there are no logs because the Pod is stuck and all containers haven't run. The question would then be why is the vault-agent-init container stuck. fluentbit and logs as a whole would be entirely unrelated. That container is not running Emissary as a parent process though, so it's purely running Vault code.

vasim-test commented 2 months ago

I made several edits to your comment, please read them and note the differences. Please note how they substantially shrink the reproduction. You also had some poor formatting, please see the multiple significant edits I made to your comment. Those edits take a lot of time and are things you could have done yourself initially. In the future, when asking for assistance or reporting an issue to any OSS repo, please make those yourself.

Thank You. It looks good. I noted this.

You also did not answer my two fluentbit-related questions, which I suspect are critical to the root cause of this issue. If your output makes it to stdout, then it is likely not Argo-related, but specific to your fluentbit config.

During the day I will share my Fluent Bit config with you.

Although this seems to be a running Pod, not a completed one.

Yes, it is not a completed one, but I expect the failure logs as well. I am able to get the failure logs after removing the initContainers section.

Also, when I stop the init container, I start getting logs for the pod even when it has failed.

Does the init container not run to completion on its own? Is it always stuck? That would be a very important, critical detail that you did not previously mention. From the events, it looks like it does not finish, since it started 109s ago.

I mean to say: if I remove the init container section from the template, I get logs. Even when my init container completes, I am unable to get logs. I reproduced the issue this way with public images.

That would be the issue if so (not log propagation); there are no logs because the Pod is stuck and all containers haven't run. The question would then be why is the vault-agent-init container stuck. fluentbit and logs as a whole would be entirely unrelated. That container is not running Emissary as a parent process though, so it's purely running Vault code.

The Vault agent is stuck because it does not have that secret; just for the reproduction I did not create the secret. I was able to reproduce without creating the secret in Vault as well. I am aware my pod will not complete because of this, but I do see those failure logs in Kibana if I remove the initContainers section.

vasim-test commented 2 months ago

Thanks Anton. It's strange! When I removed the annotation co.elastic.logs/json.keys_under_root: "true", I started getting logs. 🙂 But I am confused: how did the same set of annotations work in version 3.3? If there are any logging-related changes I need to make in this version, please let me know if you have any idea, because I am not sure whether removing this annotation can cause any other issue. Fluent Bit config:

```yaml
apiVersion: v1
data:
  custom_parsers.conf: |
    [PARSER]
        Name docker_no_time
        Format json
        Time_Keep Off
        Time_Key time
        Time_Format %Y-%m-%dT%H:%M:%S.%L
    [PARSER]
        Name cri
        Format regex
        Regex ^(?<time>[^ ]+) (?<stream>stdout|stderr) (?<logtag>[^ ]*) (?<message>.*)$
        Time_Key time
        Time_Format %Y-%m-%dT%H:%M:%S.%L%z
  fluent-bit.conf: |
    [SERVICE]
        Flush 1
        Daemon Off
        Parsers_File parsers.conf
        Parsers_File custom_parsers.conf
        HTTP_Server On
        HTTP_Listen 0.0.0.0
        HTTP_Port 2020
        # Log_Level debug
        storage.metrics on

    [INPUT]
        Name tail
        Path /var/log/containers/*.log
        Exclude_Path /var/log/containers/argo-job-*.log,/var/log/containers/argo-workflow-*.log
        Refresh_Interval 10
        multiline.parser docker, cri
        Tag jio_service.*
        Mem_Buf_Limit 50MB
        Skip_Long_Lines On
    [INPUT]
        Name tail
        Path /var/log/containers/argo-job-*.log
        Exclude_Path /var/log/containers/argo-workflow-*.log
        Refresh_Interval 10
        multiline.parser docker, cri
        Tag argo_job.*
        Mem_Buf_Limit 50MB
        Skip_Long_Lines On
    [INPUT]
        Name tail
        Path /var/log/containers/ingress-*.log
        Exclude_Path /var/log/containers/argo-workflow-*.log
        Refresh_Interval 10
        multiline.parser docker, cri
        Tag ingress.*
        Mem_Buf_Limit 100MB
        Buffer_Chunk_Size 1MB
        Buffer_Max_Size 5MB
        Skip_Long_Lines Off

    [FILTER]
        Name kubernetes
        Match jio_service.*
        Labels On
        Annotations Off
        Merge_Log On
        Keep_Log On
        K8S-Logging.Parser On
        K8S-Logging.Exclude On
        Kube_Tag_Prefix jio_service.var.log.containers.
    [FILTER]
        Name kubernetes
        Match argo_job.*
        Labels On
        Annotations On
        Merge_Log On
        Keep_Log On
        K8S-Logging.Parser On
        K8S-Logging.Exclude Off
        Kube_Tag_Prefix argo_job.var.log.containers.
    [FILTER]
        Name kubernetes
        Match ingress.*
        Labels On
        Annotations On
        Merge_Log On
        Keep_Log On
        K8S-Logging.Parser On
        K8S-Logging.Exclude On
        Kube_Tag_Prefix ingress.var.log.containers.

    [OUTPUT]
        Name es
        Match jio_service.*
        Host multi-client
        Port 9200
        HTTP_User elastic
        HTTP_Passwd test@123
        Logstash_Format On
        logstash_prefix_key $kubernetes['namespace_name'].$kubernetes['container_name']
        Retry_Limit 5
        tls off
        tls.verify off
        Suppress_Type_Name On
        Replace_Dots On
        Trace_Error on
        Generate_ID On
    [OUTPUT]
        Name es
        Match argo_job.*
        Host multi-client
        Port 9200
        HTTP_User elastic
        HTTP_Passwd test@123
        Logstash_Format On
        logstash_prefix_key $kubernetes['namespace_name'].$kubernetes['annotations']['fluentbit.job.name']
        Retry_Limit 5
        tls off
        tls.verify off
        Replace_Dots On
        Suppress_Type_Name On
        Generate_ID On
    [OUTPUT]
        Name es
        Alias Ingress
        Match ingress.*
        Host multi-client
        Port 9200
        HTTP_User elastic
        HTTP_Passwd test@123
        Logstash_Format On
        logstash_prefix_key $kubernetes['namespace_name'].$kubernetes['labels']['app.kubernetes.io/name']
        Retry_Limit 5
        tls off
        tls.verify off
        net.keepalive Off
        Suppress_Type_Name On
        Replace_Dots On
        Generate_ID On
```
Joibel commented 2 months ago

Thanks Anton. It's strange! When I removed the annotation co.elastic.logs/json.keys_under_root: "true", I started getting logs. 🙂

This is therefore a problem in fluentbit, and not in argo workflows.

If there are any logging-related changes I need to make in this version, please let me know if you have any idea, because I am not sure whether removing this annotation can cause any other issue.

I suggest you research or ask around in the elastic/fluentbit community.

vasim-test commented 2 months ago

Still, one question comes to mind, if you can answer it: if the same annotations work in version 3.3 of Argo, what in version 3.4+ of Argo Workflows stops them from sending logs? Interestingly, even without removing the mentioned Elastic annotations, it works if I stop/remove the sidecar or init container in the pod. Thanks!

Joibel commented 2 months ago

I don't know. Fluentbit is the one not sending logs, not workflows.

vasim-test commented 2 months ago

Upon further investigation, I discovered that there has been a change in the metadata (annotations) attached to pods created from WorkflowTemplate CRDs in Argo 3.4+. This change affects Fluent Bit's ability to process logs for jobs that use a WorkflowTemplate: because of the increased pod metadata, Fluent Bit's kubernetes filter now needs a buffer larger than its 32KB default to temporarily hold the metadata. The solution for users who hit this issue is simply to add `Buffer_Size 5MB` to the kubernetes filter in the Fluent Bit config. It is all about metadata; there is no need to remove any annotations or change anything on the Argo side. Simply increase your buffer size:

```
  [FILTER]
      Name kubernetes
      Match argo_job.*
      Labels On
      Annotations On
      Merge_Log On
      Keep_Log On
      Buffer_Size 5MB
      K8S-Logging.Parser On
      K8S-Logging.Exclude Off
      Kube_Tag_Prefix argo_job.var.log.containers.
```

Note: this is neither a Fluent Bit issue nor an Argo issue. It is a case where one should fine-tune the configuration to account for such changes.
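
As a rough illustration of why the buffer matters, here is a toy sketch (the annotation names and sizes are made up for the example, not taken from the real pod) showing how easily a pod's combined annotation payload can exceed the kubernetes filter's 32KB default once a large template-sized blob is attached:

```python
import json

# Toy stand-ins for pod annotations; the last entry simulates the large
# template-derived metadata that Argo 3.4+ attaches to workflow pods.
annotations = {
    "co.elastic.logs/enabled": "true",
    "fluentbit.job.name": "job-test-migration-v1",
    "workflows.argoproj.io/template": "x" * 40_000,  # hypothetical blob
}

# Approximate the serialized metadata the filter must buffer per record.
size = len(json.dumps(annotations).encode())
print(f"annotation payload: {size} bytes")
print("exceeds 32KB default Buffer_Size:", size > 32 * 1024)
```

When the payload is larger than `Buffer_Size`, the filter cannot attach the Kubernetes metadata, which is why raising it (e.g. to 5MB) restores the logs without touching the annotations themselves.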

agilgur5 commented 2 months ago

In my day time I will share you my config for fluent bit.

Note that I did not ask for your fluentbit config, I asked you to tick off the checkboxes I wrote out for you above.

Note - Its neither fluent bit issue nor argo issue It is where one should fine tune the configuration with respect to such changes.

Specifically, this answers the 3rd one.

In the future, I would please ask, once again, that you answer questions from maintainers and follow directions on minimal repros and such. Doing so will also make you a better debugger and perhaps even able to root cause and resolve such issues on your own. That would also be a more efficient and effective use of your time as well as maintainer time, which is especially limited.