Open thangamani-arun opened 1 year ago
part2
time="2022-11-07T09:22:16.500Z" level=warning msg="failed to secure root file handle for 55" error="open /proc/55/root: permission denied"
time="2022-11-07T09:22:16.551Z" level=info msg="secured root for pid 55 root: kubectl (\"/proc/55/root\")"
time="2022-11-07T09:22:16.551Z" level=info msg="mapped container name \"main\" to pid 55"
time="2022-11-07T09:22:17.365Z" level=info msg="Watch pods 200"
time="2022-11-07T09:22:18.182Z" level=info msg="Main container completed"
time="2022-11-07T09:22:18.182Z" level=info msg="No Script output reference in workflow. Capturing script output ignored"
time="2022-11-07T09:22:18.182Z" level=info msg="No output parameters"
time="2022-11-07T09:22:18.182Z" level=info msg="No output artifacts"
time="2022-11-07T09:22:18.182Z" level=info msg="Killing sidecars []"
time="2022-11-07T09:22:18.183Z" level=info msg="Alloc=6897 TotalAlloc=12064 Sys=75345 NumGC=4 Goroutines=7"
time="2022-11-07T09:22:46.837Z" level=info msg="Starting deadline monitor"
time="2022-11-07T09:22:46.837Z" level=info msg="secured root for pid 53 root: sh (\"/proc/53/root\")"
time="2022-11-07T09:22:46.837Z" level=info msg="mapped container name \"main\" to pid 53"
time="2022-11-07T09:22:47.934Z" level=info msg="Watch pods 200"
time="2022-11-07T09:22:50.221Z" level=info msg="Main container completed"
time="2022-11-07T09:22:50.221Z" level=info msg="No Script output reference in workflow. Capturing script output ignored"
time="2022-11-07T09:22:50.221Z" level=info msg="No output parameters"
time="2022-11-07T09:22:50.221Z" level=info msg="No output artifacts"
time="2022-11-07T09:22:50.222Z" level=info msg="Killing sidecars []"
time="2022-11-07T09:22:50.222Z" level=info msg="Alloc=6525 TotalAlloc=11818 Sys=74577 NumGC=4 Goroutines=7"
time="2022-11-07T09:22:36.610Z" level=info msg="Starting Workflow Executor" version=v3.4.1
time="2022-11-07T09:22:36.614Z" level=info msg="Using executor retry strategy" Duration=1s Factor=1.6 Jitter=0.5 Steps=5
time="2022-11-07T09:22:36.614Z" level=info msg="Executor initialized" deadline="0001-01-01 00:00:00 +0000 UTC" includeScriptOutput=false namespace=argo-workflows podName=gitlab-fullstack-wf-6prdv-postgres-cluster-2944012881 template="{\"name\":\"postgres-cluster\",\"inputs\":{\"parameters\":[{\"name\":\"app_name\",\"value\":\"gitlab\"},{\"name\":\"namespace\",\"value\":\"gitlab-dev\"},{\"name\":\"storage_class\",\"value\":\"ceph-xfs\"},{\"name\":\"postgres_db_size\",\"value\":\"10Gi\"},{\"name\":\"postgres_version\",\"value\":\"14\"},{\"name\":\"registry\",\"value\":\"registry:8444\"},{\"name\":\"pgo_wal_archive_timeout\",\"value\":\"60\"},{\"name\":\"pgo_spilo_tag\",\"value\":\"spilo-14:2.1-p5\"},{\"name\":\"backup_schedule\",\"value\":\"*/5 * * * *\"}],\"artifacts\":[{\"name\":\"postgres-cluster\",\"path\":\"gitlab-postgres-cluster.yml\",\"raw\":{\"data\":\"kind: \\\"postgresql\\\"\\napiVersion: \\\"acid.zalan.do/v1\\\"\\nmetadata:\\n name: \\\"automation-gitlab-postgres\\\"\\n namespace: \\\"gitlab-dev\\\"\\n labels:\\n team: automation\\nspec:\\n dockerImage: \\\"registry:8444/private/spilo-14:2.1-p5\\\"\\n teamId: \\\"automation\\\"\\n postgresql:\\n version: \\\"14\\\"\\n parameters:\\n archive_timeout: \\\"60\\\"\\n max_connections: \\\"250\\\"\\n enableLogicalBackup: true\\n logicalBackupSchedule: \\\"*/5 * * * *\\\"\\n numberOfInstances: 3\\n enableConnectionPooler: true\\n volume:\\n size: \\\"10Gi\\\"\\n storageClass: \\\"ceph-xfs\\\"\\n users:\\n gitlab:\\n - superuser\\n - createdb\\n databases:\\n gitlabhq_production: gitlab\\n allowedSourceRanges:\\n # IP ranges to access your cluster go here\\n patroni:\\n initdb:\\n encoding: \\\"UTF8\\\"\\n locale: \\\"en_US.UTF-8\\\"\\n data-checksums: \\\"true\\\"\\n pg_hba:\\n - local all all trust\\n - local replication standby trust\\n - host replication standby all md5\\n - host all all all md5\\n resources:\\n requests:\\n cpu: 400m\\n memory: 1Gi\\n limits:\\n cpu: 800m\\n memory: 
1Gi\\n\"}}]},\"outputs\":{},\"metadata\":{},\"container\":{\"name\":\"\",\"image\":\"registry:8444/private/argo-executor:latest\",\"command\":[\"/bin/sh\",\"-c\"],\"args\":[\"kubectl --kubeconfig /.kube/config.yaml -n gitlab-dev apply -f gitlab-postgres-cluster.yml\"],\"resources\":{},\"volumeMounts\":[{\"name\":\"kubeconfig-volume\",\"mountPath\":\"/.kube/\"}]}}" version="&Version{Version:v3.4.1,BuildDate:2022-10-01T15:03:42Z,GitCommit:0546fef0b096d84c9e3362d2b241614e743ebe97,GitTag:v3.4.1,GitTreeState:clean,GoVersion:go1.18.6,Compiler:gc,Platform:linux/amd64,}"
time="2022-11-07T09:22:36.615Z" level=info msg="Starting deadline monitor"
time="2022-11-07T09:22:38.616Z" level=info msg="Main container completed" error="<nil>"
time="2022-11-07T09:22:38.616Z" level=info msg="No Script output reference in workflow. Capturing script output ignored"
time="2022-11-07T09:22:38.616Z" level=info msg="No output parameters"
time="2022-11-07T09:22:38.616Z" level=info msg="No output artifacts"
time="2022-11-07T09:22:38.616Z" level=info msg="Deadline monitor stopped"
time="2022-11-07T09:22:38.617Z" level=info msg="Alloc=7167 TotalAlloc=12564 Sys=28882 NumGC=4 Goroutines=5"
time="2022-11-07T09:21:55.910Z" level=warning msg="failed to secure root file handle for 60" error="open /proc/60/root: permission denied"
time="2022-11-07T09:21:55.961Z" level=info msg="secured root for pid 60 root: kubectl (\"/proc/60/root\")"
time="2022-11-07T09:21:55.961Z" level=info msg="mapped container name \"main\" to pid 60"
time="2022-11-07T09:21:56.797Z" level=info msg="Watch pods 200"
time="2022-11-07T09:21:58.610Z" level=info msg="Main container completed"
time="2022-11-07T09:21:58.610Z" level=info msg="No Script output reference in workflow. Capturing script output ignored"
time="2022-11-07T09:21:58.610Z" level=info msg="No output parameters"
time="2022-11-07T09:21:58.610Z" level=info msg="No output artifacts"
time="2022-11-07T09:21:58.610Z" level=info msg="Killing sidecars []"
time="2022-11-07T09:21:58.611Z" level=info msg="Alloc=6466 TotalAlloc=11670 Sys=75857 NumGC=4 Goroutines=7"
time="2022-11-07T09:22:26.273Z" level=warning msg="failed to secure root file handle for 60" error="open /proc/60/root: permission denied"
time="2022-11-07T09:22:26.325Z" level=info msg="secured root for pid 60 root: sh (\"/proc/60/root\")"
time="2022-11-07T09:22:26.325Z" level=info msg="mapped container name \"main\" to pid 60"
time="2022-11-07T09:22:27.137Z" level=info msg="Watch pods 200"
time="2022-11-07T09:22:28.958Z" level=info msg="Main container completed"
time="2022-11-07T09:22:28.959Z" level=info msg="No Script output reference in workflow. Capturing script output ignored"
time="2022-11-07T09:22:28.959Z" level=info msg="No output parameters"
time="2022-11-07T09:22:28.959Z" level=info msg="No output artifacts"
time="2022-11-07T09:22:28.959Z" level=info msg="Killing sidecars []"
time="2022-11-07T09:22:28.959Z" level=info msg="Alloc=6956 TotalAlloc=12499 Sys=75857 NumGC=4 Goroutines=7"
time="2022-11-07T09:22:05.891Z" level=info msg="Using executor retry strategy" Duration=1s Factor=1.6 Jitter=0.5 Steps=5
time="2022-11-07T09:22:05.892Z" level=info msg="Executor initialized" deadline="0001-01-01 00:00:00 +0000 UTC" includeScriptOutput=false namespace=argo-workflows podName=gitlab-fullstack-wf-6prdv-argocd-project-3910150959 template="{\"name\":\"argocd-project\",\"inputs\":{\"parameters\":[{\"name\":\"app_name\",\"value\":\"gitlab\"},{\"name\":\"namespace\",\"value\":\"gitlab-dev\"},{\"name\":\"registry\",\"value\":\"registry:8444\"},{\"name\":\"argocd_url\",\"value\":\"argo-cd-argocd-server.argocd.svc\"},{\"name\":\"argocd_user\",\"value\":\"user\"},{\"name\":\"argocd_password\",\"value\":\"password\"},{\"name\":\"argo_cd_k8s_target\",\"value\":\"https://kubernetes.default.svc\"}],\"artifacts\":[{\"name\":\"argocd-project\",\"path\":\"argocd-project.yml\",\"raw\":{\"data\":\"apiVersion: argoproj.io/v1alpha1\\nkind: AppProject\\nmetadata:\\n name: gitlab-dev\\n namespace: argocd\\nspec:\\n sourceRepos:\\n - '*'\\n destinations:\\n - namespace: gitlab-dev\\n server: https://kubernetes.default.svc\\n clusterResourceWhitelist:\\n - group: '*'\\n kind: '*'\\n\"}}]},\"outputs\":{},\"metadata\":{},\"container\":{\"name\":\"\",\"image\":\"registry:8444/private/argo-executor:latest\",\"command\":[\"/bin/sh\",\"-c\"],\"args\":[\"argocd login argo-cd-argocd-server.argocd.svc --grpc-web --username user --password password --insecure --plaintext; argocd proj create --upsert -f argocd-project.yml\"],\"resources\":{},\"volumeMounts\":[{\"name\":\"kubeconfig-volume\",\"mountPath\":\"/.kube/\"}]}}" version="&Version{Version:v3.4.1,BuildDate:2022-10-01T15:03:42Z,GitCommit:0546fef0b096d84c9e3362d2b241614e743ebe97,GitTag:v3.4.1,GitTreeState:clean,GoVersion:go1.18.6,Compiler:gc,Platform:linux/amd64,}"
time="2022-11-07T09:22:05.892Z" level=info msg="Starting deadline monitor"
time="2022-11-07T09:22:08.894Z" level=info msg="Main container completed" error="<nil>"
time="2022-11-07T09:22:08.894Z" level=info msg="No Script output reference in workflow. Capturing script output ignored"
time="2022-11-07T09:22:08.894Z" level=info msg="No output parameters"
time="2022-11-07T09:22:08.894Z" level=info msg="No output artifacts"
time="2022-11-07T09:22:08.895Z" level=info msg="Deadline monitor stopped"
time="2022-11-07T09:22:08.895Z" level=info msg="stopping progress monitor (context done)" error="context canceled"
time="2022-11-07T09:22:08.895Z" level=info msg="Alloc=6400 TotalAlloc=12571 Sys=25298 NumGC=4 Goroutines=5"
time="2022-11-07T09:22:15.976Z" level=info msg="Starting Workflow Executor" version=v3.4.1
time="2022-11-07T09:22:15.979Z" level=info msg="Using executor retry strategy" Duration=1s Factor=1.6 Jitter=0.5 Steps=5
time="2022-11-07T09:22:15.980Z" level=info msg="Executor initialized" deadline="0001-01-01 00:00:00 +0000 UTC" includeScriptOutput=false namespace=argo-workflows podName=gitlab-fullstack-wf-6prdv-pgo-configmap-746054629 template="{\"name\":\"pgo-configmap\",\"inputs\":{\"parameters\":[{\"name\":\"namespace\",\"value\":\"gitlab-dev\"},{\"name\":\"pgo_configmap_name\",\"value\":\"postgres-pod-config\"},{\"name\":\"registry\",\"value\":\"rehgistry:8444\"},{\"name\":\"ceph_s3_endpoint\",\"value\":\"https://xx.xx.xx.xx\"},{\"name\":\"ceph_s3_access_key_id\",\"value\":\"AAAAA\"},{\"name\":\"ceph_s3_access_key\",\"value\":\"AAAA\"},{\"name\":\"backup_num_to_retain\",\"value\":\"14\"},{\"name\":\"backup_schedule\",\"value\":\"*/5 * * * *\"}],\"artifacts\":[{\"name\":\"pgo-configmap\",\"path\":\"pgo-configmap.yml\",\"raw\":{\"data\":\"apiVersion: v1\\nkind: ConfigMap\\nmetadata:\\n name: postgres-pod-config\\n namespace: gitlab-dev\\ndata:\\n AWS_ENDPOINT: https://xx.xx.xx.xx\\n AWS_REGION: default\\n AWS_S3_FORCE_PATH_STYLE: \\\"true\\\"\\n AWS_ACCESS_KEY_ID: AAAAAAA\\n AWS_SECRET_ACCESS_KEY: AAAA\\n BACKUP_SCHEDULE: \\\"*/5 * * * *\\\"\\n BACKUP_NUM_TO_RETAIN: \\\"14\\\"\\n WALG_DISABLE_S3_SSE: \\\"true\\\"\\n USE_WALG_BACKUP: \\\"true\\\"\\n USE_WALG_RESTORE: \\\"false\\\"\\n\\n STANDBY_AWS_ENDPOINT: https://xx.xx.xx.xx\\n STANDBY_AWS_REGION: default\\n STANDBY_AWS_ACCESS_KEY_ID: AAAA\\n STANDBY_AWS_SECRET_ACCESS_KEY: AAA\\n STANDBY_AWS_S3_FORCE_PATH_STYLE: \\\"true\\\"\\n STANDBY_WALE_ENV_DIR: /run/etc/wal-e.d/env-standby\\n STANDBY_USE_WALG_RESTORE: \\\"true\\\"\\n STANDBY_WALG_DISABLE_S3_SSE: \\\"true\\\"\\n\"}}]},\"outputs\":{},\"metadata\":{},\"container\":{\"name\":\"\",\"image\":\"private-registry:8444/private/argo-executor:latest\",\"command\":[\"/bin/sh\",\"-c\"],\"args\":[\"kubectl --kubeconfig /.kube/config.yaml apply -f pgo-configmap.yml\"],\"resources\":{},\"volumeMounts\":[{\"name\":\"kubeconfig-volume\",\"mountPath\":\"/.kube/\"}]}}" 
version="&Version{Version:v3.4.1,BuildDate:2022-10-01T15:03:42Z,GitCommit:0546fef0b096d84c9e3362d2b241614e743ebe97,GitTag:v3.4.1,GitTreeState:clean,GoVersion:go1.18.6,Compiler:gc,Platform:linux/amd64,}"
time="2022-11-07T09:22:15.980Z" level=info msg="Starting deadline monitor"
time="2022-11-07T09:22:18.980Z" level=info msg="Main container completed" error="<nil>"
time="2022-11-07T09:22:18.980Z" level=info msg="No Script output reference in workflow. Capturing script output ignored"
time="2022-11-07T09:22:18.980Z" level=info msg="No output parameters"
time="2022-11-07T09:22:18.980Z" level=info msg="No output artifacts"
time="2022-11-07T09:22:18.981Z" level=info msg="Alloc=7427 TotalAlloc=12500 Sys=24018 NumGC=3 Goroutines=7"
The same workflow failed on the run before the one above:
STEP TEMPLATE PODNAME DURATION MESSAGE
✖ gitlab-fullstack-wf-2k6lb gitlab-fullstack-wf child 'gitlab-fullstack-wf-2k6lb-917790860' failed
├───✔ create-namespace namespace gitlab-fullstack-wf-2k6lb-namespace-1622150166 4s
├───✔ create-argocd-project argocd-project gitlab-fullstack-wf-2k6lb-argocd-project-972177293 7s
├───✔ create-pgo-configmap pgo-configmap gitlab-fullstack-wf-2k6lb-pgo-configmap-1718522638 4s
├───✔ deploy-postgres-operator postgres-operator gitlab-fullstack-wf-2k6lb-postgres-operator-4015549576 6s
└───✖ deploy-postgres-cluster postgres-cluster gitlab-fullstack-wf-2k6lb-postgres-cluster-1514961193 5s Error (exit code 1)
After the second run, it errored with "expected < 2 pods, got 2 - this is a bug".
I run into this kind of failure once in a while.
I heard in the comment https://github.com/argoproj/argo-workflows/issues/9839#issuecomment-1281999154 that upgrading to v3.4.1 would solve it, but I am hitting this on v3.4.1.
So I am wondering what is causing the failure.
@sarabala1979 @alexec: your help would be greatly appreciated.
Can you try v3.4.3?
It occurs here:
Essentially, there are two pods saying they’re running the work for the same node.
This func could be simplified to return objectCount > 0.
Would someone like to submit a PR?
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. If this is a mentoring request, please provide an update here. Thank you for your contributions.
I'm observing this issue with 3.4.7
I have the same problem and the culprit is a rogue workflow-controller, e.g.
$ kubectl get pod -A|grep workflow
argo-workflows argo-workflows-server-77645f86cc-fhkp4 1/1 Running 0 9d
argo-workflows argo-workflows-server-77645f86cc-hb266 1/1 Running 0 9d
argo-workflows argo-workflows-server-77645f86cc-x5h4r 1/1 Running 0 9d
argo-workflows argo-workflows-workflow-controller-7674d6687-4hsfj 1/1 Running 0 9d
argo-workflows argo-workflows-workflow-controller-7674d6687-jxbbl 1/1 Running 0 9d
argo-workflows argo-workflows-workflow-controller-7674d6687-kf2wc 1/1 Running 0 9d
argo workflow-controller-845d5d67f4-tqktq 1/1 Running 0 14d
The rogue one is the controller in the argo namespace; removing its deployment/workflow-controller fixes the problem.
YMMV, but it's worth verifying whether any workflow controller is running outside of your install.
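For anyone wanting to run that check quickly, here is a minimal sketch. The helper name controller_namespaces is made up for illustration, and it assumes controller pods have "workflow-controller" somewhere in their name, as in the listing above. It reads `kubectl get pods -A` output on stdin and prints each namespace that hosts such a pod along with the pod count. Several replicas inside one namespace can be a normal HA setup; more than one namespace in the output is the thing to look at.

```shell
# Hypothetical helper: summarize which namespaces contain pods whose name
# includes "workflow-controller". Input is `kubectl get pods -A` output
# (column 1 = namespace, column 2 = pod name); a header row is harmless
# because its second column is "NAME".
controller_namespaces() {
  awk '$2 ~ /workflow-controller/ { count[$1]++ }
       END { for (ns in count) print ns, count[ns] }' | LC_ALL=C sort
}

# Usage against a live cluster (needs kubectl access, so left as a comment):
#   kubectl get pods -A --no-headers | controller_namespaces
```

If this prints more than one line, there may be a second controller install reconciling the same workflows, which matches the situation described above.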
it's worth to verify if there is any workflow controller outside of your install
Had a colleague hit this today and this was indeed the cause -- two controllers running.
I opened #13760 to call that out as a likely culprit.
Pre-requisites
:latest
What happened/what you expected to happen?
Version
v3.4.1
Paste a small workflow that reproduces the issue. We must be able to run the workflow; don't enter a workflows that uses private images.
Logs from the workflow controller
Logs from in your workflow's wait container
part1