Open remicres opened 9 months ago
Looking at the artifact GC code, I saw this line:
...
RunAsUser: pointer.Int64Ptr(8737),
...
And if we take a look at the error message reported in the issue, it looks like OpenShift requires the user ID to be in a certain range:
"failed to GC artifacts"
error="failed to create pod:
pods "artifact-gc-nn4dj-artgc-wfcomp-3200369636" is forbidden:
unable to validate against any security context constraint: [
provider "anyuid": Forbidden: not usable by user or serviceaccount,
provider restricted-v2: .containers[0].runAsUser: Invalid value: 8737: must be in the ranges: [1000710000, 1000719999],
...]" namespace=argo workflow=artifact-gc-nn4dj
Could the GC pod UID be the cause?
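One way to check this hypothesis offline is to pull the allowed range out of the SCC error and compare it with the hardcoded UID. This is only a minimal sketch, assuming the message format shown above; the helper name and the regular expression are illustrative and not part of Argo or OpenShift.

```python
import re

# Error fragment copied from the controller log above.
SCC_ERROR = (
    "provider restricted-v2: .containers[0].runAsUser: "
    "Invalid value: 8737: must be in the ranges: [1000710000, 1000719999]"
)


# Hypothetical helper: returns (requested UID, range low, range high).
def parse_run_as_user_error(message: str) -> tuple[int, int, int]:
    match = re.search(
        r"runAsUser: Invalid value: (\d+): must be in the ranges: \[(\d+), (\d+)\]",
        message,
    )
    if match is None:
        raise ValueError("message does not look like a runAsUser range error")
    uid, low, high = (int(group) for group in match.groups())
    return uid, low, high


uid, low, high = parse_run_as_user_error(SCC_ERROR)
uid_allowed = low <= uid <= high
print(f"UID {uid} allowed in [{low}, {high}]: {uid_allowed}")
```

For the message above, the hardcoded UID falls outside the range assigned to the namespace, which is consistent with the pod being rejected by restricted-v2.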
You should be able to use the podSpecPatch as part of https://argoproj.github.io/argo-workflows/fields/#workflowlevelartifactgc to modify this and prove your idea.
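Since podSpecPatch is passed as a string, it can help to generate it rather than hand-write the quoting. A minimal sketch, assuming the GC container is named main and picking the lower bound of the UID range from the error above; both values are assumptions to adapt to your namespace.

```python
import json

# Assumption: the artifact GC container is named "main"; the UID is the
# lower bound of the range reported by the restricted-v2 SCC above.
patch = {
    "containers": [
        {"name": "main", "securityContext": {"runAsUser": 1000710000}}
    ]
}

# podSpecPatch expects the patch as a string, so serialize it compactly.
pod_spec_patch = json.dumps(patch, separators=(",", ":"))
print(pod_spec_patch)
```

The printed string can then be pasted as the podSpecPatch value under artifactGC.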
Thanks, I'll try this asap. First I need to update Argo (I have version 3.4.13; unfortunately for me, podSpecPatch and forceFinalizerRemoval come with 3.5.0). I will keep you updated.
After upgrading Argo-Workflows to 3.5.2, and applying podSpecPatch like this:
...
artifactGC:
  strategy: OnWorkflowDeletion
  forceFinalizerRemoval: true
  podSpecPatch: '{"containers":[{"name":"main", "securityContext":{"runAsUser":1000710000}}]}'
...
I still can't get GC working, and I still have to delete the finalizer manually (forceFinalizerRemoval does not seem to work).
The error is different though:
(controller logs)
...
time="2024-01-02T19:23:23.578Z" level=info msg="Creating Artifact GC Task myarticho-artgc-wfcomp-2166136261-0" namespace=argo workflow=myarticho
time="2024-01-02T19:23:23.610Z" level=info msg="creating pod to delete artifacts: myarticho-artgc-wfcomp-2166136261" namespace=argo strategy=OnWorkflowCompletion workflow=myarticho
time="2024-01-02T19:23:23.614Z" level=error msg="failed to GC artifacts" error="failed to create pod: pods \"myarticho-artgc-wfcomp-2166136261\" is forbidden: cannot set blockOwnerDeletion if an ownerReference refers to a resource you can't set finalizers on: , <nil>" namespace=argo workflow=myarticho
...
I have checked the ClusterRole, and I have the following (which looks fine?):
rules:
- apiGroups:
  - ""
  resources:
  - pods
  - pods/exec
  verbs:
  - create
  - get
  - list
  - watch
  - update
  - patch
  - delete
- apiGroups:
  - ""
  resources:
  - configmaps
  verbs:
  - get
  - watch
  - list
- apiGroups:
  - ""
  resources:
  - persistentvolumeclaims
  - persistentvolumeclaims/finalizers
  verbs:
  - create
  - update
  - delete
  - get
- apiGroups:
  - argoproj.io
  resources:
  - workflows
  - workflows/finalizers
  - workflowtasksets
  - workflowtasksets/finalizers
  - workflowartifactgctasks
  verbs:
  - get
  - list
  - watch
  - update
  - patch
  - delete
  - create
- apiGroups:
  - argoproj.io
  resources:
  - workflowtemplates
  - workflowtemplates/finalizers
  - clusterworkflowtemplates
  - clusterworkflowtemplates/finalizers
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - argoproj.io
  resources:
  - workflowtaskresults
  verbs:
  - list
  - watch
  - deletecollection
- apiGroups:
  - ""
  resources:
  - serviceaccounts
  verbs:
  - get
  - list
- apiGroups:
  - argoproj.io
  resources:
  - cronworkflows
  - cronworkflows/finalizers
  verbs:
  - get
  - list
  - watch
  - update
  - patch
  - delete
- apiGroups:
  - ""
  resources:
  - events
  verbs:
  - create
  - patch
- apiGroups:
  - policy
  resources:
  - poddisruptionbudgets
  verbs:
  - create
  - get
  - delete
Now I really don't see why forceFinalizerRemoval doesn't work after the GC fails; workflows/finalizers has the necessary verbs (as far as I am aware).
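For context, the "cannot set blockOwnerDeletion" message comes from the Kubernetes OwnerReferencesPermissionEnforcement admission check, which requires update permission on the finalizers subresource of the owner that the new pod references. The log leaves the owner resource name empty, so which resource the GC pod points at is an open question here. The sketch below only checks the pasted rules for two candidate owners; the helper name is hypothetical and the matching is simplified (no wildcard handling), so treat it as a rough aid rather than a full RBAC evaluation.

```python
# Subset of the ClusterRole pasted above: the argoproj.io workflow rule.
RULES = [
    {
        "apiGroups": ["argoproj.io"],
        "resources": [
            "workflows",
            "workflows/finalizers",
            "workflowtasksets",
            "workflowtasksets/finalizers",
            "workflowartifactgctasks",
        ],
        "verbs": ["get", "list", "watch", "update", "patch", "delete", "create"],
    },
]


# Hypothetical helper; ignores "*" wildcards and resourceNames for brevity.
def grants(rules, api_group, resource, verb):
    """Return True if any rule allows `verb` on `resource` in `api_group`."""
    for rule in rules:
        if (
            api_group in rule["apiGroups"]
            and resource in rule["resources"]
            and verb in rule["verbs"]
        ):
            return True
    return False


# Candidate owners of the GC pod (an assumption, not confirmed by the log).
for resource in ("workflows/finalizers", "workflowartifactgctasks/finalizers"):
    print(resource, grants(RULES, "argoproj.io", resource, "update"))
```

With the rules as pasted, workflows/finalizers is covered, while workflowartifactgctasks/finalizers does not appear in the role at all, which may be worth ruling out.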
Pre-requisites
:latest
What happened/what did you expect to happen?
Hi,
The artifact repository works fine, except that ArtifactGC does not work. Besides, the workflows' status is "Succeeded", but they have to be deleted manually by setting metadata.finalizers to [], otherwise the deletion deadlocks. The server logs mention a cluster permissions issue (see the logs from the workflow controller).
I am new to Argo Workflows, and I could be wrong here, but it would not be the first time I have encountered such permission-related issues on OpenShift.
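For reference, the manual workaround can be expressed as a JSON merge patch applied with kubectl. A minimal sketch that only builds the command string; the workflow name and namespace are taken from the controller logs in this thread and are assumptions for any other cluster, and the helper name is illustrative.

```python
import json
import shlex


# Hypothetical helper: builds (but does not run) the kubectl command that
# empties metadata.finalizers on a Workflow via a JSON merge patch.
def clear_finalizers_command(name: str, namespace: str) -> str:
    patch = json.dumps({"metadata": {"finalizers": []}})
    args = [
        "kubectl", "patch", "workflow", name,
        "-n", namespace,
        "--type=merge",
        "-p", patch,
    ]
    # shlex.join quotes the JSON argument so it survives a POSIX shell.
    return shlex.join(args)


command = clear_finalizers_command("myarticho", "argo")
print(command)
```

Removing the finalizer this way skips artifact garbage collection entirely, so any stored artifacts remain in the repository.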
Version
v3.4.13
Paste a small workflow that reproduces the issue. We must be able to run the workflow; don't enter a workflow that uses private images.
Logs from the workflow controller
Logs from in your workflow's wait container