Open joffreychambrin opened 2 weeks ago
What kind of hook are you using? A Job? Looking at the code, there is something wrong with the health check for the hook. Are you using custom health checks as Lua scripts?
I think https://github.com/argoproj/argo-cd/blob/v2.11.0/controller/hook.go is just missing the necessary nil checks after calling GetResourceHealth. Every other caller of GetResourceHealth (both in argo-cd and in gitops-engine) checks the returned health object for nil, but the post-delete hook controller does not. The returned health object can indeed be nil.
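To illustrate the kind of nil check that seems to be missing, here is a minimal, self-contained sketch. It does not use the actual Argo CD code: `HealthStatus`, `getResourceHealth`, and `hooksInProgress` are hypothetical stand-ins, and skipping resources that have no health information is an assumption about how a fix could behave.

```go
package main

import "fmt"

// HealthStatus is a hypothetical stand-in for the health object
// returned by GetResourceHealth.
type HealthStatus struct {
	Status string
}

// getResourceHealth is a hypothetical stand-in. It returns nil for
// kinds that have no health check, mirroring the behavior described above.
func getResourceHealth(kind string, jobDone bool) *HealthStatus {
	if kind != "Job" {
		return nil // no health check exists for this kind
	}
	if jobDone {
		return &HealthStatus{Status: "Healthy"}
	}
	return &HealthStatus{Status: "Progressing"}
}

// hooksInProgress counts the hooks that are still progressing.
// The nil check prevents a nil pointer dereference for resources
// without a health check (assumption: such resources are skipped).
func hooksInProgress(kinds []string, jobDone bool) int {
	n := 0
	for _, kind := range kinds {
		h := getResourceHealth(kind, jobDone)
		if h == nil {
			continue
		}
		if h.Status == "Progressing" {
			n++
		}
	}
	return n
}

func main() {
	kinds := []string{"ServiceAccount", "Role", "RoleBinding", "Job"}
	fmt.Println(hooksInProgress(kinds, false)) // Job still running
	fmt.Println(hooksInProgress(kinds, true))  // Job finished
}
```

Without the `h == nil` branch, reading `h.Status` would panic as soon as a resource without a health check is encountered.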
@joffreychambrin You can probably work around the bug by adding this to your argocd-cm ConfigMap:
resource.customizations: |
  batch/Job:
    health.lua: |
      hs = {}
      hs.status = "Progressing"
      hs.message = ""
      if obj.status ~= nil then
        if obj.status.conditions ~= nil then
          for i, condition in ipairs(obj.status.conditions) do
            if condition.type == "Complete" and condition.status == "True" then
              hs.status = "Healthy"
              return hs
            end
            if condition.type == "Failed" and condition.status == "True" then
              hs.status = "Degraded"
              return hs
            end
          end
        end
      end
      return hs
Completely untested, and I would advise against it. The idea here is to guarantee a non-nil hookHealth object in https://github.com/argoproj/argo-cd/blob/d3f33c00197e7f1d16f2a73ce1aeced464b07175/controller/hook.go#L101 to work around the missing nil check there.
@joffreychambrin The unit test introduced in https://github.com/argoproj/argo-cd/pull/16595 only tests this feature by using a bare Pod. So it would probably also work if you replace your Job with a bare Pod.
@alexmt Maybe you want to know about this :)
@ChristianCiach Yes, the hook I am using is a Job. It is the default one from Teleport here: https://github.com/gravitational/teleport/blob/70ba6be2eac4f8e5c275ffac6246197ef798392a/examples/chart/teleport-kube-agent/templates/delete_hook.yaml#L47
I think you are right in your analysis, because as you can see in the above link, there are multiple resources with the post-delete hook: a ServiceAccount, a Role, a RoleBinding, and a Job. Not all of them have a health check, hence the need for a nil check on the hookHealth object.
Describe the bug
Since we upgraded to ArgoCD v2.10.0, we have a specific application that gets stuck in deletion. The only solution I have found is to manually remove the finalizers from the App.
To Reproduce
Expected behavior
The post-delete hook Job should be launched, and once it has completed, the app should be automatically deleted from ArgoCD.
Version
Logs