ansible / awx-resource-operator

TTL Controller for Finished Resources causes AnsibleJobs to run repeatedly #49

Closed gparvin closed 2 years ago

gparvin commented 3 years ago

In OCP 4.8, the TTL Controller for Finished Resources (https://kubernetes.io/docs/concepts/workloads/controllers/ttlafterfinished/) is enabled by default. This means jobs get deleted after an hour. When the job gets deleted, the AnsibleJob seems to re-run the automation, so the automation runs hourly.

The feature gate for TTL Controller for Finished Resources can be disabled, which I think will prevent this from happening. I'm trying that out now. To disable it:

Run oc edit kubeapiservers.operator.openshift.io cluster, then update unsupportedConfigOverrides:

  unsupportedConfigOverrides: 
    apiServerArguments:
      feature-gates:
      - TTLAfterFinished=false

It would be nice if the AnsibleJob status could be used to prevent the re-run, so that job cleanup can still work.
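
A rough sketch of that idea in Python, just to illustrate the check. This is not the operator's actual code: the function name, the `lastRunStatus` field, and the set of finished states are all hypothetical, and the real AnsibleJob status may be laid out differently.

```python
from typing import Optional

# Hypothetical terminal states; the real AnsibleJob status values may differ.
FINISHED_STATES = {"successful", "failed", "error", "canceled"}


def should_launch_job(ansible_job_status: Optional[dict], k8s_job_exists: bool) -> bool:
    """Decide whether a reconcile pass should create a new Kubernetes Job."""
    status = ansible_job_status or {}
    # Assumed field: the result of the last automation run recorded on the AnsibleJob.
    last_run = str(status.get("lastRunStatus", "")).lower()
    if last_run in FINISHED_STATES:
        # The automation already finished once, so a missing Job only means
        # that cleanup (for example by the TTL controller) has happened.
        return False
    # No finished result recorded: launch only if no Job exists yet.
    return not k8s_job_exists
```

With a check like this, a Job removed by the TTL controller would not trigger another run, because the decision depends on what the AnsibleJob recorded rather than on whether the Job object still exists.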

gparvin commented 3 years ago

Just updating the apiserver didn't seem to fix it. Trying oc edit kubecontrollermanager cluster and making the same update there now. After making the update in both, I'm not seeing this problem anymore.

elgnay commented 2 years ago

Is there a plan to fix this issue? It is really annoying, because running the automation repeatedly may cause problems.

rooftopcellist commented 2 years ago

I am working on testing a patch now.

rooftopcellist commented 2 years ago

Information from my initial investigation of this issue: https://gist.github.com/rooftopcellist/f521164446ab554a1d13410138986549