ansible / awx

AWX provides a web-based user interface, REST API, and task engine built on top of Ansible. It is one of the upstream projects for Red Hat Ansible Automation Platform.

Cleanup image pull secrets when registry cred is deleted #10238

Open rooftopcellist opened 3 years ago

rooftopcellist commented 3 years ago

Follow-up to this PR: https://github.com/ansible/awx/pull/10204
Follow-up to this issue: https://github.com/ansible/awx/issues/10114

ISSUE TYPE
SUMMARY

Currently, we create image pull secrets in the namespace of the cluster defined in the Container Group. These secrets remain after job runs and get re-used, which is fine. The problem arises when a registry credential is deleted: in that case, we should also delete all associated cluster secrets.

To do this, we need to:

  1. Get a list of container groups.
  2. Iterate over the EEs and gather the ones that use that registry (e.g. quay) credential.
  3. Iterate over the JTs and get the ones that use those EEs. Create a unique list of the instance groups (container groups) for these JTs.
  4. Reconstruct the secret name, then attempt to delete it using the credential for that instance_group.

For example:

from django.conf import settings
# registry_cred is the registry credential being deleted
secret_name = "automation-{0}-image-pull-secret-{1}".format(settings.INSTALL_UUID[:5], registry_cred.id)
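
Steps 1 to 3 of the list above, plus the name reconstruction from step 4, could be sketched roughly as follows. This is only a minimal, self-contained illustration that uses plain dataclasses instead of the real AWX models: the class names, their fields, and the helper `container_groups_for_credential` are hypothetical stand-ins, and `INSTALL_UUID` is a dummy value standing in for `settings.INSTALL_UUID`. The actual delete call against the cluster is left out.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Dummy value standing in for django.conf.settings.INSTALL_UUID (assumption).
INSTALL_UUID = "1234abcd-0000-0000-0000-000000000000"


@dataclass(frozen=True)
class InstanceGroup:  # hypothetical stand-in for the AWX model
    id: int
    is_container_group: bool


@dataclass(frozen=True)
class ExecutionEnvironment:  # hypothetical stand-in for the AWX model
    id: int
    credential_id: Optional[int]  # registry credential, if any


@dataclass(frozen=True)
class JobTemplate:  # hypothetical stand-in for the AWX model
    id: int
    execution_environment_id: Optional[int]
    instance_group_ids: Tuple[int, ...] = ()


def pull_secret_name(install_uuid, registry_cred_id):
    """Rebuild the secret name using the format quoted in this issue."""
    return "automation-{0}-image-pull-secret-{1}".format(install_uuid[:5], registry_cred_id)


def container_groups_for_credential(registry_cred_id, instance_groups, ees, jts):
    """Return the sorted, de-duplicated ids of container groups that may hold the secret."""
    # Step 1: container groups only.
    container_group_ids = {ig.id for ig in instance_groups if ig.is_container_group}
    # Step 2: EEs that use the registry credential being deleted.
    ee_ids = {ee.id for ee in ees if ee.credential_id == registry_cred_id}
    # Step 3: JTs that use those EEs, reduced to a unique set of container groups.
    affected = set()
    for jt in jts:
        if jt.execution_environment_id in ee_ids:
            affected.update(i for i in jt.instance_group_ids if i in container_group_ids)
    return sorted(affected)
```

For each id returned, the secret named by `pull_secret_name(INSTALL_UUID, registry_cred.id)` would then be deleted using that container group's own cluster credential.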

Alternatively, we could try to maintain a list of known pull secrets and delete from that list...

This could be done in the credential destroy() method.

ENVIRONMENT
STEPS TO REPRODUCE
  1. Create valid openshift/k8s credential
  2. Create valid registry credential
  3. Create an Execution Environment that uses a protected registry and specify the registry credential
  4. Create a JT that uses them
  5. Run a job
  6. Observe an image pull secret has now been created in the openshift/k8s cluster.
  7. Delete the registry credential
EXPECTED RESULTS

The image pull secret should be deleted.

ACTUAL RESULTS

Currently, the secret remains in the cluster and must be deleted manually.

AlanCoding commented 3 years ago

If this is done in the Credential delete method, we may need to put it in a task, since it might take a long time. I'm not sure, though.

kdelee commented 3 years ago

We discussed the potential for a reaper. @tvo318 we should add a Known Issue for 4.0:

"Image pull secrets created by Tower will remain in Container Group namespaces after running jobs that use Execution Environments that use Container Registry credentials."

shanemcd commented 2 years ago

There is a PR for this, but I'm not sure it's the best solution. Instead of doing this as a one-off background task, it would be best to have a reaper like @kdelee mentioned.
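
The selection logic of such a reaper might look roughly like the sketch below. It is not taken from the PR: the function name `find_orphaned_pull_secrets` and its inputs are hypothetical, and it assumes the secret name format quoted earlier in this issue. Listing the secrets in each Container Group namespace and deleting the orphans would still have to go through the cluster API with the credential of that Container Group.

```python
import re


def find_orphaned_pull_secrets(secret_names, install_uuid, live_credential_ids):
    """Return the names of pull secrets whose registry credential no longer exists.

    secret_names: names of secrets found in a Container Group namespace.
    install_uuid: the install UUID of this AWX instance.
    live_credential_ids: ids of the registry credentials that still exist.
    """
    # Only consider secrets created by this install, using the assumed name format
    # automation-<first 5 chars of install UUID>-image-pull-secret-<credential id>.
    pattern = re.compile(
        r"automation-" + re.escape(install_uuid[:5]) + r"-image-pull-secret-(\d+)"
    )
    orphans = []
    for name in secret_names:
        match = pattern.fullmatch(name)
        if match is None:
            continue  # not one of our pull secrets, leave it alone
        if int(match.group(1)) not in live_credential_ids:
            orphans.append(name)
    return orphans
```

Run periodically, a check like this would also catch secrets left behind by credentials deleted before any cleanup logic existed, which a one-off task at delete time would miss.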

rooftopcellist commented 2 years ago

@amolgautam25 Here are some notes on how to test this:

First, you'll need a k8s or openshift cluster and creds for it.

If memory serves, you'll need to:

  1. Create a k8s or openshift credential on Controller/AWX (this can be your local dev instance). See the official tower/controller docs for how to do this.
  2. Create a ContainerGroup with that k8s credential specified.
  3. Create a pull secret for a private registry. You can make your own on quay: pull an EE, re-tag it to something unique like amol-ee-test, then push it to your private quay. Create the secret by creating a bot/service account in quay.
  4. Change the Default Job Template to use the ContainerGroup you just created (instead of the default instance group). You will also need to add the EE to the JT.
  5. Run the job.
  6. Monitor the k8s/openshift cluster to see that the secret gets created.