ansible / awx

AWX provides a web-based user interface, REST API, and task engine built on top of Ansible. It is one of the upstream projects for Red Hat Ansible Automation Platform.

SCM Projects: keep latest pulled version of code if sync fails #13127

Open anasaizg opened 2 years ago

anasaizg commented 2 years ago

Feature type

Enhancement to Existing Feature

Feature Summary

Add an option that lets project admins choose whether to use the previously pulled code when an SCM sync fails.

Steps to reproduce

Currently, if an SCM project fails to synchronize for any reason (e.g., the SCM is unavailable), it seems to lose the previously pulled code, so the job templates that use the playbooks in that repository fail.

For example, if the project is set to synchronize before every job launch or on a schedule, jobs are scheduled for the weekend, and the sync then fails, all of those scheduled jobs are lost.

Current results

If the SCM sync fails, all launched job templates with playbooks in that project will fail.

Suggested feature result

If the option is selected, job templates will use the latest pulled playbooks. This might not be the latest code pushed to SCM, but at least the jobs would not fail when the SCM is unavailable.
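A minimal Python sketch of the requested behavior, assuming the sync can be staged in a scratch directory: the new checkout only replaces the project directory once it has been fetched completely, so a failed sync leaves the last pulled code in place. `sync_project`, `fetch` and `keep_on_failure` are hypothetical names used for illustration; they are not part of AWX, and `fetch` stands in for the real SCM checkout.

```python
import os
import shutil
import tempfile
from typing import Callable


def sync_project(project_dir: str, fetch: Callable[[str], None],
                 keep_on_failure: bool = True) -> bool:
    """Update project_dir via `fetch`, keeping the old copy if the fetch fails.

    `fetch` is a hypothetical stand-in for the real SCM checkout: it must
    populate the directory it is given, or raise if the SCM is unreachable.
    Returns True if the project was updated, False if the last pulled copy
    was kept. If there is no previous copy, or keep_on_failure is False,
    the error is re-raised so the dependent job fails.
    """
    parent = os.path.dirname(os.path.abspath(project_dir))
    # Stage the checkout next to the project so the final rename stays
    # on the same filesystem.
    staging = tempfile.mkdtemp(prefix=".sync-", dir=parent)
    try:
        fetch(staging)
    except Exception:
        shutil.rmtree(staging, ignore_errors=True)
        if keep_on_failure and os.path.isdir(project_dir):
            return False  # fall back to the last successfully pulled code
        raise

    # Fetch succeeded: move the old copy aside, move the new one in,
    # then delete the old copy.
    backup = project_dir + ".old"
    shutil.rmtree(backup, ignore_errors=True)  # leftover from an interrupted run
    had_old = os.path.isdir(project_dir)
    if had_old:
        os.replace(project_dir, backup)
    os.replace(staging, project_dir)
    if had_old:
        shutil.rmtree(backup, ignore_errors=True)
    return True
```

Staging and then renaming means a half-finished or failed fetch never touches the copy that jobs are currently using; the trade-off is that the disk briefly holds two copies of the project.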

Additional information

No response

AlexSCorey commented 2 years ago

Thanks for submitting this issue. Can you please share what version of awx you are using?

In the meantime, you could copy your project and update only one of the two copies. Then create another job template that uses the copied project. Finally, create a workflow that starts with the job template using the most recently synced project; if that job fails, the workflow falls through to the other job template, which uses the project with the older synced version. This is obviously not ideal, but it might work as a stopgap for now.
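The workaround relies on a workflow's failure path. A rough Python model of that control flow, assuming one job template per node; `Node` and `run_workflow` are hypothetical names, and in AWX itself this is configured as an "On Failure" link between workflow nodes rather than written as code.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Node:
    """Hypothetical model of a workflow node wrapping one job template."""
    name: str
    run: Callable[[], bool]               # launches the job; True on success
    on_failure: Optional["Node"] = None   # node linked with "On Failure"


def run_workflow(start: Node) -> List[str]:
    """Follow "On Failure" links until a job succeeds or none remain."""
    trail: List[str] = []
    node: Optional[Node] = start
    while node is not None:
        ok = node.run()
        trail.append(f"{node.name}: {'successful' if ok else 'failed'}")
        if ok:
            break
        node = node.on_failure
    return trail


# Example: the update-on-launch sync fails, so the copy with the older sync runs.
jt_copy = Node("jt-copy", run=lambda: True)
jt_latest = Node("jt-latest", run=lambda: False, on_failure=jt_copy)
print(run_workflow(jt_latest))  # ['jt-latest: failed', 'jt-copy: successful']
```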

anasaizg commented 2 years ago

Hi Alex! Thanks for your reply. We have two environments and we are running versions 15 and 21.0.0.

To avoid this, for the moment we are not synchronizing the projects automatically.

AlexSCorey commented 2 years ago

@anasaizg I'm not sure I understand.

Are 15 and 21.0.0 the versions of awx you are running?

The steps I suggested above do require some manual work to keep track of the versions you want as a fallback, so they won't work if you want to update the synced versions on a schedule. However, if you want to run a job on a schedule and you are OK with using the same synced version until you are ready to update it manually, the steps above should work.

anasaizg commented 2 years ago

Hi @AlexSCorey! Yes, we have two environments and had been using version 15 on both of them. We migrated one of them to version 21.0.0, but because we ran into other issues (for instance, the reported problem with jobs that take longer than 4 hours), we have not migrated the other one yet.

I will try your suggestion for some of the projects that might need more frequent synchronizations, thanks!!

AlanCoding commented 2 years ago

If option is selected, job templates will use the latest pulled playbooks. It might not be the latest code pushed to SCM, but at least the jobs would not fail if SCM is not available.

This is a very coherent request to me, because it fits the data model and machinery that's happening under the hood.

Yes, there are some qualifiers to this. If you are on a clustered system, the control node for the job may not have the source tree (or Galaxy content) available on it. If the SCM remote is down, implementing this would turn one type of failure (a dependent job failure) into another type of failure (a project sync failure, which is also a dependency, with some minor differences).

To have both update-on-launch behavior and tolerance of SCM downtime, we would need https://github.com/ansible/awx/issues/289. I mentioned there that we now allow running offline; that isn't inconsistent with the claims here, because I meant running offline without update-on-launch involved. Implementing the fan-out proposal would be a fairly big feature.

Nonetheless, a simple check box on the project menu would solve the problem virtually completely for standalone deployments. @relrod has done some work lately to assure consistency of failures when update-on-launch dependency failures occur. If we were to conditionally allow failures based on a setting of some type, that should be fairly straightforward technically.
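A small Python sketch of the launch-time decision described in the last two comments, assuming a per-project check box (called `allow_stale` here, a hypothetical name) and assuming the node that runs the job can report whether it holds a previously pulled revision. None of these names are AWX internals; the point is that the setting only helps when a local copy exists, which is why it covers standalone deployments but not every clustered case.

```python
from typing import Optional


class DependencyFailed(Exception):
    """Raised when the job cannot run because its project update failed."""


def choose_revision(sync_ok: bool, new_revision: Optional[str],
                    cached_revision: Optional[str], allow_stale: bool) -> str:
    """Pick the project revision a job should run against.

    new_revision: revision produced by the update-on-launch sync, if it succeeded.
    cached_revision: last pulled revision present on the node that will run
        the job, or None (e.g. a cluster node that never received the tree).
    allow_stale: the proposed per-project check box (hypothetical name).
    """
    if sync_ok and new_revision is not None:
        return new_revision
    if allow_stale and cached_revision is not None:
        # Tolerate the SCM outage: run the last pulled code instead.
        return cached_revision
    # Without the setting, or without a local copy, the dependency failure
    # still fails the job (the clustered case described above).
    raise DependencyFailed("project update failed and no usable cached revision")
```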