IDR / deployment

Deployment infrastructure for the Image Data Resource
https://idr.openmicroscopy.org/about/deployment.html
BSD 2-Clause "Simplified" License

Reconsider systematic usage of update_cache in Ansible roles #433

Open sbesson opened 1 month ago

sbesson commented 1 month ago

Noticed during the deployment of the software changes from https://github.com/IDR/deployment/pull/429 to prod122

As part of the migration of the OME Ansible roles to support RHEL9 started ~12 months ago, all usages of the built-in yum Ansible module have been replaced with the built-in dnf module. The update_cache parameter has been set to true across the board.

A consequence of this decision is a systematic and significant increase in deployment time. As a minimal example, I executed the idr-read-only.yml playbook against the test123 deployment in three consecutive runs.

With the current Ansible roles defined in ansible/requirements.yml, the playbook ran to completion in 6:02.90, 8:03.52 and 12:00.08.

I modified the Ansible roles downloaded locally via Galaxy to disable the cache update:

find vendor -type f -exec sed -e "s/update_cache: true/update_cache: false/g" -i '' {} \;
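For anyone wanting to reproduce this on Linux: the `-i ''` form above is the BSD/macOS sed syntax, and GNU sed does not treat it the same way. A minimal sketch of a variant that should work with both is below. The function name `disable_update_cache` is made up for illustration, and it assumes the roles live under `vendor` as in the command above.

```shell
#!/bin/sh
# Sketch only: disable update_cache in locally downloaded roles.
# disable_update_cache is a hypothetical helper, not part of any OME role.
disable_update_cache() {
    roles_dir="${1:-vendor}"
    # List the affected files first so the change can be reviewed.
    grep -rl "update_cache: true" "$roles_dir" | while IFS= read -r f; do
        echo "patching $f"
        # -i.bak is accepted by both GNU and BSD sed; drop the backup afterwards.
        sed -i.bak -e "s/update_cache: true/update_cache: false/g" "$f" && rm -f "$f.bak"
    done
}
```

Calling `disable_update_cache vendor` prints each patched file and leaves files without the setting untouched.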

With these changes, the playbook ran to completion in 2:35.17, 2:20.43 and 2:29.84 respectively.
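To put a single number on the difference, the six timings reported above can be averaged. This is a quick sketch using only those figures; the helper name `mean_seconds` is made up for illustration.

```shell
#!/bin/sh
# Convert m:ss.xx durations to seconds and print the mean of all arguments.
mean_seconds() {
    printf '%s\n' "$@" | awk -F: '{ total += $1 * 60 + $2 } END { printf "%.2f\n", total / NR }'
}

# Timings copied from the runs reported in this issue.
with_cache=$(mean_seconds 6:02.90 8:03.52 12:00.08)
without_cache=$(mean_seconds 2:35.17 2:20.43 2:29.84)

echo "mean with update_cache: true  = ${with_cache}s"
echo "mean with update_cache: false = ${without_cache}s"
awk -v a="$with_cache" -v b="$without_cache" 'BEGIN { printf "slowdown factor = %.1f\n", a / b }'
```

With only three runs per configuration and a wide spread in the first set, the mean is a rough indicator rather than a benchmark, but it works out to about 3.5 times slower with the cache update enabled.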

As shown by the measurements above, the repeated cache updates on every DNF operation are causing a massive degradation in the execution times of our playbooks. While IDR has the most regular exposure due to its frequent deployments, this will affect anyone using the OME Ansible infrastructure on RHEL 9, including the UoD production deployments /cc @pwalczysko

Unless there is a rationale for keeping the update_cache parameter and making it configurable, my suggestion would be to remove it and release all the Ansible roles.

/cc @jburel @khaledk2