CitC is broken on Oracle because Oracle Linux has updated to a broken ansible

This is more for information than anything else, plus some useful rubber ducking and help for anyone else encountering the same issue.

The run_ansible script exits with errors when building clusters on Oracle Linux. The ansible package was updated in May 2022 and now uses Python 3.8. Somehow the Oracle Linux packaging of Ansible fails to include various important modules, e.g. python-ldap and PyMySQL, amongst others. This means that run_ansible is unable to run and exits with an error. This is not caught by the finish script, which gives the impression that the cluster is still being built, and so installation appears to hang.
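One way to confirm which modules are missing is to ask the interpreter directly. The following is a minimal sketch, not part of CitC: the helper name missing_modules is hypothetical, and the import names ldap (for python-ldap) and pymysql (for PyMySQL) are the usual ones for those packages. It should be run with the same Python that the packaged ansible uses (e.g. python3.8).

```python
import importlib.util


def missing_modules(names):
    """Return the subset of top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Import names are assumptions: python-ldap imports as "ldap",
    # PyMySQL imports as "pymysql". Extend the list as needed.
    required = ["ldap", "pymysql"]
    missing = missing_modules(required)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules are importable.")
```

Running the same file under python3.6 and python3.8 should show whether the two interpreters really differ in what they can import.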

I tried downgrading ansible to the old version, but that version appears to have disappeared from ol8-appstream, and my attempt to install from EPEL (by disabling ol8-appstream) failed because of unresolvable dependencies.

The solution I found was, as root, to run yum remove ansible and then pip3.6 install ansible. This installed ansible against Python 3.6 (which has all of the required modules) and placed the result in /usr/local/bin. I then updated the path in the run_ansible script to /usr/local/bin/ansible-playbook, and this completed every stage but one. The security_updates role failed with the error:

TASK [security_updates : Install security updates] *****************************
Wednesday 06 July 2022  16:53:26 +0000 (0:00:00.746)       0:05:19.368 ******** 
fatal: [mgmt.subnet.sharpphoenix.oraclevcn.com]: FAILED! => changed=true 
  cmd:
  - dnf
  - update
  - -y
  - --security
  - --exclude
  - kernel*
  - --exclude
  - slurm*
  delta: '0:00:05.836997'
  end: '2022-07-06 16:53:32.741248'
  msg: non-zero return code
  rc: 1
  start: '2022-07-06 16:53:26.904251'
  stderr: |-
    Error:
     Problem 1: package shim-x64-15.6-1.0.3.el8.x86_64 requires oracle(kernel-sig-key) >= 202204, but none of the providers can be installed
      - cannot install the best update candidate for package shim-x64-15.3-1.0.3.x86_64
      - package kernel-4.18.0-372.13.1.0.1.el8_6.x86_64 is filtered out by exclude filtering
      - package kernel-uek-5.4.17-2136.307.3.6.el8uek.x86_64 is filtered out by exclude filtering
      - package kernel-uek-5.4.17-2136.308.7.el8uek.x86_64 is filtered out by exclude filtering
      - package kernel-uek-5.4.17-2136.308.9.el8uek.x86_64 is filtered out by exclude filtering
      - package kernel-uek-debug-5.4.17-2136.307.3.6.el8uek.x86_64 is filtered out by exclude filtering
     Problem 2: problem with installed package shim-x64-15.3-1.0.3.x86_64
      - package grub2-efi-x64-1:2.02-123.0.4.el8_6.8.x86_64 conflicts with shim-x64 <= 15.3-1.0.3 provided by shim-x64-15.3-1.0.3.x86_64
      - package shim-x64-15.6-1.0.3.el8.x86_64 requires oracle(kernel-sig-key) >= 202204, but none of the providers can be installed
      - cannot install the best update candidate for package grub2-efi-x64-1:2.02-123.0.1.el8.x86_64
      - package kernel-4.18.0-372.13.1.0.1.el8_6.x86_64 is filtered out by exclude filtering
      - package kernel-uek-5.4.17-2136.307.3.6.el8uek.x86_64 is filtered out by exclude filtering
      - package kernel-uek-5.4.17-2136.308.7.el8uek.x86_64 is filtered out by exclude filtering
      - package kernel-uek-5.4.17-2136.308.9.el8uek.x86_64 is filtered out by exclude filtering
      - package kernel-uek-debug-5.4.17-2136.307.3.6.el8uek.x86_64 is filtered out by exclude filtering
  stderr_lines: <omitted>
  stdout: |-
    Last metadata expiration check: 0:02:59 ago on Wed 06 Jul 2022 04:50:29 PM GMT.
    (try to add '--allowerasing' to command line to replace conflicting packages or '--skip-broken' to skip uninstallable packages or '--nobest' to use not only best candidate packages)
  stdout_lines: <omitted>

I removed this role (it is just a security update..!) and re-ran...

This now progressed all the way to the end, and I am just waiting for packer to finish creating the image...

...and now it has all completed. The cluster appears to be working. I'll add more if I find any other issues.
