Open javree opened 10 months ago
I assume you just executed ansible-playbook compute-redhat.yml?
I just reinstalled Rocky 8.9 and rolled out controller.yml and compute-redhat.yml and they work here.
PLAY RECAP ****************************************************************************************************************************************************************
compute.osimages.luna : ok=124 changed=74 unreachable=0 failed=0 skipped=136 rescued=0 ignored=1
controller1 : ok=55 changed=20 unreachable=0 failed=0 skipped=33 rescued=0 ignored=0
Do you have the python3-dnf packages installed? For example:
rpm -qa | grep python3-dnf
python3-dnf-plugin-versionlock-4.0.21-23.el8.noarch
python3-dnf-4.7.0-19.el8.noarch
python3-dnf-plugins-core-4.0.21-23.el8.noarch
I have exactly those RPMs installed: python3-dnf-plugin-versionlock-4.0.21-23.el8.noarch python3-dnf-4.7.0-19.el8.noarch python3-dnf-plugins-core-4.0.21-23.el8.noarch
I did indeed execute that exact command. I've just run ansible-playbook -vvvv compute-redhat.yml >> log.txt 2>&1 and attached its output, as well as a full RPM list: log.txt rpmlist.txt
Hi @javree, for some reason it picked up Python 3.6 in the image. The python36 package is pulled in by gdm, OpenHPC and OOD. Was this your second Ansible run? Did you change anything in your environment (env/set)?
Can you run:
ansible --version | grep python
Can you also retry the run by adding the following to ansible.cfg?
interpreter_python=/usr/bin/python3.11
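Before retrying, it may be worth confirming that the setting actually landed in the file. The following is only a sketch: the helper name get_interpreter_python is hypothetical, and it assumes ansible.cfg sits in the directory the playbooks are run from, with the key placed under the [defaults] section.

```shell
# Hypothetical helper: print the interpreter_python value(s) found in an
# ansible.cfg file (defaults to ./ansible.cfg). Prints nothing if unset.
get_interpreter_python() {
  cfg="${1:-ansible.cfg}"
  sed -n 's/^[[:space:]]*interpreter_python[[:space:]]*=[[:space:]]*//p' "$cfg"
}

# Example use from the directory holding ansible.cfg (guarded so the snippet
# is harmless when no such file exists here).
if [ -f ansible.cfg ]; then
  get_interpreter_python ansible.cfg
fi
```

If the helper prints nothing, the key is missing or misspelled. Running ansible-config dump --only-changed should also show whether Ansible itself picked up the override.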
@javree did that line fix it for compute-redhat.yml?
Unfortunately not. Since you mentioned it might have something to do with running the playbook multiple times, I am now fully redeploying the controller to start fresh.
So what you're basically telling us is that the Ansible playbooks are not idempotent?
I wonder if starting from fresh solved the issue 🤔
Will report next week how the fresh install went.
Sorry for the delay in getting back. I did a full reinstall of the controller from a Rocky 8.9 USB key, ran through the procedure again and changed nothing else. Yet again, exactly the same issue... I have not touched the compute-redhat.yml file in any way.
Note: on a default Rocky 8 install, Python 3.6 is the system default. Ansible on Rocky 8 now uses Python 3.11, but there is no python3.11-dnf package, so adding python3.11 will break things elsewhere... I'm seriously wondering how this can work at all.
Just for giggles I tried the compute-ubuntu playbook and that completed fine, so I can at least boot a node soon hopefully... But the issue regarding ansible using python 3.11 vs dnf using python 3.6 remains when trying to build a RHEL image
I've tried this again, but this time using Rocky Linux 9.3 on the controller and there all appears to work just fine.
@javree thank you for your feedback! I think for the 8.x we had a fix but I'll need to double check that.
There have been quite a few changes in how we prepare (install) the Ansible environment before running the playbook. These have not been pushed to GitHub yet, but I expect (hope) that these issues will then be a thing of the past. Our target for pushing is in about 2-3 weeks from today. We are finalizing the new monitoring stack and H/A.
Latest greatest has been pushed.
Very happy to report that with the new release all is well on Rocky 8 as well!
Hate to reopen this ...
Did a fresh checkout on a fully up-to-date Rocky 8.9 machine. Running
marclus0 18:43:54 [root@marclus0 site]# ansible-playbook compute-redhat.yml
gives me:
TASK [trix-tree : Create Trinity H/A directory structure on controllers] ****
skipping: [compute.osimages.luna]
TASK [init : Install init packages] *****
failed: [compute.osimages.luna] (item=python3-libselinux) => {"ansible_loop_var": "item", "changed": false, "item": "python3-libselinux", "msg": "Could not import the dnf python module using /usr/libexec/platform-python (3.6.8 (default, Apr 24 2024, 21:55:04) [GCC 8.5.0 20210514 (Red Hat 8.5.0-22)]). Please install python3-dnf or python2-dnf package or ensure you have specified the correct ansible_python_interpreter. (attempted ['/usr/libexec/platform-python', '/usr/bin/python3', '/usr/bin/python2', '/usr/bin/python'])", "results": []}
PLAY RECAP **
compute.osimages.luna : ok=3 changed=1 unreachable=0 failed=1 skipped=2 rescued=0 ignored=0
controller1 : ok=59 changed=20 unreachable=0 failed=0 skipped=43 rescued=0 ignored=0
marclus0 18:55:33 [root@marclus0 site]#
Again the conflict between python 3.6 (system default) and the ansible python 3.11
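To see which interpreter can actually load the dnf module, a quick loop like the one below may help. It is a rough sketch, not part of the playbooks: the helper name check_dnf_import is hypothetical, and the interpreter list is simply taken from the "attempted" list in the error above plus /usr/bin/python3.11. Since the failing host is the image, it would need to be run inside the image (for example via chroot) rather than only on the controller.

```shell
# Hypothetical helper: report whether a given Python interpreter exists
# and whether it can import the dnf module.
check_dnf_import() {
  py="$1"
  if [ ! -x "$py" ]; then
    echo "$py: interpreter not found"
    return 2
  fi
  if "$py" -c 'import dnf' >/dev/null 2>&1; then
    echo "$py: dnf module importable"
  else
    echo "$py: dnf module NOT importable"
    return 1
  fi
}

# Interpreters from the error message, plus the python3.11 used by Ansible.
for py in /usr/libexec/platform-python /usr/bin/python3 /usr/bin/python2 \
          /usr/bin/python /usr/bin/python3.11; do
  check_dnf_import "$py" || true
done
```

If /usr/libexec/platform-python reports the module as not importable inside the image, the python3-dnf package is most likely missing or broken there, independent of which interpreter Ansible uses on the controller.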
One thing truly amazes me every time: how something supposedly 'generic' like a Rocky install (or Red Hat, or Alma, or...) can be so different anywhere in the world... I'll get back to you, as on Rocky 8.10 (of which I've done more than 10 installs today alone) everything works as expected. Not sure if Rocky 8.9 is now deviating? Last week 8.9 was also just fine... It truly amazes me. -A
A hint, as I encountered the same issue today: verify your subscription within the image.
Also, within the image, run watch -n 0.1 "cat /etc/yum.repos.d/redhat.repo"
(I'm using Red Hat instead.)
What happens within RHEL 8.10 relates to the default baseurl in /etc/rhsm/rhsm.conf. Somewhere down the line, redhat.repo starts pointing to cdn.redhat.com instead of our own Satellite server.
I changed the rhsm.conf baseurl to our own Satellite server, and it 'stopped' changing to cdn.redhat.com, resulting in the correct packages being installed.
To test, try installing python3-libselinux manually within the image, before and after 'fixing' the subscription.
Note: redhat.repo is still configured correctly (OK) at this task:
TASK [trinity/image-create : Install redhat-release package in /trinity/images/compute] *******************************************************************************
But right before or during the tasks that install the external RPM packages, redhat.repo gets 'overruled' by rhsm.conf.
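As an alternative to watching the whole file, the hosts behind the baseurl entries can be listed directly, which makes a switch to cdn.redhat.com easy to spot. This is only a sketch under assumptions: the helper name repo_baseurl_hosts is hypothetical, and it expects plain baseurl = scheme://host/... lines as found in redhat.repo. Run it within the image.

```shell
# Hypothetical helper: list the unique hosts used by baseurl lines in a
# .repo file, so an unexpected switch to cdn.redhat.com is easy to spot.
repo_baseurl_hosts() {
  sed -n 's|^[[:space:]]*baseurl[[:space:]]*=[[:space:]]*[a-z]*://\([^/]*\).*|\1|p' "$1" | sort -u
}

# Guarded so the snippet is harmless on systems without a redhat.repo.
repo=/etc/yum.repos.d/redhat.repo
if [ -f "$repo" ]; then
  repo_baseurl_hosts "$repo"
fi
```

Running it before and after the external RPM package tasks should show whether the host changes from the Satellite server to cdn.redhat.com.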
Second: please use interpreter_python=/usr/libexec/platform-python within ansible.cfg.
It resolves a lot of issues, with Red Hat at least, including this issue (together with the solution above).
Last note: the controller supports a different range of Python interpreters than the targets, which is why you will also have problems on RHEL 8 if you use Ansible 2.17.
(I've used Ansible 2.15.x on the controller instead.)
Following the install guide at https://docs.clustervision.com/install/install/ on Rocky Linux 8.9. The controller install went fine and Ansible finished without issues. However, image creation fails:
TASK [init : Install init packages] ****
failed: [compute.osimages.luna] (item=python3-libselinux) => {"ansible_loop_var": "item", "changed": false, "item": "python3-libselinux", "msg": "Could not import the dnf python module using /usr/libexec/platform-python (3.6.8 (default, Jan 15 2024, 23:09:02) [GCC 8.5.0 20210514 (Red Hat 8.5.0-20)]). Please install python3-dnf or python2-dnf package or ensure you have specified the correct ansible_python_interpreter. (attempted ['/usr/libexec/platform-python', '/usr/bin/python3', '/usr/bin/python2', '/usr/bin/python'])", "results": []}
PLAY RECAP *****
compute.osimages.luna : ok=3 changed=0 unreachable=0 failed=1 skipped=1 rescued=0 ignored=0
controller1 : ok=52 changed=5 unreachable=0 failed=0 skipped=34 rescued=0 ignored=0
[root@marclus0 site]# cat /etc/redhat-release
Rocky Linux release 8.9 (Green Obsidian)
[root@marclus0 site]# rpm -qa | grep -i ansible
ansible-8.3.0-1.el8.noarch
ansible-core-2.15.3-1.el8.x86_64
We've not edited anything in the playbook