clustervision / trinityX

TrinityX is the new generation of ClusterVision's open-source HPC, A/I and cloudbursting platform. It is designed from the ground up to provide all services required in a modern HPC and A/I system, and to allow full customization of the installation.
GNU General Public License v3.0

Rocky 8 image creation problem with random_device #427

Closed: mulderij closed this issue 2 months ago

mulderij commented 3 months ago

While running the playbook compute-default.yml (https://github.com/clustervision/trinityX/commit/0a847796949ab0645e8a194a9beda21b67258da1) to build a Rocky 8 image (without OpenOnDemand) on a RHEL 9 controller, I receive the following error:

PLAY [base-rocky8.osimages.luna]
...
TASK [init : Install init packages] **********************************************************************************************************************************************************************************************************
failed: [base-rocky8.osimages.luna] (item=python3-libselinux) => {"ansible_loop_var": "item", "changed": false, "item": "python3-libselinux", "msg": "random_device::random_device(const std::string&)", "rc": 1, "results": []}
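The `random_device::random_device(const std::string&)` message appears to be thrown by the C++ standard library's `std::random_device` when it cannot open a random-number device such as `/dev/urandom`, which fits the missing-`/dev` explanation below. As a hypothetical diagnostic (not part of trinityX), a preflight check along these lines could fail early with a clearer message. It assumes the tasks run on the controller and that `image_root` is the path of the image tree; that variable name is illustrative only.

```yaml
# Hypothetical preflight check, not part of trinityX.
# Assumes: runs on the controller; `image_root` (illustrative name) is the
# path of the image tree on disk.
- name: Look up basic device nodes in the image
  ansible.builtin.stat:
    path: "{{ image_root }}/dev/{{ item }}"
    get_checksum: false        # never try to read device contents
  loop:
    - "null"                   # quoted: bare null would be a YAML null value
    - "zero"
    - "random"
    - "urandom"
  register: image_dev_nodes

- name: Fail if a device node is missing or is not a character device
  ansible.builtin.assert:
    that:
      # `and` short-circuits, so `ischr` is only read when the path exists
      - item.stat.exists and item.stat.ischr
    fail_msg: "{{ item.item }} is missing or not a character device in {{ image_root }}/dev"
    quiet: true
  loop: "{{ image_dev_nodes.results }}"
  loop_control:
    label: "{{ item.item }}"
```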

It appears there are no basic '/dev' files in the image. A workaround is to chroot into the image from another console, or to create the mount by hand. This will also mount '/proc' and '/sys', in case that is useful.
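The "create the mount by hand" workaround could be sketched as Ansible tasks that bind-mount the controller's `/dev`, `/proc` and `/sys` into the image tree before the package installs. This is a hypothetical sketch, not the trinityX implementation: it assumes the tasks run as root on the controller (not inside the image chroot) and uses the illustrative variable `image_root` for the image path.

```yaml
# Hypothetical workaround sketch, not part of trinityX.
# Assumes: runs as root on the controller; `image_root` (illustrative name)
# is the path of the image tree.
- name: Ensure the mount points exist in the image
  ansible.builtin.file:
    path: "{{ image_root }}/{{ item }}"
    state: directory
  loop: [dev, proc, sys]

- name: Bind-mount /dev, /proc and /sys into the image (skip if already mounted)
  ansible.builtin.shell:
    cmd: >-
      mountpoint -q "{{ image_root }}/{{ item }}"
      || mount --bind "/{{ item }}" "{{ image_root }}/{{ item }}"
  loop: [dev, proc, sys]
```

The `mountpoint -q` guard keeps a second run from stacking another bind mount on top of the first. The mounts would need to be removed again (`umount`, in reverse order) before the image is packed.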

When creating a RHEL 9 image on a RHEL 9 controller, these files are created by roles/trinity/image-create/tasks/redhat/base.yml (e43803d), which is started from site/roles/trinity/image-create/tasks/main.yml (d432ae9) in the "Creating base image" task. This step is skipped when creating the Rocky 8 image. Should these /dev files have been part of the downloaded base image? Or should the playbook create the basic /dev files in a different part of the playbook (one that runs for both plays)?
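For the second option, a distribution-independent step that creates the basic device nodes might look like the sketch below. It is hypothetical: it does not reproduce the contents of redhat/base.yml, and `image_root` is an illustrative variable name. The major/minor numbers are the standard Linux ones for these devices.

```yaml
# Hypothetical sketch, not taken from trinityX.
# Assumes: runs as root on the controller; `image_root` (illustrative name)
# is the path of the image tree.
- name: Ensure the image has a /dev directory
  ansible.builtin.file:
    path: "{{ image_root }}/dev"
    state: directory
    mode: "0755"

- name: Create basic device nodes in the image
  ansible.builtin.command:
    cmd: >-
      mknod -m {{ item.mode }} {{ image_root }}/dev/{{ item.name }}
      c {{ item.major }} {{ item.minor }}
    creates: "{{ image_root }}/dev/{{ item.name }}"   # skip nodes that already exist
  loop:
    - { name: "null",    mode: "0666", major: 1, minor: 3 }
    - { name: "zero",    mode: "0666", major: 1, minor: 5 }
    - { name: "full",    mode: "0666", major: 1, minor: 7 }
    - { name: "random",  mode: "0666", major: 1, minor: 8 }
    - { name: "urandom", mode: "0666", major: 1, minor: 9 }
    - { name: "tty",     mode: "0666", major: 5, minor: 0 }
    - { name: "console", mode: "0600", major: 5, minor: 1 }
  loop_control:
    label: "{{ item.name }}"
```

Placed in a task file that runs for every distribution, rather than only in the RHEL-specific one, a step like this would run for both RHEL and Rocky images. The `creates` guard keeps it idempotent when the base image already ships these nodes.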

aphmschonewille commented 3 months ago

This one is odd, as basically the same tasks are executed for both Rocky and RHEL. I'll simulate creating an RL8 image on RHEL 9 and see if we encounter the same problem.

aphmschonewille commented 3 months ago

Just to make sure I am not looking at the wrong role/play: what playbook did you run to create the Rocky 8 image? Was it using the Docker image or the base image?

aphmschonewille commented 3 months ago

On a freshly installed Red Hat 9 controller, I was able to build a Rocky 8 compute image from both the base image and the Docker image. I was not able to reproduce the reported problem. Could you please show me the playbook (compute-xxxxx.yml) that you ran when the issue occurred?

mulderij commented 3 months ago

I ran: ansible-playbook compute-default.yml --extra-vars '@/backup/config/rocky8-vars.yml' --skip-tags='ood,environment-modules,openhpc,docker'

compute-default.yml is at the current commit 0a84779, and rocky8-vars.yml contains:

image_name: base-rocky8
alternative_distribution: Rocky-8

# slurm-slurmd added
slurm_packages:
  - munge-libs
  - munge
  - slurm
  - slurm-slurmd
  - slurm-libs
  - slurm-contribs
  - slurm-devel
  - slurm-openlava
  - slurm-pam_slurm
  - slurm-perlapi
  - slurm-slurmdbd
  - slurm-slurmctld
  - slurm-torque

I'm not sure when or why I excluded environment-modules, but I think it was because we will be using our own stack (not OpenHPC).