clustervision / trinityX

TrinityX is the new generation of ClusterVision's open-source HPC, A/I and cloudbursting platform. It is designed from the ground up to provide all services required in a modern HPC and A/I system, and to allow full customization of the installation.
GNU General Public License v3.0

slurm / ohpc vs epel #406

Closed msteggink closed 6 months ago

msteggink commented 7 months ago

dnf search on Rocky 9.3:

# dnf search slurm

slurm-contribs.x86_64 : Perl tools to print Slurm job state information
slurm-contribs-ohpc.x86_64 : Perl tool to print Slurm job state information
slurm-devel.x86_64 : Development package for Slurm
slurm-devel-ohpc.x86_64 : Development package for Slurm
slurm-doc.x86_64 : Slurm documentation
slurm-example-configs-ohpc.x86_64 : Example config files for Slurm
slurm-gui.x86_64 : Slurm gui and visual tools
slurm-libpmi-ohpc.x86_64 : Slurm\'s implementation of the pmi libraries
slurm-libs.x86_64 : Slurm shared libraries
slurm-nss_slurm.x86_64 : NSS plugin for slurm
slurm-ohpc.x86_64 : Slurm Workload Manager
slurm-ohpc-slurmrestd.x86_64 : Slurm REST API translator
slurm-openlava.x86_64 : Openlava/LSF wrappers for transition from OpenLava/LSF to Slurm
slurm-openlava-ohpc.x86_64 : openlava/LSF wrappers for transition from OpenLava/LSF to Slurm
slurm-pam_slurm.x86_64 : PAM module for restricting access to compute nodes via Slurm
slurm-pam_slurm-ohpc.x86_64 : PAM module for restricting access to compute nodes via Slurm
slurm-perlapi.x86_64 : Perl API to Slurm
slurm-perlapi-ohpc.x86_64 : Perl API to Slurm
slurm-rrdtool.x86_64 : Slurm rrdtool external sensor plugin
slurm-slurmctld.x86_64 : Slurm controller daemon
slurm-slurmctld-ohpc.x86_64 : Slurm controller daemon
slurm-slurmd.x86_64 : Slurm compute node daemon
slurm-slurmd-ohpc.x86_64 : Slurm compute node daemon
slurm-slurmdbd.x86_64 : Slurm database daemon
slurm-slurmdbd-ohpc.x86_64 : Slurm database daemon
slurm-slurmrestd.x86_64 : Slurm REST API deamon
slurm-sview-ohpc.x86_64 : Graphical user interface to view and modify Slurm state
slurm-torque.x86_64 : Torque/PBS wrappers for transition from Torque/PBS to Slurm
slurm-torque-ohpc.x86_64 : Torque/PBS wrappers for transition from Torque/PBS to Slurm

Note the "duplicate" packages: many are provided by both EPEL and OpenHPC, and the OpenHPC ones are preferred.
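To see which repository supplies each name, the search can be narrowed to a "name repo" listing and filtered. The following is only a sketch: the helper name `epel_only` is made up for illustration, and the repo ids `epel` and `openhpc-base` are taken from the install output in this issue and may differ on other systems.

```shell
# Keep only the package names whose repository column is "epel".
# Expected input: one "<name> <repoid>" pair per line, for example from
#   dnf repoquery --qf '%{name} %{repoid}' 'slurm*'
epel_only() {
    awk '$2 == "epel" { print $1 }'
}
```

Piping the repoquery output through `epel_only` should list the Slurm packages that would be pulled from EPEL if requested by their plain name.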

This could lead to installing the wrong version. Note that the package below comes from epel:

# dnf install slurm-slurmrestd
Last metadata expiration check: 0:27:15 ago on Tue 23 Apr 2024 03:33:25 PM MDT.
Dependencies resolved.
=====================================================================================================================================
 Package                             Architecture              Version                            Repository                    Size
=====================================================================================================================================
Installing:
 slurm-slurmrestd                    x86_64                    22.05.9-1.el9                      epel                         156 k
Installing dependencies:
 hdf5                                x86_64                    1.12.1-7.el9.1                     epel                         2.2 M
 http-parser                         x86_64                    2.9.4-6.el9                        appstream                     37 k
 libaec                              x86_64                    1.0.6-1.el9                        epel                          41 k
 libjwt                              x86_64                    1.12.1-11.el9                      epel                          29 k
 pmix                                x86_64                    3.2.3-3.el9                        appstream                    498 k
 slurm                               x86_64                    22.05.9-1.el9                      epel                         1.9 M
 slurm-libs                          x86_64                    22.05.9-1.el9                      epel                         1.2 M

We want only the OpenHPC packages to be installed:

# dnf install slurm-ohpc-slurmrestd.x86_64
Last metadata expiration check: 0:28:02 ago on Tue 23 Apr 2024 03:33:25 PM MDT.
Dependencies resolved.
=====================================================================================================================================
 Package                              Architecture          Version                                Repository                   Size
=====================================================================================================================================
Installing:
 slurm-ohpc-slurmrestd                x86_64                22.05.8-300.ohpc.4.6                   openhpc-base                145 k
Installing dependencies:
 http-parser                          x86_64                2.9.4-6.el9                            appstream                    37 k

aphmschonewille commented 7 months ago

We will look into disabling the EPEL repo during the Slurm install. Whether that works depends on how the other dependencies are resolved.
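On the command line, hiding a repository for a single transaction can be done with dnf's `--disablerepo` option. A minimal sketch, assuming the repo id is `epel` as in the output above; the wrapper name `install_without_epel` and the `DNF_CMD` override are hypothetical and exist only so the command can be dry-run:

```shell
# Install the given packages with the EPEL repo hidden for this transaction only.
# DNF_CMD is a hypothetical override; set it to "echo" to print the command
# instead of running it.
install_without_epel() {
    ${DNF_CMD:-dnf} install -y --disablerepo=epel "$@"
}
```

For example, `DNF_CMD=echo install_without_epel slurm-ohpc-slurmrestd` prints the dnf arguments without touching the system. If a dependency is only available from EPEL, the real transaction would fail to resolve, which is the caveat mentioned above.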

aphmschonewille commented 6 months ago

It seems that all OpenHPC packages for Slurm end in -ohpc, which makes them easy to distinguish. We therefore have to revise the slurm role to use only the OpenHPC packages instead of the current mix. Disabling the repo is then not needed.
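The naming convention also makes a quick check possible. The sketch below flags any package name that lacks the `-ohpc` marker; it matches the marker anywhere in the name, on the assumption that names like `slurm-ohpc-slurmrestd` from the search output above should count as OpenHPC packages too. The helper name `non_ohpc_packages` is made up for illustration.

```shell
# Print every package name read from stdin that does not contain "-ohpc".
# "|| true" keeps the exit status at 0 when nothing is flagged.
non_ohpc_packages() {
    grep -v -- '-ohpc' || true
}

# Possible use on an installed system (not run here):
#   rpm -qa 'slurm*' --qf '%{NAME}\n' | non_ohpc_packages
```

An empty result would mean that every installed Slurm package carries the OpenHPC marker.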

aphmschonewille commented 6 months ago

Earlier comment confirmed: slurm_packages_ohpc only contains packages with the trailing -ohpc. The role only installs Slurm from EPEL if OpenHPC is disabled:

- name: Install slurm packages
  yum:
    name: '{{ slurm_packages }}'
    state: present
  tags: install-only
  when: enable_openhpc == false

- name: ensure legacy slurm rpms are not installed before configuring ohpc versions
  yum:
    name: '{{ slurm_packages }}'
    state: removed
  tags: install-only
  when: enable_openhpc == true

....

- name: Install ohpc slurm packages
  yum:
    name: '{{ slurm_packages_ohpc }}'
    state: present
  tags: install-only
  when: enable_openhpc == true

There is no risk of installing the wrong packages during TrinityX installation.

Typical package set after installing on the controller:

I am closing the ticket.