charmed-hpc / slurm-charms

Juju charms for automating the Day 0 to Day 2 operations of the Slurm workload manager ⚖️🐧
Apache License 2.0

feat(all): implement `slurm_ops` charm library #35

Closed NucciTheBoss closed 1 week ago

NucciTheBoss commented 2 weeks ago

This PR implements the slurm_ops and is_container charm libraries to provide the Slurm operations logic to all the Slurm charms.

The key features are a single standard interface for interacting with the Slurm services and an easy way to drive munge, JWT, Prometheus, etc. without duplicating logic between charms. slurm_ops also provides back-end implementations for both the Slurm deb and the Slurm snap so that we can easily switch between the two package formats.
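To illustrate the "one interface, two back-ends" idea, here is a minimal sketch. It is not the actual slurm_ops API: `ServiceBackend`, `DebBackend`, `SnapBackend`, and `make_backend` are hypothetical names, and the snap is assumed to be named `slurm`. The sketch only builds the command lines and does not run them.

```python
from abc import ABC, abstractmethod


class ServiceBackend(ABC):
    """Hypothetical interface: how to start a Slurm service for one package format."""

    @abstractmethod
    def start_command(self, service: str) -> list[str]:
        """Return the command line that would start `service`."""


class DebBackend(ServiceBackend):
    """Deb installs are assumed to expose plain systemd units."""

    def start_command(self, service: str) -> list[str]:
        return ["systemctl", "start", service]


class SnapBackend(ServiceBackend):
    """Snap installs are assumed to expose apps as `<snap>.<app>`."""

    def start_command(self, service: str) -> list[str]:
        # Assumption: the snap is named "slurm".
        return ["snap", "start", f"slurm.{service}"]


def make_backend(package_format: str) -> ServiceBackend:
    """Pick a back-end by package format; callers never branch on the format again."""
    backends = {"deb": DebBackend, "snap": SnapBackend}
    try:
        return backends[package_format]()
    except KeyError:
        raise ValueError(f"unsupported package format: {package_format!r}") from None
```

With this shape, a charm would call `make_backend("snap").start_command("slurmctld")` and get the same kind of result regardless of how Slurm was installed, which is the duplication the libraries are meant to remove.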

Related issues

Misc.

I came across a few issues that were fixed as part of the work to integrate slurm_ops into the charms:

  1. The user-supplied SlurmctldParameters was being overwritten by CHARM_MAINTAINED_SLURM_CONF_PARAMETERS when writing out the slurm.conf file.
  2. The user-supplied NHC configuration was not connected to anything. Even if a custom configuration was supplied by the user, the handler would just copy the template stored on the charm.
  3. The ulimit rules for Slurm were not properly adjusted, so they are now set to allow more open files on the system.
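
The first issue above can be pictured with a small sketch. This is not the code from the PR: the function names and the example parameter values are assumptions for illustration, and only `SlurmctldParameters` (a comma-separated option in slurm.conf) is merged here. The idea is that charm-maintained values still take precedence for every other key, while the user's `SlurmctldParameters` entries are combined with the charm's instead of being replaced.

```python
def merge_slurmctld_parameters(charm_value: str, user_value: str) -> str:
    """Combine two comma-separated parameter lists, keeping order and dropping duplicates."""
    merged: list[str] = []
    for item in f"{charm_value},{user_value}".split(","):
        item = item.strip()
        if item and item not in merged:
            merged.append(item)
    return ",".join(merged)


def build_slurm_conf(charm_maintained: dict, user_supplied: dict) -> dict:
    """Hypothetical helper: charm-maintained keys win, except SlurmctldParameters is merged."""
    conf = {**user_supplied, **charm_maintained}
    key = "SlurmctldParameters"
    if key in user_supplied:
        conf[key] = merge_slurmctld_parameters(
            charm_maintained.get(key, ""), user_supplied[key]
        )
    return conf


# Illustrative values only; the real charm-maintained set may differ.
charm = {"SlurmctldParameters": "enable_configless", "AuthType": "auth/munge"}
user = {"SlurmctldParameters": "idle_on_node_suspend"}
print(build_slurm_conf(charm, user)["SlurmctldParameters"])
# prints "enable_configless,idle_on_node_suspend"
```

A plain `{**user_supplied, **charm_maintained}` on its own reproduces the bug, since the charm's value silently replaces the user's.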

NucciTheBoss commented 2 weeks ago

Hmm... seems like the CI hit a timeout. Will try running locally to see what the issues are :shipit:

NucciTheBoss commented 2 weeks ago

Upon further investigation, it seems the current issue with the CI is the time it takes to build the cryptography package using the rustc compiler. My Framework sounded like a jet engine when building all four Slurm charms at the same time, so I expect we will either need to swap cryptography back to pycryptodome or add an additional build step for the charms.

jamesbeedy commented 2 weeks ago

This looks awesome! Nice work!

NucciTheBoss commented 1 week ago

@jedel1043 only took ~45 minutes for the integration tests to run, but this PR is R4R :partying_face:

Still looking into charmcraftcache for how that will work with our monorepo...