charmed-hpc / slurmdbd-operator

[moved]: Juju charm for slurmdbd, the database daemon of Slurm
https://github.com/charmed-hpc/slurm-charms/tree/main/charms/slurmdbd
Apache License 2.0
1 stars 7 forks source link

Set workload version to version of SLURM service rather than `git describe ... > version` #5

Closed NucciTheBoss closed 4 months ago

NucciTheBoss commented 1 year ago

Enhancement Proposal

Overview

Hello there folks! I am writing this enhancement proposal because I would like to propose a change to how the slurm charms set their workload version given the issues that we encountered last week. Rather than using git describe ... to create a version file and then using that version file to set the workload version during install, I propose that we set the workload version of the charm to the version of the SLURM service running on the Juju unit.

Reason 1

Per John Meinel (Director of Engineer for Juju) guidance, the workload version of the charm should be used to tell the human operator what version of the service you are running, not the version of the charm's source code:

image

The version file seems to be mostly for leaving a "signature" of what version of the charm that you are running. Best practice seems to be that the workload version is the version of the service that has been charmed. In the case of SLURM, setting the workload version to the version of the underlying SLURM service makes it more clear to the human operator which version of SLURM they are deploying.

Reason 2

Since the version file is not automatically created by charmcraft when packing the SLURM charms and it is required to successfully install SLURM during InstallEvent handling, the version file can cause the SLURM charms to break for "self-packing" administrators who are not familiar with the internals of the SLURM charms. If we set the workload version to version of the SLURM service running on the unit, we do not have worry about whether or not the version file was packed too.

Implementation

There are two possible ways we could implement setting the workload version of the units using the SLURM service version rather than the version file:

  1. Use a Python subprocess with {slurmctld,slurmd,slurmdbd,slurmrestd} -V to retrieve the version of the SLURM service.
  2. Use the version attribute provided by the apt charm library from operator_libs_linux.

One thing to note about option 1 is that the workload version would have to be set right before we start the SLURM service. {slurmctld,slurmd,slurmdbd,slurmrestd} -V seems to only work if everything is in order with the respective configuration file.

Closing comments

Let me know what your thoughts are on changing how we set the workload version of the Juju units for the SLURM charms. I think this will help resolve issues with forgetting to pack the version file and prevent it from being the reason why SLURM fails to install on freshly deployed units.

jaimesouza commented 1 year ago

Hello @NucciTheBoss! Thank you for that! I see no problem with taking the approach you suggested. What do you think @jamesbeedy?

jamesbeedy commented 1 year ago

@NucciTheBoss thank you for providing this thorough explanation. I vote for option #2 due to the fact that the version will be available immediately after install and will not depend on whether the service can start up or not.

NucciTheBoss commented 1 year ago

Thanks for getting back to me folks! Sounds good James - I prefer option 2 as well. Currently I am working on the migration of slurmdbd to MySQL Innodb and MySQL charms, but I should be able to get to this starting tomorrow and potentially have something to submit Wednesday before our meeting.

jamesbeedy commented 1 year ago

Hey, @NucciTheBoss and @dvdgomez 👋

We are trying to get these charms released to edge with a version file for now (as they are currently broken in edge). I will advocate using the dnf/apt libraries to get the slurm version and set that as the workload version when the dnf lib is ready.

NucciTheBoss commented 1 year ago

That is fine by me for the moment. I saw that you added an extra step to the pipeline anyways :smile:

We should hopefully have something more concrete after I do some more work into supporting the different bases. Control plane we can get away with just using Ubuntu, but with the new DNF library I proposed in https://github.com/canonical/operator-libs-linux/pull/75 we should be able to get it where using apt/dnf will work for operators like slurmd.