charmed-hpc / slurm-charms

Juju charms for automating the Day 0 to Day 2 operations of the Slurm workload manager ⚖️🐧
Apache License 2.0

error: Configured MailProg is invalid #30

Closed: jamesbeedy closed this issue 1 week ago

jamesbeedy commented 1 month ago

Bug Description

slurmctld always starts with the error "error: Configured MailProg is invalid". We may need to compare the MailProg configured by the charm against what is actually available on the system.

To Reproduce

deploy main

Environment

deploy main

Relevant log output

[2024-10-08T06:10:43.540] error: Configured MailProg is invalid

Additional context

No response

NucciTheBoss commented 1 month ago

Having a proper mail program integration is on our radar, along with re-enabling controller high availability. Looking at the source, the configured mail program is mail.mailutils, which is provided by the mailutils package: https://github.com/charmed-hpc/slurm-charms/blob/8dd8809d3db90051a98b80611cc4d28ab725aec4/charms/slurmctld/src/constants.py#L19

Seems like the problem could be one of two things:

  1. mailutils isn't installed on the slurmctld node.
  2. /usr/bin/mail.mailutils isn't considered a valid mail program by Slurm.
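
The two possibilities above could be told apart with a quick check on the slurmctld node. This is only a minimal sketch: the path is the one quoted in this thread, the function name `diagnose_mail_prog` is hypothetical, and the assumption that Slurm only wants an existing, executable file is a guess, not something confirmed here.

```python
import os

# Path quoted in this thread; how Slurm itself validates MailProg is an assumption.
MAIL_PROG = "/usr/bin/mail.mailutils"


def diagnose_mail_prog(path: str) -> str:
    """Return a short label describing why `path` might be rejected (hypothetical helper)."""
    if not os.path.exists(path):
        return "missing"  # consistent with possibility 1: package not installed
    if not os.path.isfile(path):
        return "not-a-file"
    if not os.access(path, os.X_OK):
        return "not-executable"
    # File exists and is executable; possibility 2 would need a closer look at Slurm.
    return "ok"


if __name__ == "__main__":
    print(f"{MAIL_PROG}: {diagnose_mail_prog(MAIL_PROG)}")
```

A result of "missing" would point at the package not being installed, while "ok" alongside the same log error would point at Slurm rejecting the program for some other reason.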

jamesbeedy commented 1 month ago

We can fix by installing mailutils for slurmctld.

$ apt-cache policy mailutils
mailutils:
  Installed: (none)
  Candidate: 1:3.14-1
  Version table:
     1:3.14-1 500
        500 http://archive.ubuntu.com/ubuntu jammy/universe amd64 Packages
$ sudo apt install mailutils -y

...

$ ls -larth /usr/bin/mail.mailutils
-rwxr-xr-x 1 root root 230K Jan 12  2022 /usr/bin/mail.mailutils

NucciTheBoss commented 1 month ago

This is an easy addition to the slurm_ops charm library. We need to spend some time scoping out a charmed solution for the mail server (email is hard), but we should keep Slurm happy in case someone manually sets up a mail server integration.
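
As a rough illustration of what such an addition might look like, here is a sketch that installs the package only when the mail program is missing. It does not reflect the actual slurm_ops API: `ensure_mail_prog` and the `install` callback are hypothetical names, and the path and package name are the ones quoted in this thread.

```python
import os
from typing import Callable

MAIL_PROG = "/usr/bin/mail.mailutils"  # path quoted in this thread
MAIL_PACKAGE = "mailutils"             # package named in this thread


def _is_executable_file(path: str) -> bool:
    return os.path.isfile(path) and os.access(path, os.X_OK)


def ensure_mail_prog(
    install: Callable[[str], None],
    path: str = MAIL_PROG,
    package: str = MAIL_PACKAGE,
) -> bool:
    """Install `package` through `install` when `path` is not an executable file.

    Returns True if an install was attempted, False if nothing was needed.
    `install` is a hypothetical hook; a charm would pass its own package helper.
    """
    if _is_executable_file(path):
        return False
    install(package)
    if not _is_executable_file(path):
        raise RuntimeError(f"{path} is still unavailable after installing {package}")
    return True
```

Passing the installer in as a callback keeps the check idempotent and easy to exercise in unit tests without touching apt.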

> We can fix by installing mailutils for slurmctld

I love it when the solution to the problem is a simple fix 🤩