CLIP-HPC / goslmailer

GoSlurmMailer: drop-in replacement for the default Slurm MailProg. Delivers Slurm job messages to various destinations.
MIT License

Troubles with gobler when implementing connector #20

Open hermannschwaerzlerUIBK opened 2 years ago

hermannschwaerzlerUIBK commented 2 years ago

I am trying to implement a connector that "abuses" goslmailer to write a summary into a file in the work-dir of a job.

I wrote it so that it uses spooling. The first part (writing a .gob file to the spooling directory) works just fine. But running "gobler -c ..." gives me this output:

2022/08/31 14:51:30 Initializing connector: discord
2022/08/31 14:51:30 Initializing connector: mailto
2022/08/31 14:51:30 Initializing connector: matrix
2022/08/31 14:51:30 Initializing connector: msteams
2022/08/31 14:51:30 Initializing connector: telegram
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x20 pc=0x92c568]

goroutine 8 [running]:
main.(*sender).SenderWorker(0xc0003d68c0, 0x0, 0x0, 0x0, 0x0)
        [...]/goslmailer/cmd/gobler/sender.go:63 +0x3e8
created by main.(*conMon).SpinUp
        [...]/goslmailer/cmd/gobler/conmon.go:151 +0x576

Any hints or ideas on how to debug this?

hermannschwaerzlerUIBK commented 2 years ago

This is my code:

summary_connector.tar.gz

hermannschwaerzlerUIBK commented 2 years ago

Sorry for the hassle! I was able to fix the problem. In hindsight it was an obvious error: I had not added a

_ "github.com/CLIP-HPC/goslmailer/connectors/summary"

line in the import section of cmd/gobler/gobler.go :-(

After adding that everything works as expected!
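Why a missing blank import can end in a nil pointer panic rather than a clear error: a common Go pattern is for each plugin package to register itself in a package-level map from its init() function, and init() only runs if the package is imported somewhere (hence the blank import). A lookup for a name that was never registered returns a nil interface, and calling a method on it panics with "invalid memory address or nil pointer dereference". The self-contained sketch below illustrates that pattern; `Connector`, `Register`, `connectors` and `lookup` are hypothetical names, not goslmailer's actual API, and the guard in `lookup` is an addition of this sketch.

```go
package main

import (
	"fmt"
	"log"
)

// Connector is a hypothetical stand-in for a goslmailer connector.
type Connector interface {
	Send(msg string) error
}

// connectors is a hypothetical registry. Each connector package would fill it
// from its own init(), which only runs if that package is imported, e.g. via
// a blank import such as _ ".../connectors/summary".
var connectors = map[string]Connector{}

// Register adds a connector under the given name.
func Register(name string, c Connector) {
	connectors[name] = c
}

type summary struct{}

func (summary) Send(msg string) error {
	fmt.Println("summary:", msg)
	return nil
}

// In a real setup this init() would live in the connector's own package.
func init() {
	Register("summary", summary{})
}

// lookup returns an error instead of a nil Connector when the name was never
// registered, i.e. when the corresponding import is missing.
func lookup(name string) (Connector, error) {
	c, ok := connectors[name]
	if !ok {
		return nil, fmt.Errorf("connector %q not registered (missing blank import?)", name)
	}
	return c, nil
}

func main() {
	for _, name := range []string{"summary", "matrix"} {
		c, err := lookup(name)
		if err != nil {
			log.Println(err)
			continue
		}
		_ = c.Send("job 4711 finished")
	}
}
```

Without the `ok` check, `connectors["matrix"].Send(...)` would panic the same way the trace above does.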

pja237 commented 2 years ago

Hey Hermann, sorry for the late answer; somehow you managed to hit the one week when the whole team was out on vacation at the same time :disappointed: We're back now, and if you have anything to ask (no hassle at all), we'll answer much faster. Glad that you managed to figure it out. The idea for the connector also sounds interesting; it would be great if you opened a PR when it's finished. Until then, if we can assist you with development in any way, let us know.

How do you plan to get the workdir? Passed in manually via --mail-user=summary:/path/, or through the Slurm job environment? We have the environment for newer Slurm versions, but we could extend it for older ones as well if you need it:
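For the first option, the connector name and its target travel together in the --mail-user value. The sketch below shows one way such a value could be split into (connector, target) pairs; `Target`, `parseMailUser` and the default-connector fallback are hypothetical and not goslmailer's actual parser. strings.Cut (Go 1.18+) splits on the first colon only, so a target path containing further colons stays intact.

```go
package main

import (
	"fmt"
	"strings"
)

// Target is a hypothetical pair of connector name and connector-specific address.
type Target struct {
	Connector string
	Address   string
}

// parseMailUser splits a value such as "mailto:someone@example.org,summary:/path/"
// into targets. Entries without a "connector:" prefix go to defaultConn.
func parseMailUser(s, defaultConn string) []Target {
	var out []Target
	for _, entry := range strings.Split(s, ",") {
		entry = strings.TrimSpace(entry)
		if entry == "" {
			continue
		}
		conn, addr, found := strings.Cut(entry, ":")
		if !found {
			conn, addr = defaultConn, entry
		}
		out = append(out, Target{Connector: conn, Address: addr})
	}
	return out
}

func main() {
	for _, t := range parseMailUser("mailto:someone@example.org,summary:/scratch/job/", "mailto") {
		fmt.Printf("%s -> %s\n", t.Connector, t.Address)
	}
}
```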

https://github.com/CLIP-HPC/goslmailer/blob/f8118e6997edf6f5da11a7dc554890e15501f642/internal/slurmjob/job_data.go#L39

hermannschwaerzlerUIBK commented 2 years ago

Dear @pja237,

thanks for getting in touch. In order to get the work-dir I do this in the configuration:

  "connectors": {
    "summary": {
      "path": "{{ .SlurmEnvironment.SLURM_JOB_WORK_DIR }}/slurm-{{ .SlurmEnvironment.SLURM_JOB_ID }}.summary",
[...]

We are using Slurm version 22.05.x and with that this works perfectly.
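To make the "path" setting concrete, here is a minimal sketch of how such a template expands with Go's text/template. The `SlurmEnvironment` and `JobContext` types are hypothetical, trimmed-down stand-ins carrying only the two fields the template uses; goslmailer's real job context has many more. The guard against an empty SLURM_JOB_WORK_DIR is an addition of this sketch: without it, an unset variable would silently yield a path directly under "/".

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
	"text/template"
)

// SlurmEnvironment is a hypothetical stand-in with only the fields used below.
type SlurmEnvironment struct {
	SLURM_JOB_ID       string
	SLURM_JOB_WORK_DIR string
}

// JobContext is a hypothetical stand-in for the data passed to the template.
type JobContext struct {
	SlurmEnvironment SlurmEnvironment
}

const pathTmpl = "{{ .SlurmEnvironment.SLURM_JOB_WORK_DIR }}/slurm-{{ .SlurmEnvironment.SLURM_JOB_ID }}.summary"

// renderPath expands the configured path template for one job.
func renderPath(tmpl string, jc JobContext) (string, error) {
	if jc.SlurmEnvironment.SLURM_JOB_WORK_DIR == "" {
		return "", errors.New("SLURM_JOB_WORK_DIR is empty")
	}
	t, err := template.New("path").Parse(tmpl)
	if err != nil {
		return "", err
	}
	var buf bytes.Buffer
	if err := t.Execute(&buf, jc); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	jc := JobContext{SlurmEnvironment: SlurmEnvironment{
		SLURM_JOB_ID:       "4711",
		SLURM_JOB_WORK_DIR: "/home/alice/run1",
	}}
	p, err := renderPath(pathTmpl, jc)
	if err != nil {
		panic(err)
	}
	fmt.Println(p) // /home/alice/run1/slurm-4711.summary
}
```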

I am using spooling in goslmailer and then do the actual work in gobler, because goslmailer runs as user slurm, which is not privileged to write into everyone's working directories. :-) The gobler service, however, can run as root and thus write those files.

I happen to have a few questions: In a setup like mine, I only need these lines in goslmailer.conf for my connector, right?

     "summary": {
      "spoolDir": "/var/spool/goslmailer"
     }

The only thing I am doing in goslmailer is spooling the "task", and all the other configuration settings go to gobler.conf, as they are only needed there, right?

To illustrate how I am doing things, here is my current state of SendMessage():

// Imports assumed: bytes, fmt, log, os, strings, syscall, text/template, plus goslmailer's message, renderer and spool packages.
func (c *Connector) SendMessage(mp *message.MessagePack, useSpool bool, l *log.Logger) error {
        var (
                e    error = nil
                path = bytes.Buffer{}
                body = bytes.Buffer{}
        )

        if useSpool {
                // goslmailer side (runs as slurm): only deposit the message into the spool
                err := spool.DepositToSpool(c.spoolDir, mp)
                if err != nil {
                        l.Printf("DepositToSpool Failed!\n")
                        return err
                }
        } else {
                // gobler side (runs as root): render destination path;
                // a bad template is returned as an error instead of panicking via template.Must
                p, err := template.New("path").Parse(c.path)
                if err != nil {
                        return err
                }
                e = p.Execute(&path, mp.JobContext)
                if e != nil {
                        return e
                }

                // render body
                err = renderer.RenderTemplate(c.template, "text", mp.JobContext, mp.TargetUser, &body)
                if err != nil {
                        return err
                }

                // save body to file
                err = os.WriteFile(path.String(), body.Bytes(), 0644)
                if err != nil {
                        return err
                }
                // chown that file to uid and gid of the destination directory
                splitPath := strings.Split(path.String(), "/")
                workDir := strings.Join(splitPath[0:(len(splitPath) - 1)], "/")
                fileInfo, err := os.Stat(workDir)
                if err != nil {
                        return err // fileInfo would be nil here
                }
                stat, ok := fileInfo.Sys().(*syscall.Stat_t)
                if !ok {
                        return fmt.Errorf("cannot determine owner of %s", workDir)
                }
                UID := int(stat.Uid)
                GID := int(stat.Gid)
                e = os.Chown(path.String(), UID, GID)
        }
        return e
}

There might be an easier solution for the last part (chowning the file), as it is (or at least looks) a bit tedious.

Regards, Hermann

pja237 commented 2 years ago

Evening Hermann,

here are my thoughts... :)

  "connectors": {
    "summary": {
      "path": "{{ .SlurmEnvironment.SLURM_JOB_WORK_DIR }}/slurm-{{ .SlurmEnvironment.SLURM_JOB_ID }}.summary",
[...]

This is great; outside of the mail connector's command line, I hadn't thought of using templates like this :)

I am using spooling in goslmailer and then do the actual work in gobler, because goslmailer runs as user slurm, which is not privileged to write into everyone's working directories. :-) The gobler service, however, can run as root and thus write those files.

Great (ab)use. :+1:

I happen to have a few questions: In a setup like mine, I only need these lines in goslmailer.conf for my connector, right?

     "summary": {
      "spoolDir": "/var/spool/goslmailer"
     }

Yes, correct.

The only thing I am doing in goslmailer is spooling the "task", and all the other configuration settings go to gobler.conf, as they are only needed there, right?

Exactly :+1:


func (c *Connector) SendMessage(mp *message.MessagePack, useSpool bool, l *log.Logger) error {

[snip]

                // chown that file to uid and gid of the destination directory
                splitPath := strings.Split(path.String(), "/")
                workDir := strings.Join(splitPath[0:(len(splitPath) - 1)], "/")
                fileInfo, err := os.Stat(workDir)
                stat := fileInfo.Sys().(*syscall.Stat_t)
                UID := int(stat.Uid)
                GID := int(stat.Gid)
                os.Chown(path.String(), UID, GID)

Just a thought on this bit: assuming you want to give the summary to the user who submitted the job, wouldn't it be safe to assume that they are also the owner of the working directory (the one from .SlurmEnvironment.SLURM_JOB_WORK_DIR), or at least have some permissions on it? Then perhaps you can just call https://pkg.go.dev/os#Chown with the job user's UID and GID directly, without the tedious directory lookup?

e.g.

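// sketch from memory: strconv.Atoi returns (int, error), so convert first; errors still need handling
uid, _ := strconv.Atoi(mp.JobContext.SlurmEnvironment.SLURM_JOB_UID)
gid, _ := strconv.Atoi(mp.JobContext.SlurmEnvironment.SLURM_JOB_GID)
os.Chown(path.String(), uid, gid)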

On a side note, I remember you were also interested in the matrix connector; did you get a chance to try it out yet?

best, Petar

hermannschwaerzlerUIBK commented 2 years ago

Hi Petar,

sorry for the long delay. This time it was me who was out of office for about a week. :-) Thank you for your helpful comments and support. Regarding the chown: yes, I guess it should be safe to simplify that part to

                if UID, e = strconv.Atoi(mp.JobContext.SlurmEnvironment.SLURM_JOB_UID); e == nil {
                        GID, e = strconv.Atoi(mp.JobContext.SlurmEnvironment.SLURM_JOB_GID)
                }
                if e == nil {
                        e = os.Chown(path.String(), UID, GID)
                }

(after having declared UID and GID as int further up). I tested it in my environment and it works. I will prepare a pull request soonish. I am planning to add a README.md file to the corresponding subdirectory of connectors to describe the necessary prerequisites for it (a spooling-directory, the necessity of running gobler as root and potentially a few lines in job_submit.lua to make it work automagically).
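For reference, a standalone sketch of the same conversion with explicit, descriptive errors. `jobOwner` and `chownToJobOwner` are hypothetical helper names; in the connector, the UID/GID strings would come from mp.JobContext.SlurmEnvironment as in the snippet above. One detail worth guarding: strconv.Atoi returns 0 together with its error, so an unchecked failure on SLURM_JOB_UID would hand the file to root (UID 0).

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// jobOwner converts SLURM_JOB_UID / SLURM_JOB_GID strings into numeric ids and
// reports which one failed, instead of silently falling back to 0 (root).
func jobOwner(uidStr, gidStr string) (uid, gid int, err error) {
	if uid, err = strconv.Atoi(uidStr); err != nil {
		return 0, 0, fmt.Errorf("SLURM_JOB_UID %q: %w", uidStr, err)
	}
	if gid, err = strconv.Atoi(gidStr); err != nil {
		return 0, 0, fmt.Errorf("SLURM_JOB_GID %q: %w", gidStr, err)
	}
	return uid, gid, nil
}

// chownToJobOwner hands the summary file at path to the job's user and group.
func chownToJobOwner(path, uidStr, gidStr string) error {
	uid, gid, err := jobOwner(uidStr, gidStr)
	if err != nil {
		return err
	}
	return os.Chown(path, uid, gid)
}

func main() {
	uid, gid, err := jobOwner("1234", "100")
	fmt.Println(uid, gid, err) // 1234 100 <nil>
	_, _, err = jobOwner("", "100")
	fmt.Println(err != nil) // true
}
```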

Regarding the matrix connector: no, unfortunately I haven't been able to test it yet. We are in the middle of getting our new cluster into production and I had to set my priorities...

Regards Hermann

pja237 commented 2 years ago

Hey Hermann, no worries; whenever you're ready with the PR, fire away. Feel free to add the job_submit.lua as a usage example, that would also be great :) Good luck with the new cluster rollout.

best, Petar

pja237 commented 2 years ago

Hey Hermann, just checking up on this issue; I hope all is working well with the new cluster. Did you manage to put this code to good use in the end?