CLIP-HPC / goslmailer

GoSlurmMailer - drop-in replacement for the default slurm MailProg. Delivers slurm job messages to various destinations.

tgslurmbot not working (Reposting from 'discussions' as perhaps it's more appropriate here) #31

Closed hariseldon99 closed 1 year ago

hariseldon99 commented 1 year ago

Hi,

Using version 2.7.1 from releases.

Trying to test tgslurmbot in this fake docker cluster before putting it in my real one, but I am encountering issues.

Mailprog in slurm.conf is set:

# grep -B 2 -A 2 -i mailprog /etc/slurm/slurm.conf
#
# COMMUNICATION
MailProg=/usr/local/bin/goslmailer
# TIMERS
SlurmctldTimeout=300

Here is my config for tgslurmbot.conf/goslmailer.conf (same file, symlinked):

{
  "logfile": "/tmp/goslmailer.log",
  "debugconfig": true,
  "binpaths": {
    "sacct":"/usr/bin/sacct",
    "sstat":"/usr/bin/sstat"
  },
  "defaultconnector": "telegram",
  "connectors": {
    "telegram": {
      "name": "testbot",
      "url": "",
      "token": "Phake:OAUTH:Placeholder",
      "renderToFile": "no",
      "spoolDir": "/tmp/telegramgobs",
      "messageTemplate": "/etc/slurm/telegramTemplate.html",
      "useLookup": "no",
      "format": "HTML"
    }
  },
  "qosmap": {
    "elevated": 3600,
    "normal": 28800
  }
}
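The DUMP CONFIG line that goslmailer writes later in this thread (config.ConfigContainer with DebugConfig, Logfile, Binpaths, DefaultConnector, Connectors and QosMap) hints at how this file is read. Below is a minimal Go sketch of decoding it; the struct and its json tags are assumptions modeled on that dump, not goslmailer's actual source. It also makes one consequence of the symlink visible: both programs end up with the same "logfile".

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ConfigContainer mirrors the fields printed in goslmailer's DUMP CONFIG log
// line. The json tags are an assumption based on the keys in the file above.
type ConfigContainer struct {
	DebugConfig      bool                         `json:"debugconfig"`
	Logfile          string                       `json:"logfile"`
	Binpaths         map[string]string            `json:"binpaths"`
	DefaultConnector string                       `json:"defaultconnector"`
	Connectors       map[string]map[string]string `json:"connectors"`
	QosMap           map[string]uint64            `json:"qosmap"`
}

// parseConfig decodes a goslmailer-style JSON configuration.
func parseConfig(data []byte) (*ConfigContainer, error) {
	var cfg ConfigContainer
	if err := json.Unmarshal(data, &cfg); err != nil {
		return nil, err
	}
	return &cfg, nil
}

// sample is a trimmed-down copy of the config shown above.
const sample = `{
  "logfile": "/tmp/goslmailer.log",
  "debugconfig": true,
  "binpaths": {"sacct": "/usr/bin/sacct", "sstat": "/usr/bin/sstat"},
  "defaultconnector": "telegram",
  "connectors": {"telegram": {"name": "testbot", "renderToFile": "no", "format": "HTML"}},
  "qosmap": {"elevated": 3600, "normal": 28800}
}`

func main() {
	cfg, err := parseConfig([]byte(sample))
	if err != nil {
		fmt.Println("config error:", err)
		return
	}
	fmt.Println(cfg.DefaultConnector, cfg.Logfile, cfg.QosMap["normal"])
}
```

With Connectors typed as map[string]string, as the dump shows, every connector setting has to be a JSON string; that matches the quoted "yes"/"no" values used for flags such as "renderToFile".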

Here is the bot in telegram, seemingly working when I sent /start to it after running:

# tgslurmbot &

[Screenshot from 2023-02-25 23-07-45]

I submitted this test code to slurm with sbatch:

#!/bin/bash

#SBATCH --job-name=hello_mpi
#This sets the name of the job

#SBATCH --ntasks=1
#This sets the number of processes to 1. Change if needed

#SBATCH --cpus-per-task=1

#SBATCH --time=00:05:00

#SBATCH --qos=normal

#SBATCH --mail-type=ALL
#SBATCH --mail-user=telegram:5545394160

mpirun -np ${SLURM_NTASKS} ./hello_mpi
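The --mail-user value in this script uses a connector:target form ("telegram:5545394160"). A small Go sketch of how such a receiver string could be split; the Receiver type, the parseReceiver name and the rule that a value without a prefix goes to "defaultconnector" are illustrative assumptions, not taken from goslmailer's code.

```go
package main

import (
	"fmt"
	"strings"
)

// Receiver is a hypothetical pairing of a connector name and its target.
type Receiver struct {
	Connector string
	Target    string
}

// parseReceiver splits "telegram:5545394160" into connector and target.
// Only the first colon is used, so targets such as "@myuser:matrix.org"
// keep their own colons. Assumption: no prefix means defaultConnector.
func parseReceiver(s, defaultConnector string) Receiver {
	parts := strings.SplitN(s, ":", 2)
	if len(parts) == 2 {
		return Receiver{Connector: parts[0], Target: parts[1]}
	}
	return Receiver{Connector: defaultConnector, Target: s}
}

func main() {
	for _, s := range []string{"telegram:5545394160", "matrix:@myuser:matrix.org", "jdoe"} {
		r := parseReceiver(s, "telegram")
		fmt.Printf("%-28s -> connector=%q target=%q\n", s, r.Connector, r.Target)
	}
}
```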

Code ran correctly, as evidenced by this output:

Hello world from processor c1, rank 0 out of 1 processors

But no response from the bot. Nada.

Slurm version is:

# sinfo --version
slurm 21.08.6
pja237 commented 1 year ago

Hey, could you take a look at the log file (or share it here) and see if anything is reported there? If you're using debugconfig, be careful not to share too much info :)

hariseldon99 commented 1 year ago

Here is the output throughout the execution of a slurm batch job:

sh-4.4# cat /tmp/goslmailer.log 
tgslurmbot:2023/02/26 08:48:58.327668 tgslurmbot.go:50: ======================= tgslurmbot start =======================================
tgslurmbot:2023/02/26 08:48:58.327705 version.go:11: ----------------------------------------
tgslurmbot:2023/02/26 08:48:58.327716 version.go:12: Version: v2.7.1
tgslurmbot:2023/02/26 08:48:58.327723 version.go:13: Build commit hash: d60a3ea6a0d1051bbcf6e2526d77a15904aa6581
tgslurmbot:2023/02/26 08:48:58.327729 version.go:14: ----------------------------------------
tgslurmbot:2023/02/26 08:48:58.327736 tgslurmbot.go:58: Starting: "testbot"
pja237 commented 1 year ago

Let's try the following:

My initial thought is that slurm is not invoking MailProg (goslmailer) for some reason, so let's check whether that's true, and why.

pja237 commented 1 year ago

This you might see in slurmctld.log on failure:

Feb 26 04:58:24 ctl-0.local.lan slurmctld[867]: slurmctld: error: MailProg returned error, it's output was '2023/02/26 04:58:24 Initializing connector: discord
Feb 26 04:58:24 ctl-0.local.lan slurmctld[867]: 2023/02/26 04:58:24 Initializing connector: mailto 
Feb 26 04:58:24 ctl-0.local.lan slurmctld[867]: 2023/02/26 04:58:24 Initializing connector: matrix
Feb 26 04:58:24 ctl-0.local.lan slurmctld[867]: 2023/02/26 04:58:24 Initializing connector: mattermost
Feb 26 04:58:24 ctl-0.local.lan slurmctld[867]: 2023/02/26 04:58:24 Initializing connector: msteams                    
Feb 26 04:58:24 ctl-0.local.lan slurmctld[867]: 2023/02/26 04:58:24 Initializing connector: slack                     
Feb 26 04:58:24 ctl-0.local.lan slurmctld[867]: 2023/02/26 04:58:24 Initializing connector: telegram
Feb 26 04:58:24 ctl-0.local.lan slurmctld[867]: panic: runtime error: invalid memory address or nil pointer dereference
Feb 26 04:58:24 ctl-0.local.lan slurmctld[867]: [signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x4e8729]
Feb 26 04:58:24 ctl-0.local.lan slurmctld[867]: goroutine 1 [running]:                                                                          
Feb 26 04:58:24 ctl-0.local.lan slurmctld[867]: log.(*Logger).Output(0x0, 0x20, {0xc000083140, 0x51})               
Feb 26 04:58:24 ctl-0.local.lan slurmctld[867]:         /opt/hostedtoolcache/go/1.17.13/x64/src/log/log.go:165 +0x89
Feb 26 04:58:24 ctl-0.local.lan slurmctld[867]: log.(*Logger).Fatalf(0xc0000c5050, {0xddcfb6, 0xdb25af}, {0xc00033ff50, 0x0, 0xc00004c6e0})
Feb 26 04:58:24 ctl-0.local.lan slurmctld[867]:         /opt/hostedtoolcache/go/1.17.13/x64/src/log/log.go:210 +0x4c
Feb 26 04:58:24 ctl-0.local.lan slurmctld[867]: main.main()                         
Feb 26 04:58:24 ctl-0.local.lan slurmctld[867]:         /home/runner/work/goslmailer/goslmailer/cmd/goslmailer/goslmailer.go:47 +0x1c5
Feb 26 04:58:24 ctl-0.local.lan slurmctld[867]: '    

Make sure that:

The instance above was caused by the log file being owned by root and not writable by the slurm user (mode 0644). I will open a PR to fix this panic.

pja237 commented 1 year ago

In your case, if you ran the tgslurmbot binary as root first, it may have created the log file as root:root without write permission for others. The slurm user that later runs goslmailer then couldn't write to it, and goslmailer panicked. Could that be what happened?

hariseldon99 commented 1 year ago

Many thanks for your attention.

So I fired up the old docker-compose and tailed the slurmctld.log as I submitted a generic MPI job. Sure enough, errors galore!

I redacted the debug messages to not clutter the post. The full logs are here on pastebin.


# scontrol show config|grep -i mail
MailDomain              = (null)
MailProg                = /usr/local/bin/goslmailer
sh-4.4# tail -f /var/log/slurm/slurmctld.log 

[2023-02-26T15:22:23.706] _slurm_rpc_submit_batch_job: JobId=155 InitPrio=4294901754 usec=357

[2023-02-26T15:22:27.302] error: MailProg returned error, it's output was '2023/02/26 15:22:26 Initializing connector: discord
2023/02/26 15:22:26 Initializing connector: mailto
2023/02/26 15:22:26 Initializing connector: matrix
2023/02/26 15:22:27 Initializing connector: mattermost
2023/02/26 15:22:27 Initializing connector: msteams
2023/02/26 15:22:27 Initializing connector: slack
2023/02/26 15:22:27 Initializing connector: telegram
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x4e8729]

goroutine 1 [running]:
log.(*Logger).Output(0x0, 0x20, {0xc00010e120, 0x51})
    /opt/hostedtoolcache/go/1.17.13/x64/src/log/log.go:165 +0x89
log.(*Logger).Fatalf(0xc00016c180, {0xddcfb6, 0xdb25af}, {0xc00063ff50, 0xc000100570, 0x40ecc7})
    /opt/hostedtoolcache/go/1.17.13/x64/src/log/log.go:210 +0x4c
main.main()
    /home/runner/work/goslmailer/goslmailer/cmd/goslmailer/goslmailer.go:47 +0x1c5
'

[2023-02-26T15:22:36.166] _job_complete: JobId=155 WEXITSTATUS 0

[2023-02-26T15:22:36.418] _job_complete: JobId=155 done

[2023-02-26T15:22:36.676] error: MailProg returned error, it's output was '2023/02/26 15:22:36 Initializing connector: discord
2023/02/26 15:22:36 Initializing connector: mailto
2023/02/26 15:22:36 Initializing connector: matrix
2023/02/26 15:22:36 Initializing connector: mattermost
2023/02/26 15:22:36 Initializing connector: msteams
2023/02/26 15:22:36 Initializing connector: slack
2023/02/26 15:22:36 Initializing connector: telegram
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x4e8729]

goroutine 1 [running]:
log.(*Logger).Output(0x0, 0x20, {0xc00003d1a0, 0x51})
    /opt/hostedtoolcache/go/1.17.13/x64/src/log/log.go:165 +0x89
log.(*Logger).Fatalf(0xc000037110, {0xddcfb6, 0xdb25af}, {0xc0003bff50, 0xc00005d170, 0x40ecc7})
    /opt/hostedtoolcache/go/1.17.13/x64/src/log/log.go:210 +0x4c
main.main()
    /home/runner/work/goslmailer/goslmailer/cmd/goslmailer/goslmailer.go:47 +0x1c5
'
pja237 commented 1 year ago

/home/runner/work/goslmailer/goslmailer/cmd/goslmailer/goslmailer.go:47 +0x1c5

This looks like you're hitting exactly what I described above.

Do this:

  1. Split the configuration into two separate files, goslmailer.conf and tgslurmbot.conf (do not symlink them)
  2. Change the logfile lines to point to two separate files, e.g. "logfile": "/tmp/goslmailer.log", & "logfile": "/tmp/tgslurmbot.log",

Then try it out and let me know how that worked.

hariseldon99 commented 1 year ago

Hi,

Update to update: Turns out this was trivial. wgetting the markdown example from GitHub and replacing the malformed HTML template with it solved the issue.

Thanks for the insight. I think you can close this issue now unless there is anything else that needs to be addressed.

Old Update

I worked around the log file permissions issue by adding the slurm user to the root group and setting g+w on the log files. That seems to have solved the previous problem, but it has surfaced a new one. Is the telegram template file misconfigured?

[2023-02-26T17:56:37.182] error: MailProg returned error, it's output was '2023/02/26 17:56:37 Initializing connector: discord
2023/02/26 17:56:37 Initializing connector: mailto
2023/02/26 17:56:37 Initializing connector: matrix
2023/02/26 17:56:37 Initializing connector: mattermost
2023/02/26 17:56:37 Initializing connector: msteams
2023/02/26 17:56:37 Initializing connector: slack
2023/02/26 17:56:37 Initializing connector: telegram
panic: template: /etc/slurm/telegramTemplate.html:798: function "className" not defined

goroutine 1 [running]:
html/template.Must(...)
    /opt/hostedtoolcache/go/1.17.13/x64/src/html/template/template.go:374
github.com/CLIP-HPC/goslmailer/internal/renderer.RenderTemplate({0xc0004e2200, 0xc0004eecf0}, {0xc0004d37cc, 0x4}, 0xc0004da700, {0x7fff8dc2adcf, 0xa}, 0xc0006bfcb0)
    /home/runner/work/goslmailer/goslmailer/internal/renderer/renderer.go:40 +0x55a
github.com/CLIP-HPC/goslmailer/connectors/telegram.(*Connector).SendMessage(0xc0004e6000, 0xc0004d0f00, 0x1, 0x8)
    /home/runner/work/goslmailer/goslmailer/connectors/telegram/telegram.go:81 +0x455
main.main()
    /home/runner/work/goslmailer/goslmailer/cmd/goslmailer/goslmailer.go:96 +0x557

Older Message

Yeah, so I split them. There are now two config files in /etc/slurm/

# cat /etc/slurm/tgslurmbot.conf 
{                                                   
  "logfile": "/tmp/tgslurmbot.log",                 
  "debugconfig": true,                              
  "binpaths": {                                     
    "sacct":"/usr/bin/sacct",
    "sstat":"/usr/bin/sstat"
  },
  "defaultconnector": "telegram",                    
  "connectors": {                                   
    "telegram": {
      "name": "testbot",                       
      "url": "",                                    
      "token": "everythingisstillfubar",         
      "renderToFile": "no",                         
      "spoolDir": "/tmp/telegramgobs",              
      "messageTemplate": "/etc/slurm/telegramTemplate.html",  
      "useLookup": "no",                            
      "format": "HTML"                        
    }
  },
  "qosmap": {              
    "elevated": 3600,
    "normal": 28800
  }
}
# cat /etc/slurm/goslmailer.conf 
{                                                   
  "logfile": "/tmp/goslmailer.log",                 
  "debugconfig": true,                              
  "binpaths": {                                     
    "sacct":"/usr/bin/sacct",
    "sstat":"/usr/bin/sstat"
  },
  "defaultconnector": "msteams",                    
  "connectors": {                                   
    "msteams": {                                    
      "name": "dev channel",                        
      "renderToFile": "yes",                        
      "spoolDir": "/tmp",                           
      "url": "https://msteams/webhook/url",         
      "adaptiveCardTemplate": "/path/template.json",
      "useLookup": "GECOS"                          
    },                                              
    "mailto": {
      "name": "original slurm mail functionality, extended.",
      "mailCmd": "/usr/bin/mutt",                        
      "mailCmdParams": "-s \"Job {{ .SlurmEnvironment.SLURM_JOB_ID }} ({{ .SlurmEnvironment.SLURM_JOB_NAME }}) {{ .SlurmEnvironment.SLURM_JOB_MAIL_TYPE }}\"",
      "mailTemplate": "/etc/slurm/mailTemplate.tmpl",    
      "mailFormat": "HTML",                              
      "allowList": ".+@(imp|imba.oeaw|gmi.oeaw).ac.at"  
    },
    "telegram": {
      "name": "telegram bot",                       
      "url": "",                                    
      "token": "everythingisstillfubar",         
      "renderToFile": "no",                         
      "spoolDir": "/tmp/telegramgobs",              
      "messageTemplate": "/etc/slurm/telegramTemplate.html",  
      "useLookup": "no",                            
      "format": "HTML"                        
    },
    "discord": {
      "name": "DiscoSlurmBot",                      
      "triggerString": "showmeslurm",               
      "token": "PasteBotTokenHere",                 
      "messageTemplate": "/path/to/template.md"     
    },
    "mattermost": {
      "name": "MatTheSlurmBot",                    
      "serverUrl": "https://someSpaceName.cloud.mattermost.com",  
      "wsUrl": "wss://someSpaceName.cloud.mattermost.com",        
      "token": "PasteBotTokenHere",                               
      "triggerString": "showmeslurm",                             
      "messageTemplate" : "/path/to/mattermostTemplate.md"        
    },
    "matrix": {
      "username": "@myuser:matrix.org",
      "token": "syt_dGRpZG9ib3QXXXXXXXEyQMBEmvOVp_10Jm93",
      "homeserver": "matrix.org",
      "template": "/path/to/matrix_template.md"
    },
    "slack": {
      "token": "PasteSlackBotTokenHere",            
      "messageTemplate": "/path/to/template.md",    
      "renderToFile": "spool",                      
      "spoolDir": "/tmp"                            
    },
    "textfile": {                                   
      "path": "/tmp"                                
    }
  },
  "qosmap": {              
    "elevated": 3600,
    "normal": 28800
  }
}

Same errors in slurmctld.log:

[2023-02-26T17:36:21.483] _slurm_rpc_submit_batch_job: JobId=158 InitPrio=4294901757 usec=325

[2023-02-26T17:36:22.340] error: MailProg returned error, it's output was '2023/02/26 17:36:22 Initializing connector: discord
2023/02/26 17:36:22 Initializing connector: mailto
2023/02/26 17:36:22 Initializing connector: matrix
2023/02/26 17:36:22 Initializing connector: mattermost
2023/02/26 17:36:22 Initializing connector: msteams
2023/02/26 17:36:22 Initializing connector: slack
2023/02/26 17:36:22 Initializing connector: telegram
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x4e8729]

goroutine 1 [running]:
log.(*Logger).Output(0x0, 0x20, {0xc00003c060, 0x51})
    /opt/hostedtoolcache/go/1.17.13/x64/src/log/log.go:165 +0x89
log.(*Logger).Fatalf(0xc000036180, {0xddcfb6, 0xdb25af}, {0xc0006bff50, 0xc000090570, 0x40ecc7})
    /opt/hostedtoolcache/go/1.17.13/x64/src/log/log.go:210 +0x4c
main.main()
    /home/runner/work/goslmailer/goslmailer/cmd/goslmailer/goslmailer.go:47 +0x1c5
'

[2023-02-26T17:36:23.806] error: MailProg returned error, it's output was '2023/02/26 17:36:23 Initializing connector: discord
2023/02/26 17:36:23 Initializing connector: mailto
2023/02/26 17:36:23 Initializing connector: matrix
2023/02/26 17:36:23 Initializing connector: mattermost
2023/02/26 17:36:23 Initializing connector: msteams
2023/02/26 17:36:23 Initializing connector: slack
2023/02/26 17:36:23 Initializing connector: telegram
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x4e8729]

goroutine 1 [running]:
log.(*Logger).Output(0x0, 0x20, {0xc00003c060, 0x51})
    /opt/hostedtoolcache/go/1.17.13/x64/src/log/log.go:165 +0x89
log.(*Logger).Fatalf(0xc000036180, {0xddcfb6, 0xdb25af}, {0xc00063ff50, 0xc000090570, 0x40ecc7})
    /opt/hostedtoolcache/go/1.17.13/x64/src/log/log.go:210 +0x4c
main.main()
    /home/runner/work/goslmailer/goslmailer/cmd/goslmailer/goslmailer.go:47 +0x1c5
'

Log files:

# cat /tmp/goslmailer.log 
tgslurmbot:2023/02/26 08:48:58.327668 tgslurmbot.go:50: ======================= tgslurmbot start =======================================
tgslurmbot:2023/02/26 08:48:58.327705 version.go:11: ----------------------------------------
tgslurmbot:2023/02/26 08:48:58.327716 version.go:12: Version: v2.7.1
tgslurmbot:2023/02/26 08:48:58.327723 version.go:13: Build commit hash: d60a3ea6a0d1051bbcf6e2526d77a15904aa6581
tgslurmbot:2023/02/26 08:48:58.327729 version.go:14: ----------------------------------------
tgslurmbot:2023/02/26 08:48:58.327736 tgslurmbot.go:58: Starting: "testbot"
tgslurmbot:2023/02/26 08:52:55.231591 tgslurmbot.go:50: ======================= tgslurmbot start =======================================
tgslurmbot:2023/02/26 08:52:55.231631 version.go:11: ----------------------------------------
tgslurmbot:2023/02/26 08:52:55.231641 version.go:12: Version: v2.7.1
tgslurmbot:2023/02/26 08:52:55.231659 version.go:13: Build commit hash: d60a3ea6a0d1051bbcf6e2526d77a15904aa6581
tgslurmbot:2023/02/26 08:52:55.231663 version.go:14: ----------------------------------------
tgslurmbot:2023/02/26 08:52:55.231668 tgslurmbot.go:58: Starting: "testbot"
tgslurmbot:2023/02/26 08:53:15.423147 tgslurmbot.go:50: ======================= tgslurmbot start =======================================
tgslurmbot:2023/02/26 08:53:15.423182 version.go:11: ----------------------------------------
tgslurmbot:2023/02/26 08:53:15.423191 version.go:12: Version: v2.7.1
tgslurmbot:2023/02/26 08:53:15.423195 version.go:13: Build commit hash: d60a3ea6a0d1051bbcf6e2526d77a15904aa6581
tgslurmbot:2023/02/26 08:53:15.423198 version.go:14: ----------------------------------------
tgslurmbot:2023/02/26 08:53:15.423202 tgslurmbot.go:58: Starting: "testbot"
tgslurmbot:2023/02/26 15:21:20.191616 tgslurmbot.go:50: ======================= tgslurmbot start =======================================
tgslurmbot:2023/02/26 15:21:20.193793 version.go:11: ----------------------------------------
tgslurmbot:2023/02/26 15:21:20.193957 version.go:12: Version: v2.7.1
tgslurmbot:2023/02/26 15:21:20.193990 version.go:13: Build commit hash: d60a3ea6a0d1051bbcf6e2526d77a15904aa6581
tgslurmbot:2023/02/26 15:21:20.194019 version.go:14: ----------------------------------------
tgslurmbot:2023/02/26 15:21:20.194051 tgslurmbot.go:58: Starting: "testbot"
goslmailer:2023/02/26 17:35:50.057620 goslmailer.go:50: ======================== START OF RUN ==========================================
goslmailer:2023/02/26 17:35:50.057867 version.go:11: ----------------------------------------
goslmailer:2023/02/26 17:35:50.057877 version.go:12: Version: v2.7.1
goslmailer:2023/02/26 17:35:50.057881 version.go:13: Build commit hash: d60a3ea6a0d1051bbcf6e2526d77a15904aa6581
goslmailer:2023/02/26 17:35:50.057884 version.go:14: ----------------------------------------
goslmailer:2023/02/26 17:35:50.057886 config.go:78: DUMP CONFIG:
goslmailer:2023/02/26 17:35:50.057983 config.go:79: CONFIGURATION: &config.ConfigContainer{DebugConfig:true, Logfile:"/tmp/goslmailer.log", Binpaths:map[string]string{"sacct":"/usr/bin/sacct", "sstat":"/usr/bin/sstat"}, DefaultConnector:"msteams", Connectors:map[string]map[string]string{"discord":map[string]string{"messageTemplate":"/path/to/template.md", "name":"DiscoSlurmBot", "token":"PasteBotTokenHere", "triggerString":"showmeslurm"}, "mailto":map[string]string{"allowList":".+@(imp|imba.oeaw|gmi.oeaw).ac.at", "mailCmd":"/usr/bin/mutt", "mailCmdParams":"-s \"Job {{ .SlurmEnvironment.SLURM_JOB_ID }} ({{ .SlurmEnvironment.SLURM_JOB_NAME }}) {{ .SlurmEnvironment.SLURM_JOB_MAIL_TYPE }}\"", "mailFormat":"HTML", "mailTemplate":"/etc/slurm/mailTemplate.tmpl", "name":"original slurm mail functionality, extended."}, "matrix":map[string]string{"homeserver":"matrix.org", "template":"/path/to/matrix_template.md", "token":"syt_dGRpZG9ib3QXXXXXXXEyQMBEmvOVp_10Jm93", "username":"@myuser:matrix.org"}, "mattermost":map[string]string{"messageTemplate":"/path/to/mattermostTemplate.md", "name":"MatTheSlurmBot", "serverUrl":"https://someSpaceName.cloud.mattermost.com", "token":"PasteBotTokenHere", "triggerString":"showmeslurm", "wsUrl":"wss://someSpaceName.cloud.mattermost.com"}, "msteams":map[string]string{"adaptiveCardTemplate":"/path/template.json", "name":"dev channel", "renderToFile":"yes", "spoolDir":"/tmp", "url":"https://msteams/webhook/url", "useLookup":"GECOS"}, "slack":map[string]string{"messageTemplate":"/path/to/template.md", "renderToFile":"spool", "spoolDir":"/tmp", "token":"PasteSlackBotTokenHere"}, "telegram":map[string]string{"format":"HTML", "messageTemplate":"/etc/slurm/telegramTemplate.html", "name":"telegram bot", "renderToFile":"no", "spoolDir":"/tmp/telegramgobs", "token":"5844013197:AAHQmCJFmLMD0y78g8dxGXhzxd5XwYLe4pw", "url":"", "useLookup":"no"}, "textfile":map[string]string{"path":"/tmp"}}, QosMap:map[string]uint64{"elevated":0xe10, "normal":0x7080}}
goslmailer:2023/02/26 17:35:50.058006 config.go:80: CONFIGURATION logfile: /tmp/goslmailer.log
goslmailer:2023/02/26 17:35:50.058014 config.go:81: --------------------------------------------------------------------------------
goslmailer:2023/02/26 17:35:50.058021 invocation_context.go:34: Parsing CMDLine:
goslmailer:2023/02/26 17:35:50.058025 invocation_context.go:35: CMD subject: "Default Blank Subject"
goslmailer:2023/02/26 17:35:50.058028 invocation_context.go:36: CMD others: []string{}
goslmailer:2023/02/26 17:35:50.058031 invocation_context.go:37: --------------------------------------------------------------------------------
goslmailer:2023/02/26 17:35:50.058034 invocation_context.go:41: DUMP RECEIVERS:
goslmailer:2023/02/26 17:35:50.058039 invocation_context.go:42: Receivers: main.Receivers(nil)
goslmailer:2023/02/26 17:35:50.058048 invocation_context.go:43: invocationContext: &main.invocationContext{CmdParams:main.CmdParams{Subject:"Default Blank Subject", Other:[]string{}}, Receivers:main.Receivers(nil)}
goslmailer:2023/02/26 17:35:50.058051 invocation_context.go:44: --------------------------------------------------------------------------------
goslmailer:2023/02/26 17:35:50.058056 getjobcontext.go:235: Start retrieving job stats
goslmailer:2023/02/26 17:35:50.058070 getjobcontext.go:236: slurmjob.SlurmEnvironment{SLURM_ARRAY_JOB_ID:"", SLURM_ARRAY_TASK_COUNT:"", SLURM_ARRAY_TASK_ID:"", SLURM_ARRAY_TASK_MAX:"", SLURM_ARRAY_TASK_MIN:"", SLURM_ARRAY_TASK_STEP:"", SLURM_CLUSTER_NAME:"", SLURM_JOB_ACCOUNT:"", SLURM_JOB_DERIVED_EC:"", SLURM_JOB_EXIT_CODE:"", SLURM_JOB_EXIT_CODE2:"", SLURM_JOB_EXIT_CODE_MAX:"", SLURM_JOB_EXIT_CODE_MIN:"", SLURM_JOB_GID:"", SLURM_JOB_GROUP:"", SLURM_JOBID:"", SLURM_JOB_ID:"", SLURM_JOB_MAIL_TYPE:"", SLURM_JOB_NAME:"", SLURM_JOB_NODELIST:"", SLURM_JOB_PARTITION:"", SLURM_JOB_QUEUED_TIME:"", SLURM_JOB_RUN_TIME:"", SLURM_JOB_STATE:"", SLURM_JOB_STDIN:"", SLURM_JOB_UID:"", SLURM_JOB_USER:"", SLURM_JOB_WORK_DIR:""}
goslmailer:2023/02/26 17:35:50.058886 goslmailer.go:71: Unable to retrieve job stats. Error: Invalid subject line: Default Blank Subject
# cat /tmp/tgslurmbot.log 
tgslurmbot:2023/02/26 17:25:02.858693 tgslurmbot.go:50: ======================= tgslurmbot start =======================================
tgslurmbot:2023/02/26 17:25:02.859473 version.go:11: ----------------------------------------
tgslurmbot:2023/02/26 17:25:02.860025 version.go:12: Version: v2.7.1
tgslurmbot:2023/02/26 17:25:02.860029 version.go:13: Build commit hash: d60a3ea6a0d1051bbcf6e2526d77a15904aa6581
tgslurmbot:2023/02/26 17:25:02.860033 version.go:14: ----------------------------------------
tgslurmbot:2023/02/26 17:25:02.860038 tgslurmbot.go:58: Starting: "testbot"
tgslurmbot:2023/02/26 17:32:18.119490 tgslurmbot.go:50: ======================= tgslurmbot start =======================================
tgslurmbot:2023/02/26 17:32:18.119540 version.go:11: ----------------------------------------
tgslurmbot:2023/02/26 17:32:18.119554 version.go:12: Version: v2.7.1
tgslurmbot:2023/02/26 17:32:18.119559 version.go:13: Build commit hash: d60a3ea6a0d1051bbcf6e2526d77a15904aa6581
tgslurmbot:2023/02/26 17:32:18.119564 version.go:14: ----------------------------------------
tgslurmbot:2023/02/26 17:32:18.119570 tgslurmbot.go:58: Starting: "testbot"
tgslurmbot:2023/02/26 17:33:36.081803 tgslurmbot.go:50: ======================= tgslurmbot start =======================================
tgslurmbot:2023/02/26 17:33:36.081842 version.go:11: ----------------------------------------
tgslurmbot:2023/02/26 17:33:36.081852 version.go:12: Version: v2.7.1
tgslurmbot:2023/02/26 17:33:36.081856 version.go:13: Build commit hash: d60a3ea6a0d1051bbcf6e2526d77a15904aa6581
tgslurmbot:2023/02/26 17:33:36.081864 version.go:14: ----------------------------------------
tgslurmbot:2023/02/26 17:33:36.081869 tgslurmbot.go:58: Starting: "testbot"

By the way, dunno if this is relevant, but the '/home/runner/work/goslmailer/goslmailer/cmd/goslmailer/goslmailer.go' path referenced in the logs does not really exist.
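That path is most likely the source location on the machine that built the release binary (the /opt/hostedtoolcache and /home/runner/work prefixes are typical of a GitHub Actions runner). Go records source file paths at compile time for stack traces, so they do not need to exist where the binary runs. A short Go sketch using runtime.Caller shows the same recorded path; building with go build -trimpath replaces such absolute prefixes with module or import paths. sourcePath is a hypothetical helper.

```go
package main

import (
	"fmt"
	"runtime"
)

// sourcePath returns the file and line the compiler recorded for the caller.
func sourcePath() (string, int) {
	_, file, line, ok := runtime.Caller(1)
	if !ok {
		return "", 0
	}
	return file, line
}

func main() {
	file, line := sourcePath()
	// Without -trimpath this prints the absolute path on the build machine.
	fmt.Printf("%s:%d\n", file, line)
}
```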

pja237 commented 1 year ago

Ok, config files now look better.

goslmailer:2023/02/26 17:35:50.058014 config.go:81: --------------------------------------------------------------------------------
goslmailer:2023/02/26 17:35:50.058021 invocation_context.go:34: Parsing CMDLine:
goslmailer:2023/02/26 17:35:50.058025 invocation_context.go:35: CMD subject: "Default Blank Subject"
goslmailer:2023/02/26 17:35:50.058028 invocation_context.go:36: CMD others: []string{}
goslmailer:2023/02/26 17:35:50.058031 invocation_context.go:37: --------------------------------------------------------------------------------

This tells me you invoked goslmailer manually, with no command-line arguments. Try submitting: sbatch --mail-type=ALL --mail-user='telegram:YOURID' --wrap='sleep 60' and show me the log.

When slurm invokes goslmailer, it looks something like this:

goslmailer:2023/02/26 13:01:19.004375 config.go:81: --------------------------------------------------------------------------------
goslmailer:2023/02/26 13:01:19.004498 invocation_context.go:34: Parsing CMDLine:
goslmailer:2023/02/26 13:01:19.004512 invocation_context.go:35: CMD subject: "Slurm Job_id=37 Name=wrap Ended, Run time 00:01:00, COMPLETED, ExitCode 0"
goslmailer:2023/02/26 13:01:19.004518 invocation_context.go:36: CMD others: []string{"telegram:XXX"}
goslmailer:2023/02/26 13:01:19.004523 invocation_context.go:37: --------------------------------------------------------------------------------
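Slurm calls MailProg the way it would call mail(1): -s followed by the subject, then the recipient(s). Run by hand with no -s, the subject stays at its default, which is what produced "Invalid subject line: Default Blank Subject" in the goslmailer.log output above. A Go sketch of both steps: the argument handling uses the standard flag package, while parseArgs, jobID and the subject regexp are assumptions modeled on the example subject above, not goslmailer's actual code.

```go
package main

import (
	"flag"
	"fmt"
	"regexp"
)

// Hypothetical pattern for subjects such as
// "Slurm Job_id=37 Name=wrap Ended, Run time 00:01:00, COMPLETED, ExitCode 0".
var subjectRe = regexp.MustCompile(`^Slurm Job_id=(\d+) `)

// parseArgs mimics a MailProg command line: -s "<subject>" then receivers.
func parseArgs(args []string) (subject string, receivers []string, err error) {
	fs := flag.NewFlagSet("goslmailer", flag.ContinueOnError)
	s := fs.String("s", "Default Blank Subject", "mail subject")
	if err := fs.Parse(args); err != nil {
		return "", nil, err
	}
	return *s, fs.Args(), nil
}

// jobID extracts the job id from a slurm mail subject line.
func jobID(subject string) (string, error) {
	m := subjectRe.FindStringSubmatch(subject)
	if m == nil {
		return "", fmt.Errorf("invalid subject line: %s", subject)
	}
	return m[1], nil
}

func main() {
	args := []string{"-s", "Slurm Job_id=37 Name=wrap Ended, Run time 00:01:00, COMPLETED, ExitCode 0", "telegram:XXX"}
	subject, receivers, err := parseArgs(args)
	if err != nil {
		fmt.Println("bad arguments:", err)
		return
	}
	id, err := jobID(subject)
	fmt.Println("job", id, "receivers", receivers, "err", err)

	// Manual run with no arguments: default subject, no receivers.
	subject, receivers, _ = parseArgs(nil)
	_, err = jobID(subject)
	fmt.Println("manual run:", receivers, err)
}
```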
pja237 commented 1 year ago

Hey, glad to hear you've made it!

Just one question before I close this. Which template file did you use in the first place to get this error message? telegram.html from the release zip? Was it modified before deployment? Later I'll do a release that addresses the logger panic and replaces it with a more descriptive log message.

hariseldon99 commented 1 year ago

Yes. It was from the html file in the release zip, and no, I did not change anything.

pja237 commented 1 year ago

As part of the discussion in #33, I tried the telegram HTML template from the release zip and it worked OK. Closing this now since the rest of the issue was solved.