Closed timeu closed 2 years ago
Let's do cosmetics and move the metrics from sacct.go (https://github.com/CLIP-HPC/goslmailer/blob/96a78280e8d93896a25572cc609c07fff9252c30/internal/slurmjob/sacct.go#L13 and https://github.com/CLIP-HPC/goslmailer/blob/96a78280e8d93896a25572cc609c07fff9252c30/internal/slurmjob/sacct.go#L41)
to job_data, so it's more readable: https://github.com/CLIP-HPC/goslmailer/blob/more_e2e/internal/slurmjob/job_data.go
Also perhaps a line here with updated template guide https://github.com/CLIP-HPC/goslmailer/blob/more_e2e/templates/README.md
@tdido Hey, we're wrapping things up in this PR. It's more or less ready to go; some cosmetics will still happen, but it's functional now.
Would you be interested in helping us to test-drive it before merge?
If you find time, just check out this branch and run make build
(skipping the tests until we write up the requirements in the README).
For sure, I'll try it out and let you know.
OK, it's working!
Here's the output I'm getting:
<b>Job 219922 Ended</b>
<i>Created Tue, 31 May 2022 12:40:59 UTC</i>
<pre>------------------------------
Job Name : wrap
Job ID : 219922
User :
Partition :
Nodes Used :
Cores : 2
Job state : COMPLETED
Exit Code :
Submit : 2022-05-31T12:40:56
Start : 2022-05-31T12:40:56
End : 2022-05-31T12:40:57
Res. Walltime : 02:00:00
Used Walltime :
Used CPU time : 00:00.003
% User (Comp) : 33.33%
% System (I/O) : 33.33%
Memory Requested : 4.2 GB
Max Memory Used : 1.2 MB
Max Disk Write : 0 B
Max Disk Read : 0 B
------------------------------</pre>
<b>- TIP: Please consider lowering the ammount of requested memory in the future, your job has consumed less then half of the requested memory.</b>
<b>- TIP: Please consider lowering the amount of requested CPU cores in the future, your job has consumed less than half of requested CPU cores</b>
<b>- TIP: Your job was submitted with a walltime of 02:00:00 and finished in less half of the time, consider reducing the walltime and submit it to LONG QOS</b>
The only thing of note is that I can't get the "User", "Partition", "Nodes Used", and "Used Walltime" fields to populate (even when using the -p and -w arguments to sbatch).
@tdido : For User, Partition and Nodes, can you try using this template instead of the default one: https://github.com/CLIP-HPC/goslmailer/blob/more_e2e/test_e2e/cases/test_05/conf/adaptive_card_template.json The used walltime should work though. I need to check why it doesn't.
@tdido : I forgot to replace the Used Walltime in the template with the one from the sacctmetrics struct: https://github.com/CLIP-HPC/goslmailer/blob/more_e2e/test_e2e/cases/test_05/conf/adaptive_card_template.json#L141
@tdido : Also, I see that your mail client doesn't render the mail as HTML. For mutt you need to drop the following config into /etc/Muttrc.local:
# Local configuration for Mutt.
set content_type="text/html"
Instead of .Job.SlurmEnvironment.SLURM_JOB_USER to get the user, change the template to use the .Job.JobStats.User variable from SacctMetrics. In older versions SlurmEnvironment will contain only the jobid/arrayid vars; the rest will come from jobstats, which works the same in all versions.
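To illustrate the difference between the two template variables, here is a minimal sketch using Go's text/template. The Job, JobStats and SlurmEnvironment types below are simplified, hypothetical stand-ins and not the actual goslmailer structures; only the field paths .Job.SlurmEnvironment.SLURM_JOB_USER and .Job.JobStats.User are taken from the discussion above.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// Hypothetical, simplified stand-ins for the data a template receives.
// The real goslmailer types have more fields and may be shaped differently.
type SlurmEnvironment struct {
	SLURM_JOB_USER string // empty when the mail program gets no job environment
}

type JobStats struct {
	User string // assumed to be filled from sacct output
}

type Job struct {
	SlurmEnvironment SlurmEnvironment
	JobStats         JobStats
}

// render executes tmplText against a top-level object exposing .Job.
func render(tmplText string, job Job) (string, error) {
	tmpl, err := template.New("mail").Parse(tmplText)
	if err != nil {
		return "", fmt.Errorf("parsing template: %w", err)
	}
	var sb strings.Builder
	if err := tmpl.Execute(&sb, struct{ Job Job }{Job: job}); err != nil {
		return "", fmt.Errorf("executing template: %w", err)
	}
	return sb.String(), nil
}

func main() {
	// Simulates an older SLURM: environment is empty, sacct data is present.
	job := Job{JobStats: JobStats{User: "alice"}}

	fromEnv, _ := render("User : {{ .Job.SlurmEnvironment.SLURM_JOB_USER }}", job)
	fromStats, _ := render("User : {{ .Job.JobStats.User }}", job)

	fmt.Println(fromEnv)   // the field renders empty
	fmt.Println(fromStats) // the field is populated
}
```

With an empty environment the first template leaves the field blank, which matches the empty "User" line in the output above, while the JobStats-based template still has a value to print.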
Cheers lads, I had forgotten about the templating concept :P All looking great now. Thanks!
Great, then we wrap this pr up, merge and publish a new release tomorrow. Thanks for the help :+1: :1st_place_medal:
In older SLURM versions (< 21.08.x) the mail program is executed without any SLURM job environment variables set (#4). We fall back to parsing the subject line that is passed to the mail program to retrieve the jobid and other information such as job state and mail type. Additionally, the functions for retrieving job-related information via sacct and sstat now properly return error messages if the call fails. This fixes #7
Additional end-to-end tests are added to cover the above fixes.