CLIP-HPC / goslmailer

GoSlurmMailer - drop in replacement for default slurm MailProg. Delivers slurm job messages to various destinations.
40 stars 6 forks source link

Add Matrix connector #13

Closed tdido closed 2 years ago

tdido commented 2 years ago

My interest in the project spawned from this post in slurm-users.

Here's a first attempt at implementing a connector to Matrix. It can send messages to any channel the provided username is a member of, and (currently) uses a markdown template which works well with most clients.

I'd really appreciate any comments from the authors regarding the general implementation, and we could then perhaps post in the slurm-users thread inviting people to test.

tdido commented 2 years ago

@timeu @pja237 Hello again. I'd very much appreciate your comments on this attempt.

timeu commented 2 years ago

@tdido : Thanks a lot of for the contribution. We are a bit busy with a different project this week, so we will do a proper review in the comming days. Here are some initial comments:

pja237 commented 2 years ago

Hey @tdido, nice to have you back in the game.

I'll try to explain how we did this with telegram in hope that it sets you in the right direction with matrix, since i'm supershort on time nowdays, i ask you for a bit of patience, might take some hours before i reply.

Telegram:

So, we built this tgslurmbot binary, which is a simple telegram bot, any telegram user can search for him and initiate a chat with him, also groupchats of users can add this bot to their chat. When that happens, the bot awakens on this event, looks up the chatID of this event and sends the following message back to that same chat: welcome to slurmbot, use this switch in your sbatch --mail-user=telegram:thisParticularChatID. That's how we give the information to the user on which sbatch submission switch to add, simply copy-paste this switch and voila. This chatID then comes back from users job to the telegram connector, and we direct the messages there.

Problem solved.

This is how you could attack this in matrix, with a similar approach:

  1. build a matrix bot, which when users/groups invite to chat awakens and sends the message back to that chat with the roomID, i've quickly ripped off some code from other people and put together a poc you can evolve and test (and make less bad) https://gist.github.com/pja237/04acc9e0131c134f0bf48ad704b82ae5 (try it out)
  2. then in the connector use the roomID coming from --mail-user switch (no need for lookups), example send: https://gist.github.com/pja237/6d97895bc606e5d4e2d0ad1fb2e58840

You can run and test it out stand-alone, register the bot users, fill out the consts, spin up bot, send him an invite and then observe. Try the bot to see if this concept will work, perhaps with a group chat too, add more users if you have, then invite bot to the chat. Then if this is a good path fwd. it'll need a lot of polishing.

Hope this helps you somehow on how to get the things you're missing? I'm happy to work on this to go forward, but sadly can't contribute much more these days (until the storm passes). Just drop any question, one of us will try to reply asap.

(no idea what just happened, this was posted twice, i've deleted one copy)

pja237 commented 2 years ago

So i'm guessing you tried this concept out and it's good?

Does this workflow with the bot have any limitations or it covers all required use cases?

I'll keep an eye out if you need any more assistance, that thing with cannibalized code and double messages you'll have to figure out why if you haven't already, so just ping us here and tnx for patience :+1:

tdido commented 2 years ago

So i'm guessing you tried this concept out and it's good?

Yes, Matrix doesn't really treat single and multi-user channels differently so any room ID works.

Does this workflow with the bot have any limitations or it covers all required use cases?

No limitations that I can think of, and if you don't want to use the bot you can just set a normal user in the config and it will be able to send messages to any channel such user is already a member of. So I'd say it's pretty flexible.

I'll keep an eye out if you need any more assistance, that thing with cannibalized code and double messages you'll have to figure out why if you haven't already, so just ping us here and tnx for patience

I appreciate your help. I'll keep at it and report any blockers here.

tdido commented 2 years ago

OK, this is in pretty good shape I think. There's three key things missing:

I've been looking at the e2e tests and I'm not very confident I understand how to go about adding one, so I'd appreciate any guidance regarding that whenever you guys have some time to spare.

Thanks!

pja237 commented 2 years ago

Looks simple and reasonable enough, i'll add some thoughts inline on the matrix bot code to consider while wrapping that up. Re. the three points you have, i can try to assist with some e2e thoughts.

  • [ ] Add e2e tests

To be honest, in this case i'm not sure how i'd do the test at this point in time (@timeu ideas?).

The idea of e2e is that we simulate the whole environment around gosl/gobl invocation and then observe the output. I'll take test_05 as an example, where we simulate slurm version <21.8 which doesn't have env vars (you know the story). So to simulate one run of the goslmailer we need:

  1. configuration files here
  2. slurm environment variables we do via a simple source of a shell script which sets them, or not here
  3. the output of slurm sacct and sstat we simulate again with simple shell scripts that dump the same output as real commands would here
  4. the test written in endly workflow system

The test.yaml describes where to copy those 1-3 things we need from above list and how to execute the goslmailer command using with the env. vars set/not.

https://github.com/CLIP-HPC/goslmailer/blob/85fe394e4c0170de8a66f3c699207bbe20f7a133/test_e2e/cases/test_05/test.yaml#L30-L41

Then depending on the test case we assert various things. In the test_05 we check for two things:

  1. that the rendered output file from goslmailer is equal to our golden image here https://github.com/CLIP-HPC/goslmailer/blob/85fe394e4c0170de8a66f3c699207bbe20f7a133/test_e2e/cases/test_05/test.yaml#L52

and:

  1. goslmailer threw this out in the run log: https://github.com/CLIP-HPC/goslmailer/blob/85fe394e4c0170de8a66f3c699207bbe20f7a133/test_e2e/cases/test_05/test.yaml#L57

So in this case, where the connector doesn't do debug rendering to file or supports spooling, it's really hard to simulate the other side, which would be the matrix server. Perhaps for now skip the e2e test unless you came to some idea now after this, or @timeu? We can always do it later.

  • [ ] Add docs

Yes please. Screenshots are welcome as well.

pja237 commented 2 years ago

Hey @tdido, are there any news on this pr? The times are a bit calmer for us now and if you're busy i could pitch in some code here. Just let us know how you stand with time and if you want we can sync up the work on telegram (same handle as on github).

pja237 commented 2 years ago

@hermannschwaerzlerUIBK We're nearing completion of the matrix connector and i remember you inquired about this option. Would you be interested in test running it and providing us with some feedback?