CLIP-HPC / goslmailer

GoSlurmMailer - drop in replacement for default slurm MailProg. Delivers slurm job messages to various destinations.

Extend telegram bot #26

Open pdpino opened 1 year ago

pdpino commented 1 year ago

Awesome library!

Is there a way to extend the telegram bot to answer more commands?

I suppose I can edit the source here and recompile, but I wonder if there is a different way to do this.

pja237 commented 1 year ago

Hey, thanks for the support! We did a brief brainstorm about something like this, to allow users to issue additional commands to slurm cluster via bot, e.g. issue a scancel to a job etc. But after brief thinking, we dropped the idea because it seemed we'd need to implement quite a lot to make it safe, like authentication/authorization. So we focused just on the basic functionality to deliver messages, which is a safe one-way communication (bot->user).

As for how to go about it, it would be the same as you have mentioned: edit that part of the code, assign handlers for additional commands and fill out the functions. I don't know of another way. What did you have in mind to add in terms of commands?

pdpino commented 1 year ago

> we dropped the idea because it seemed we'd need to implement quite a lot to make it safe, like authentication/authorization

Makes sense! It's safer if the user just logs in to the cluster via the usual methods.

> What did you have in mind?

Haven't defined this yet, but as ideas:

  1. query jobs state, e.g. check state of a job, check squeue, check old jobs
  2. configure the bot's verbosity (per user)
  3. Customize bot's message sent on "/start"
  4. cancel jobs (though this would be particularly unsafe, so can be left out)
  5. bot could send notifications regarding the cluster, e.g. "warning: you are reaching your storage quota" (could be out of scope for this project) (see next comment)

pdpino commented 1 year ago

I want to use the telegram bot to send more notifications regarding the cluster, e.g. "warning: you are reaching your max storage quota". I'd say it's a bit out of scope for this library to implement it directly. But it would be ideal if we could reuse functionality, something like:

import telegramBot, username2chatId

chatId = username2chatId("some telegram-username")
# note: could be unsafe (the user would have to register their cluster-username via the telegram bot?)
# we might have to work directly with chatId

telegramBot.sendMessage(chatId, "some message")

Or even for any connector with a go package?

import goslmailer
goslmailer.send("some message", "some username", "some connector")

or from the cmd line

$ goslmailer-send "some message" "some username" --connector telegram|email|etc
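For the telegram-only case, the proposed "send a message to a chat-id" call can be sketched directly against the public Telegram Bot API sendMessage method, without any goslmailer code. This is only a minimal sketch under assumptions: the helper names, the TELEGRAM_BOT_TOKEN environment variable and the command-line shape are made up for illustration and are not part of goslmailer.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"os"
	"strconv"
)

// telegramAPI is the public Bot API base URL.
const telegramAPI = "https://api.telegram.org"

// sendMessageURL builds the endpoint of the Bot API sendMessage method.
func sendMessageURL(base, token string) string {
	return base + "/bot" + token + "/sendMessage"
}

// sendMessagePayload encodes the two required sendMessage fields as JSON.
func sendMessagePayload(chatID int64, text string) ([]byte, error) {
	return json.Marshal(map[string]any{"chat_id": chatID, "text": text})
}

// send posts the message and reports any non-200 answer as an error.
func send(token string, chatID int64, text string) error {
	body, err := sendMessagePayload(chatID, text)
	if err != nil {
		return err
	}
	resp, err := http.Post(sendMessageURL(telegramAPI, token), "application/json", bytes.NewReader(body))
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("telegram answered %s", resp.Status)
	}
	return nil
}

func main() {
	// Hypothetical CLI shape: prog <chat-id> <message>
	token := os.Getenv("TELEGRAM_BOT_TOKEN") // assumed variable name
	if token == "" || len(os.Args) != 3 {
		fmt.Fprintln(os.Stderr, "usage: TELEGRAM_BOT_TOKEN=... prog <chat-id> <message>")
		os.Exit(2)
	}
	chatID, err := strconv.ParseInt(os.Args[1], 10, 64)
	if err != nil {
		fmt.Fprintln(os.Stderr, "bad chat id:", err)
		os.Exit(2)
	}
	if err := send(token, chatID, os.Args[2]); err != nil {
		fmt.Fprintln(os.Stderr, "send failed:", err)
		os.Exit(1)
	}
}
```

The sketch works with a chat-id directly, which sidesteps the username-to-chat-id mapping question raised above.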

Does this make sense? I'm just kind of brainstorming here :grin:

pja237 commented 1 year ago

Hey, that's some interesting brainstorm you've dropped here. Made me think hard about all that. It does make sense, although I'm having trouble seeing the final picture; it's still a bit too far away and too vague. So let's try some questions to clarify the vision a bit. Let's call whatever it is a "product", and I'll just unload the thoughts...

I've picked 2 use cases that I found to be technically "different", then I tried to envision how the end product would look from the user's perspective, and which components I would need to have in place to execute that product.

  1. list user's jobs (squeue) - easy one
  2. quota warnings - complicated one?

For those two I then tried to envision what needs to exist to execute them. So, the user opens his phone, goes to the chat with the bot and, let's say, does:

  1. /jobs to get his jobs in the queue. This is something that I see as initiated by the user: the bot can then do a squeue, perhaps some parsing and prettifying, and return the list back. Quite simple...

  2. the quota warnings. That's something initiated not by the user but by the state of the system itself. So, what would be needed then is:
     a) the user must register for quota warnings, e.g. with a /quotanotify command to the bot (some users might not want this sent to them)
     b) the bot needs to start monitoring this user's quota, or there is another monitoring component sitting behind the bot, with which the bot registers the user; it monitors his quotas and sends the bot a notification to alert the user: [monitor]<-->[bot]<-->[user]

In both cases, what's definitely needed is a map between telegram user uids and cluster uids (t-uid<-->c-uid). It could be done manually, or with some tokens generated at the cluster with which users then authenticate themselves to the bot (here we're entering dangerous territory :laughing: ). Also, to keep things useful, the monitor must be configurable/modular enough to be able to check quotas on different FSs (e.g. beegfs-ctl --getquota, etc.).

Other questions:

  1. would this be a telegram thing only, or a more general framework that supports delivery to other messaging systems, like through the connectors for other apps that we have?

For now, the scope of what you've described is huge, and in all of that I see goslmailer playing just a tiny part among all the other components that might be missing. Unless we brainstorm and trim this down significantly to some fundamentals.

pdpino commented 1 year ago

Thanks for the reply! This looks interesting.

flow-notify

This flow covers UC-2 (notify quota warnings)

I see something like this: [monitor] --> [messager] --> [connector] --> [user].

[monitor]

[messager]

[connector]

flow-query

This covers UC-1 (query current jobs from the telegram-bot)

The messaging here needs to be in both directions: [listener] <--> [connector] <--> [user].

Though notice the flow is always initiated by the user: [user] --> [connector] --> [listener] --> [connector] --> [user]

[listener]

[connector]

Security

Authentication can get complicated and risky. Some thoughts:

Safety first

Authentication flow (initial idea)

To authenticate, the user must:

  1. generate the token once by running directly in the cluster: generate-token. This returns the token to stdout, e.g. 123456789, and stores it somewhere safe (e.g. /home/<username>/.mysecrettoken, no read/write access for other cluster-users)
  2. send this command to the bot once: /auth 123456789. The [connector] stores the mapping chat-id --> token somewhere safe in the cluster (no read/write access to cluster-users). After that, the user can send commands through the bot to run authenticated queries.

To run authenticated commands: the [listener] first validates the token by calling validate-token <cluster-uid> <token>, and only then runs the actual query

Putting all together

Both flows are different enough to be treated separately, however, the [connector] for both flows must be the same service (at least for a telegram bot).

A simplified scheme would look like this:

[monitor] --> [messager] --\
                            \
                             --> [connector] <--> [user]
                            /
             [listener] <--/

A more detailed scheme (blue are flow-notify, and orange are for flow-query):

[Figure: tgslurmbot-diagram]

The flow-notify is already covered by goslmailer :tada:. Some questions:

  1. Can we reuse the [messager] for other use-cases with the flow-notify?
    • Say I develop a [monitor] to check user quotas periodically (ideally, this would be configurable to support multiple FSs)
    • Can I call something like goslmailer <chat-id> <msg> --connector telegram? Can we extend goslmailer to support this?
    • Alternatively, can I run tgslurmbot <chat-id> <msg> directly? This would not support other connectors or the gobler, but would cover my use-case with telegram
  2. Can we extend the [connector] to support the flow-query?

    • Say I implement a [listener] for SLURM, an [authenticator], and a [mapping]
    • Can I provide my own code to add more bot commands? Or maybe import basic configuration?: e.g.

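      import (
        "os/exec"

        tele "gopkg.in/telebot.v3"

        // hypothetical packages, not exported by tgslurmbot today
        "configBot"
        "mapping"
      )

      // this applies the current configuration from tgslurmbot
      b := configBot.New() // hypothetical constructor returning a *tele.Bot

      // add my own handlers
      b.Handle("/jobs", func(c tele.Context) error {
        token := mapping.ChatId2Token(c.Chat().ID)
        // "slurm-listener list-jobs" is a hypothetical command that takes the token
        response, err := exec.Command("slurm-listener", "list-jobs", token).Output()
        if err != nil {
          return err
        }

        return c.Send(string(response))
      })

      b.Start()

pja237 commented 1 year ago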

Hey, that's quite a thorough planning. Respect. In theory, it's all doable, but i'd ask you for a day/two to reread and digest/think it all through before i reply.