Open pdpino opened 1 year ago
Hey, thanks for the support! We did a brief brainstorm about something like this, to allow users to issue additional commands to slurm cluster via bot, e.g. issue a scancel to a job etc. But after brief thinking, we dropped the idea because it seemed we'd need to implement quite a lot to make it safe, like authentication/authorization. So we focused just on the basic functionality to deliver messages, which is a safe one-way communication (bot->user).
As for how to go about it, it would be same as you have mentioned, edit that part of the code, assign handlers for additional commands and fill out the functions. I don't know of another way. What did you have in mind to add in terms of commands?
> we dropped the idea because it seemed we'd need to implement quite a lot to make it safe, like authentication/authorization
Makes sense! It's safer if the user just logs in to the cluster via the usual methods.
> What did you have in mind?
Haven't defined this yet, but as ideas:
- squeue, check old jobs
- I want to use the telegram bot to send more notifications regarding the cluster, e.g. "warning: you are reaching your max storage quota". I'd say it's a bit out of scope for this library to implement it directly. But it would be ideal if we could reuse functionality, something like:
import telegramBot, username2chatId
chatId = username2chatId("some telegram-username")
# note: could be unsafe (the user would have to register their cluster-username via the telegram bot?)
# we might have to work directly with chatId
telegramBot.sendMessage(chatId, "some message")
Or even for any connector with a go package?
import goslmailer
goslmailer.send("some message", "some username", "some connector")
or from the cmd line
$ goslmailer-send "some message" "some username" --connector telegram|email|etc
Does this make sense? I'm just kind of brainstorming here :grin:
Hey, that's some interesting brainstorm you've dropped here. Made me think hard about all that. It does make sense, although i'm having trouble seeing the final picture; it's still a bit too far away, too vague. So let's try some questions to clear the vision up a bit. Let's call whatever it is a "product", and i'll just unload the thoughts...
I've picked 2 use cases that i found to be technically "different", then tried to envision how the end product would look from the user's perspective, and which components i would need to have in place to deliver it.
So, the user opens his phone, goes to the chat with the bot and, let's say, does:
/jobs
to get his jobs in the queue, now this is something that i see as initiated by the user, bot can then do a squeue, perhaps some parsing, prettifying and returns the list back, quite simple...
the quota warnings, now, that's something not initiated by the user but by the state of the system itself.
So, what would be needed then is:
a) user must register for quota warnings (e.g. a /quotanotify command to the bot; some users might not want this sent to them)
b) bot needs to start monitoring this user's quota, or there is another monitoring component sitting behind the bot, with which the bot registers the user, and which monitors his quotas and sends the bot a notification to alert the user
[monitor]<-->[bot]<-->[user]
In both cases, what's def. needed is a map between telegram user uids and cluster uids (t-uid<-->c-uid).
It could be done manually, or with some tokens generated at the cluster with which users then authenticate themselves to the bot (here we're entering dangerous territory :laughing: )
Also, to keep things useful, the monitor must be configurable/modular enough to be able to check quotas on different FSs (e.g. beegfs-ctl --getquota, etc.)
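To make the "configurable/modular" monitor idea a bit more concrete, here is a minimal Go sketch. Everything in it is an assumption for illustration: the `QuotaChecker` interface, the `staticChecker` stand-in backend, and the `Warnings` function are hypothetical names, not part of goslmailer. A real backend would wrap a filesystem tool such as `beegfs-ctl --getquota` and parse its output.

```go
package main

import (
	"errors"
	"fmt"
)

// Usage is one user's storage consumption on one filesystem, in bytes.
type Usage struct {
	Used  uint64
	Limit uint64
}

// QuotaChecker is a hypothetical interface; each filesystem backend
// would provide its own implementation.
type QuotaChecker interface {
	Name() string
	Check(clusterUser string) (Usage, error)
}

// staticChecker is a stand-in backend with fixed numbers, for illustration only.
type staticChecker struct {
	name  string
	usage map[string]Usage
}

func (s staticChecker) Name() string { return s.name }

func (s staticChecker) Check(user string) (Usage, error) {
	u, ok := s.usage[user]
	if !ok {
		return Usage{}, errors.New("unknown user: " + user)
	}
	return u, nil
}

// Warnings returns one message per filesystem where the user is at or above
// the given fraction of the quota (a threshold of 0.9 means 90%).
func Warnings(checkers []QuotaChecker, user string, threshold float64) []string {
	var out []string
	for _, c := range checkers {
		u, err := c.Check(user)
		if err != nil || u.Limit == 0 {
			continue // skip backends that fail or report no limit
		}
		ratio := float64(u.Used) / float64(u.Limit)
		if ratio >= threshold {
			out = append(out, fmt.Sprintf("warning: %s is at %.0f%% of its quota on %s",
				user, ratio*100, c.Name()))
		}
	}
	return out
}

func main() {
	checkers := []QuotaChecker{
		staticChecker{"home", map[string]Usage{"alice": {Used: 95, Limit: 100}}},
		staticChecker{"scratch", map[string]Usage{"alice": {Used: 10, Limit: 100}}},
	}
	for _, w := range Warnings(checkers, "alice", 0.9) {
		fmt.Println(w) // a real monitor would hand this to the bot instead
	}
}
```

Keeping the filesystem-specific part behind one small interface means the threshold logic and the notification path stay the same no matter which FS is being checked.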
Other questions:
For now, the scope of what you've described is huge, and in all of that i see goslmailer playing just a tiny part among all the other components that might be missing, unless we brainstorm-trim this down significantly to some fundamentals.
Thanks for the reply! This looks interesting.
I'll call the two use cases UC-1 (query jobs) and UC-2 (notify quota) from now on, and split them into two flows: flow-notify and flow-query.
flow-notify
This flow covers UC-2 (notify quota warnings).
I see something like this: [monitor] --> [messager] --> [connector] --> [user].
[monitor]
- [monitor]: calls MailProg on events START, END, etc.
- For UC-2, we'd need to implement a [monitor] (e.g. monitor-quota) that checks user quota and decides when to notify the user. For example:
  MailProg --user <chat-id> --message "warn: you're near the quota"
- /quotanotify command via the bot, the user registers via the cluster directly:
  monitor-quota register --connector telegram --chat-id <my-chat-id>
[messager]
- [messager] --connector telegram|email|slack|etc --user chat-id|email-address|etc
- goslmailer accomplishes this
[connector]
- tgslurmbot accomplishes this (also matrixslurmbot and discoslurmbot)
- gobler can be a [connector] as well: instead of connecting to a bot, it spools and forwards to another [connector] (like a decorator)
flow-query
This covers UC-1 (query current jobs from the telegram-bot).
The messaging here needs to be in both directions: [listener] <--> [connector] <--> [user].
Though notice the flow is always initiated by the user: [user] --> [connector] --> [listener] --> [connector] --> [user]
[listener]
- [monitor] from before
- For UC-1, we'd set up a script that runs squeue, parses the output, and returns it in a serialized format
- [listener] can be called like: slurm-listener list-jobs
[connector]
- chat-id --> token
- [listener]: on /jobs, call slurm-listener list-jobs, return the result via the chat
Authentication can get complicated and risky. Some thoughts:
- [listener], e.g.:
  - squeue: seems safe enough, queries system state and returns
  - scancel: seems unsafe! (modifies system state). Do not implement at all (or implement at your own risk)
- To authenticate, the user must:
  1. Run generate-token. This returns the token to stdout, e.g. 123456789, and stores it somewhere safe (e.g. /home/<username>/.mysecrettoken, no read/write access for other cluster-users)
  2. Send /auth 123456789 to the bot. The [connector] stores the mapping chat-id --> token somewhere safe in the cluster (no read/write access to cluster-users). After that, the user can send commands through the bot to run authenticated queries.
- To run authenticated commands: the [listener] first validates the token by calling validate-token <cluster-uid> <token>, and only then runs the actual query
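The token flow described above (generate a token on the cluster, register it via the bot, validate it before every query) could be sketched in Go roughly as follows. This is only an illustration under loud assumptions: the `Store` type and its methods are hypothetical, and the two mappings live in memory here, whereas the thread proposes storing them in protected files on the cluster.

```go
package main

import (
	"crypto/rand"
	"crypto/subtle"
	"encoding/hex"
	"errors"
	"fmt"
)

// ErrUnauthorized is returned when a chat has no valid token for the user.
var ErrUnauthorized = errors.New("unauthorized")

// Store holds both mappings in memory (hypothetical, for illustration).
type Store struct {
	tokens map[string]string // cluster-uid -> token (the generate-token side)
	chats  map[int64]string  // chat-id -> token (the connector side)
}

func NewStore() *Store {
	return &Store{tokens: map[string]string{}, chats: map[int64]string{}}
}

// GenerateToken mimics `generate-token`: it creates a random token,
// remembers it for the cluster user, and returns it.
func (s *Store) GenerateToken(clusterUID string) (string, error) {
	buf := make([]byte, 16)
	if _, err := rand.Read(buf); err != nil {
		return "", err
	}
	tok := hex.EncodeToString(buf)
	s.tokens[clusterUID] = tok
	return tok, nil
}

// Auth mimics `/auth <token>`: the connector records chat-id --> token.
func (s *Store) Auth(chatID int64, token string) { s.chats[chatID] = token }

// ValidateToken mimics `validate-token <cluster-uid> <token>`,
// comparing in constant time to avoid leaking how many characters match.
func (s *Store) ValidateToken(clusterUID, token string) bool {
	want, ok := s.tokens[clusterUID]
	return ok && subtle.ConstantTimeCompare([]byte(want), []byte(token)) == 1
}

// RunQuery validates first and only then runs the actual query.
func (s *Store) RunQuery(chatID int64, clusterUID string, query func() string) (string, error) {
	tok, ok := s.chats[chatID]
	if !ok || !s.ValidateToken(clusterUID, tok) {
		return "", ErrUnauthorized
	}
	return query(), nil
}

func main() {
	s := NewStore()
	tok, err := s.GenerateToken("alice")
	if err != nil {
		panic(err)
	}
	s.Auth(42, tok) // the user sent "/auth <token>" from chat 42
	out, err := s.RunQuery(42, "alice", func() string { return "no jobs in the queue" })
	fmt.Println(out, err)
}
```

Note that the sketch takes the cluster-uid as an explicit argument; how the bot learns which cluster user a chat claims to be is left open, as it is in the discussion above.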
Both flows are different enough to be treated separately, however, the [connector]
for both flows must be the same service (at least for a telegram bot).
A simplified scheme would look like this:
[monitor] --> [messager] --\
                            \
                             --> [connector] <--> [user]
                            /
             [listener] <--/
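As a rough illustration of the [connector] box, including the idea of gobler acting as a decorator that spools and forwards, here is a minimal Go sketch. The `Connector` interface, `Spooler`, and `recorder` are hypothetical names invented for this example; they are not goslmailer's or gobler's actual API.

```go
package main

import "fmt"

// Connector delivers a message to one recipient (chat-id, email address, ...).
// Hypothetical interface, for illustration only.
type Connector interface {
	Send(recipient, msg string) error
}

// recorder is a stand-in for a real connector such as a telegram bot client.
type recorder struct{ sent []string }

func (r *recorder) Send(recipient, msg string) error {
	r.sent = append(r.sent, recipient+": "+msg)
	return nil
}

type spooled struct{ recipient, msg string }

// Spooler wraps another Connector: Send only queues, Flush forwards.
type Spooler struct {
	next  Connector
	queue []spooled
}

func (s *Spooler) Send(recipient, msg string) error {
	s.queue = append(s.queue, spooled{recipient, msg})
	return nil
}

// Flush forwards queued messages in order. It stops at the first error
// and keeps the undelivered messages in the queue.
func (s *Spooler) Flush() error {
	for len(s.queue) > 0 {
		m := s.queue[0]
		if err := s.next.Send(m.recipient, m.msg); err != nil {
			return err
		}
		s.queue = s.queue[1:]
	}
	return nil
}

func main() {
	tg := &recorder{}
	sp := &Spooler{next: tg}
	var c Connector = sp // callers only see the Connector interface

	_ = c.Send("12345", "job 42 started")
	_ = c.Send("12345", "job 42 ended")
	fmt.Println(len(tg.sent)) // nothing has been forwarded yet

	if err := sp.Flush(); err != nil {
		panic(err)
	}
	fmt.Println(tg.sent)
}
```

Because the spooler satisfies the same interface as the connector it wraps, the [messager] side would not need to know whether it talks to a bot directly or to a spool in front of it.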
A more detailed scheme (blue is for flow-notify, and orange is for flow-query):
The flow-notify is already covered by goslmailer :tada:. Some questions:
- [messager] for other use-cases with the flow-notify?
  - [monitor] to check user quotas periodically (ideally, this would be configurable to support multiple FSs)
  - goslmailer <chat-id> <msg> --connector telegram? Can we extend goslmailer to support this?
  - tgslurmbot <chat-id> <msg> directly? This would not support other connectors or the gobler, but would cover my use-case with telegram
- Can we extend the [connector] to support the flow-query?
  - [listener] for SLURM, an [authenticator], and a [mapping]
  - Can I provide my own code to add more bot commands? Or maybe import basic configuration? E.g.:
import "configBot", "mapping"
// this applies the current configuration from tgslurmbot
b := configBot()
// add my own handlers
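b.Handle("/jobs", func(c tele.Context) error {
    token := mapping.ChatId2Token(c.Chat().ID)
    // pseudocode: run "slurm-listener list-jobs" and pass it the token
    // (run is a hypothetical helper, not an existing function)
    response := run("slurm-listener list-jobs", token)
    return c.Send(response)
})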
b.Start()
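The "run squeue, parse, prettify" part behind a `/jobs` handler could look roughly like the Go sketch below. It is an assumption-laden illustration: `Job`, `ParseJobs`, and `FormatJobs` are hypothetical names, and the input is assumed to be pipe-delimited text such as what `squeue -h -u <user> -o "%i|%j|%T"` is expected to print (job id, job name, state). The sketch does not call squeue itself, and it does not handle job names that contain a `|`.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// Job is one queue entry: id, name and state.
type Job struct {
	ID, Name, State string
}

// ParseJobs turns pipe-delimited lines ("id|name|state") into Jobs.
// Lines that do not have exactly three fields are skipped.
func ParseJobs(out string) []Job {
	var jobs []Job
	sc := bufio.NewScanner(strings.NewReader(out))
	for sc.Scan() {
		fields := strings.Split(strings.TrimSpace(sc.Text()), "|")
		if len(fields) != 3 {
			continue
		}
		jobs = append(jobs, Job{ID: fields[0], Name: fields[1], State: fields[2]})
	}
	return jobs
}

// FormatJobs renders the jobs as a short chat message, one job per line.
func FormatJobs(jobs []Job) string {
	if len(jobs) == 0 {
		return "no jobs in the queue"
	}
	var b strings.Builder
	for _, j := range jobs {
		fmt.Fprintf(&b, "%s  %s  %s\n", j.ID, j.Name, j.State)
	}
	return strings.TrimRight(b.String(), "\n")
}

func main() {
	// Sample text standing in for the listener's output.
	sample := "101|train|RUNNING\n102|eval|PENDING\n"
	fmt.Println(FormatJobs(ParseJobs(sample)))
}
```

Keeping parsing and formatting separate from the bot handler means the same functions could serve any [connector], not only telegram.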
Hey, that's quite a thorough planning. Respect. In theory, it's all doable, but i'd ask you for a day/two to reread and digest/think it all through before i reply.
Awesome library!
Is there a way to extend the telegram bot to answer more commands?
I suppose I can edit the source here and recompile, but I wonder if there is a different way to do this.