StackStorm / community

Async conversation about ideas, planning, roadmap, issues, RFCs, etc around StackStorm
https://stackstorm.com/
Apache License 2.0

[2015] ChatOps Requirements #25

Open jfryman opened 9 years ago

jfryman commented 9 years ago

Over the holiday, I started writing down some requirements for ChatOps to at least get the conversation started. My hope is that while we're working on the current sprint, at least having seen this thread gets the creative juices flowing and we can start scoping out work and hit the ground running during the next sprint.

Please ask questions, throw in ideas, whatever...

Requirements

Notes:

Target specific chat platforms:

jfryman commented 9 years ago

/cc @Kami @manasdk based on our conversation yesterday.

Please all join in if this interests you. :grinning:

epowell101 commented 9 years ago

Basic question(s):

jfryman commented 9 years ago

Where would or should capabilities like "ChatOps commands can be ACL'd" reside? Should that be in ST2? Or in another package? Not in a fork of Hubot, right?

I did not want to discuss implementation details here, rather just the overarching goals we should strive for. I have an idea, but don't want to hinder the creativity of the dev team either. We'll have a design session soon and post more data as we get it.

The default for a ChatOps command, imo, is disabled until enabled. We stand to make a big fubar action if we accidentally expose a very unsafe command. Until our story around ACL is much tighter, we make it easy for folks to enable commands... but the exercise is deliberate.
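A minimal sketch of the "disabled until enabled" default described above. The `ChatCommand` and `CommandRegistry` names are hypothetical and exist only for illustration; they are not st2 models or APIs.

```python
from dataclasses import dataclass


@dataclass
class ChatCommand:
    """Hypothetical record for a ChatOps command (not an st2 model)."""
    name: str
    action_ref: str
    enabled: bool = False  # disabled until someone deliberately enables it


class CommandRegistry:
    def __init__(self):
        self._commands = {}

    def register(self, command):
        self._commands[command.name] = command

    def enable(self, name):
        # Enabling is a separate, explicit step.
        self._commands[name].enabled = True

    def resolve(self, name):
        """Return the action_ref for an enabled command, else None."""
        command = self._commands.get(name)
        if command is None or not command.enabled:
            return None
        return command.action_ref
```

Registering a command does not expose it; only a later `enable()` call does, which keeps the exercise deliberate.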

Help is very similar to the CLI. In fact, the functionality is mostly there... just need some ChatOps specific output formatting for a Chat room.

manasdk commented 9 years ago

Where would or should capabilities like "ChatOps commands can be ACL'd" reside? Should that be in ST2? Or in another package? Not in a fork of Hubot, right?

ACL is going to be required in other clients, therefore we will have to build this feature into st2 (User + ACL + Role = RBAC). We could come up with a for-pay implementation that makes a fully baked RBAC feature a drop-in package and keeps the notion of users as an OS feature. I am calling out that this is possible, not necessarily how it should be done. My larger point here is that it cannot be built into any of the chat bots we support.
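To make "User + ACL + Role = RBAC" concrete, here is a rough sketch of a platform-side permission check that any client (bot, UI, CLI) could rely on. The role names, permission strings, and users are all made up for illustration; nothing here reflects an actual st2 design.

```python
# Hypothetical roles, permissions, and users, for illustration only.
ROLE_PERMISSIONS = {
    "operator": {"packs.info"},
    "admin": {"packs.info", "packs.delete"},
}
USER_ROLES = {
    "alice": {"admin"},
    "bob": {"operator"},
}


def is_allowed(user, permission):
    """True if any role assigned to the user grants the permission."""
    roles = USER_ROLES.get(user, set())
    return any(permission in ROLE_PERMISSIONS.get(role, set()) for role in roles)
```

Because the check lives outside any bot, every client gets the same answer for the same user.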

As we work through this feature list I suspect we will discover that there are other features that will become st2 platform requirements as opposed to being built into the bots.

Generally, I look at the bot as another st2 client like the UI, cli etc. Of course how a chat bot exposes functionality to users is what differentiates it from these other clients but largely there will be model similarities.

Chat bot(s) will be how we deliver ChatOps to customers - right?

jfryman commented 9 years ago

Some points I would like to discuss today:

jfryman commented 9 years ago

Some friendly suggestions...

@jfryman: re ChatOps, my 2c: while you keep working on detailing the end-to-end picture, try also to define an immediate "walking skeleton" implementation, doable in 2 weeks. It is "not a prototype": a prototype is throw-away, while a walking skeleton is a production-grade application that may use some but not all of the final architecture.

jfryman commented 9 years ago

Had an additional conversation with @manasdk today about ChatOps, and the current state.

I recorded it, but only got one side of the conversation. http://youtu.be/pkvICaLPPok

You can hear me if you listen hard enough, but the bulk of detail is here.

jfryman commented 9 years ago

Right now, we're working on the basic scaffolding... and so I've limited the Jira actions only to the first two weeks in this sprint. As we learn more, it will inform how we attack P2/P3 items.

Open Jira items:

jfryman commented 9 years ago

Another chat today, this time about Notifications. Recorded (properly) for your viewing pleasure.

http://youtu.be/ztleZOxtbWI

enykeev commented 9 years ago

Another chat today, this time about Notifications. Recorded (properly) for your viewing pleasure.

There is a possibility (and I'm making an effort not to make bold statements here) that there is no need for an additional Notification API beyond the EventSource stream we already have. It gets a little tricky in each particular case, but it would help us reinforce what we already have instead of creating something entirely new.

The stream reports every CUD (create, update, delete) operation on executions (both single action and workflow executions). If we are talking about a situation where a particular action should always, no matter what, say something on a particular channel, we can make it part of the action definition. Then our ChatOps integration (whether a bot or a sensor) can listen to the stream and check every message for, say, record.action.notification.on_status_succeded:

{
  "channel": "#stackstorm",
  "message": "Action {{ action.name }} finished ok"
}

then check the execution has status succeeded and if it is, post a message to the channel. That's one case.
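A rough sketch of that client-side check, assuming each stream message arrives as a nested dict with a top-level `status` field. The record layout and the helper names (`lookup`, `render`, `notification_for`) are assumptions for illustration, and the tiny placeholder substitution stands in for real template rendering.

```python
import re

# Matches simple "{{ dotted.path }}" placeholders; not a full template engine.
PLACEHOLDER = re.compile(r"\{\{\s*([\w.]+)\s*\}\}")


def lookup(data, dotted_path):
    """Walk a nested dict by dotted path; return None if any key is missing."""
    current = data
    for key in dotted_path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def render(template, context):
    """Replace each placeholder with the value found in context (empty if absent)."""
    def replace(match):
        value = lookup(context, match.group(1))
        return "" if value is None else str(value)
    return PLACEHOLDER.sub(replace, template)


def notification_for(record):
    """Return (channel, message) if the record asks for a success notification."""
    if record.get("status") != "succeeded":
        return None
    # Key spelled exactly as in the comment above.
    spec = lookup(record, "action.notification.on_status_succeded")
    if not spec:
        return None
    return spec["channel"], render(spec["message"], record)
```

The integration would call `notification_for` on every stream message and post to the returned channel whenever the result is not `None`.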

Another case is when you only need notifications for particular executions. In the ChatOps integration you do it exactly the same way, but instead of the action, you put it in the execution (so initially you say something like run action core.local cmd="sleep 10" then when it succeeds, notify channel #stackstorm with message '...'). Then once again, you need a bunch of client-side logic to catch these executions in the stream, check that the condition has been met, and then post a message.

Depending on how complex you want these notifications to be, there might be a reason to move the notification object from the action and execution directly to the root of the history object (it already holds a lot of different shit like Runner and Trigger, so it should not be a problem) and then define much more complex conditions there.

The side effect of having this as part of the history record is that when auditing the system we can tell not only that the action was executed at that point in time by that user, but also that this or that channel was notified about it, and probably even list the nicknames of users who were online in the channel at that time.

And then we have intermediate notifications... I don't have a solution for that, but I do have another use case. In st2cd we have some actions, like building and packing, that are somewhat long-running and produce a lot of stdout, and we can only check their output once they are finished. We need to stream their stdout and stderr to the client while they are still in progress. This functionality is somewhat parallel to the Notifications API we are discussing here and may or may not intersect with it at some point, but it does solve the problem of showing script progress to the user. More than that, it doesn't require users to modify their existing scripts (given they are already outputting some kind of progress to stdout).
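For the streaming use case, a minimal sketch of reading a long-running script's output line by line while it is still running, using only the standard library. The `stream_output` name and the `on_line` callback are hypothetical; how lines would actually reach a chat client is left open here.

```python
import subprocess


def stream_output(argv, on_line):
    """Run argv and hand each output line to on_line as soon as it is read."""
    process = subprocess.Popen(
        argv,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr into the same stream
        text=True,
        bufsize=1,  # line-buffered on the reading side
    )
    with process.stdout:
        for line in process.stdout:
            on_line(line.rstrip("\n"))
    # Exit code of the script once its output is exhausted.
    return process.wait()
```

One caveat: the child process may buffer its own stdout when it is attached to a pipe, so lines can still arrive in bursts unless the script flushes as it goes.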

enykeev commented 9 years ago

Regarding the sensor approach, I urge you to apply all your arguments to any other sensor our users may want to create in the system, such as something really critical for the infrastructure to work as a whole. For example, what if we can't guarantee that something will happen in response to a Nagios sensor, or what if that sensor goes down and we don't have a good, fast sensor recovery procedure? None of these problems is specific to the ChatOps sensor, and if we are not going to solve them at one point or another, I don't understand what the fuck we are doing here in the first place. One of the reasons to build ChatOps as a sensor is to dogfood it ourselves, to make sure you can actually build something robust and reliable using the toolset we created.

Also, the difference between the ChatOps client and the CLI client is the level of customization. The CLI client is built around the API we already have, and its feature set is defined by that API. With ChatOps, we define what functionality we want to have in the chat room, and this is where configured sensors become really useful. Manas asked me the same question yesterday, and I told him that if we wanted users to be able to customize the UI with the same precision we want them to customize ChatOps, I would build it around sensors too.

enykeev commented 9 years ago

I also see a reason to document here another part of the discussion we had with @jfryman yesterday. This one is related to queuing the messages in case st2 is down.

As soon as you start doing that, you are assuming the user wants this action to happen no matter when st2 gets back online. If that happens within a second, most likely there won't be any problem (although how often do we expect st2 to go down? What is the chance a user would need something in this particular "second"? Is it worth the resources we would need to spend on implementing this queuing?).

However, the longer the downtime, the higher the chance that the user no longer wants the command to run, or has issued another command that conflicts with the earlier one. This basically means that besides the queuing system we need some kind of queue manager so the user can see all the messages in the queue and remove some of them. And then it starts to pile up: who can remove a message from the queue? Should we run them as soon as st2 comes up, or should we wait for the user to review the queue before resending? How do we know st2 is up? And so on...

And none of this helps us if the bot goes down, since we basically end up in the same situation, only a bunch of man-hours later.

If this is a real problem, and it can't be solved by the user checking the user list to see that the integration is online or by the integration acknowledging receipt of the command, then to make it robust and reliable it should be done through an external component that watches both the chat and the st2 execution stream and makes sure every command issued in chat actually got executed by the system. Then if any one component dies, we can make sure the user is at least notified that their command is not going to be executed here and now.

But we should only do that if this is a real problem and not some crazy corner case that will likely never happen.
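The external watcher idea could be sketched roughly like this: compare commands seen in chat against executions seen in the stream, and flag anything that has gone unanswered for too long. The dict keys (`id`, `user`, `sent_at`, `command_id`) and the 30-second timeout are assumptions made up for the example.

```python
def find_unconfirmed(commands, executions, now, timeout=30.0):
    """Return chat commands older than `timeout` seconds with no matching execution.

    commands:   dicts with hypothetical keys "id", "user", "sent_at" (seconds).
    executions: dicts with a hypothetical "command_id" linking back to the chat message.
    """
    executed_ids = {execution["command_id"] for execution in executions}
    return [
        command
        for command in commands
        if command["id"] not in executed_ids and now - command["sent_at"] > timeout
    ]
```

The watcher would run this periodically and tell each affected user that their command is not going to be executed.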

jfryman commented 9 years ago

Conversation happening at https://stackstorm.slack.com/archives/stackstorm/p1426862677011306

jfryman commented 9 years ago

Had another hangout today with @enykeev @lakshmi-kannan @manasdk @Kami @dzimine to go over Sensor vs API. Good discussion!

In there, we discussed the current implementation of Sensor. Main sticking point: client expects two way communication for ChatOps commands (specifically, ack). With the exception of the WebHook sensor, this is not how sensors work (today). In a nutshell, Sensor = UDP, API = TCP. We need assurance of deliverability and SLAs, neither of which we can provide within Sensors.

For now, we will continue down the API route for Hubot compatibility. This does not preclude approaching the sensor angle in the future (it may be needed for clients that do not support a bot... like Skype). Likewise, we will also explore 1-way sensors to gather data from event streams for potential AI down the road.

jfryman commented 9 years ago

Alright, feedback time. Have a working deployment of st2workbench running with ChatOps. Feedback so far:

Polish:

The detachment of Notifications and Aliases is awkward. To create a new round trip rule, I have to:

Next:

jfryman commented 9 years ago

Buttons are hard

jfryman commented 9 years ago

A few thoughts I'd love some comments on.

A ChatOps command can request a user confirmation to execute the command (magic_word to validate)

Should this be part of a query, or part of a workflow? I sort of think this might actually be a good action for a stdlib type pack for StackStorm.
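Whichever layer ends up owning it, the confirmation flow itself might look something like this sketch. `ConfirmationGate` is a hypothetical helper, and generating the magic word with `secrets.token_hex` is just one possible choice.

```python
import secrets


class ConfirmationGate:
    """Holds commands until the requesting user repeats a one-time magic word."""

    def __init__(self):
        self._pending = {}  # magic word -> (user, command)

    def request(self, user, command):
        """Park the command and return the magic word the user must echo back."""
        magic_word = secrets.token_hex(3)  # six hex characters
        self._pending[magic_word] = (user, command)
        return magic_word

    def confirm(self, user, magic_word):
        """Return the command to run, or None if the word or the user does not match."""
        entry = self._pending.get(magic_word)
        if entry is None or entry[0] != user:
            return None
        del self._pending[magic_word]  # each word is usable once
        return entry[1]
```

Tying the word to the requesting user means someone else in the room cannot confirm a command on their behalf.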

A ChatOps command can force output to a specific Chat Room, regardless where it is executed. A ChatOps command can restrict execution to a specific ChatRoom

I think this is covered now, given I could effectively send to a different notification channel. Is it better to handle this with rules, or with parameters to the trigger? I think the latter, but input appreciated.

A ChatOps command can be a singleton, and prevent multiple copies of itself running at a given time.

I'm pretty sure this will be covered by https://github.com/StackStorm/discussions/issues/59 (/cc @m4dcoder)
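For reference, the singleton behaviour in its simplest in-process form could be sketched as below. This is only an illustration of the requirement, with a hypothetical `SingletonGuard` class; it says nothing about how the linked discussion proposes to solve it, and a real version would need state shared across processes.

```python
import threading


class SingletonGuard:
    """Tracks which commands are running so a second copy can be refused."""

    def __init__(self):
        self._lock = threading.Lock()
        self._running = set()

    def try_start(self, command_name):
        """Mark the command as running and return True, or False if it already is."""
        with self._lock:
            if command_name in self._running:
                return False
            self._running.add(command_name)
            return True

    def finish(self, command_name):
        """Clear the running mark so the command can be started again."""
        with self._lock:
            self._running.discard(command_name)
```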

dzimine commented 9 years ago
jfryman commented 9 years ago

what about output formatting / prettifying?

Yes, still an outstanding task. Going to try to get some feedback on this this week as time allows, but realistically it's next week unless @manasdk wants to take a swing at doing some formatting with TerminalTable or something of the like.

concerned if workroom/puppet the only way to deploy Chatops

Updated this AM. https://gist.github.com/jfryman/deeabda77813dae7c458#manual-installation

manasdk commented 9 years ago

I am happy to take up output formatting. I will experiment with some sort of TerminalTable today.

jfryman commented 9 years ago

Ok! Some additional feedback!

First, this PR: https://github.com/StackStorm/st2incubator/pull/207. Second, ignore the deploy namespace. This pack ultimately will get folded into the packs pack.

1) Inline Feedback: (https://github.com/StackStorm/st2incubator/blob/master/packs/deploy/aliases/pack_delete.yaml#L1)

---
# How do I have commands I want to group from different packs into a single namespace?
# I can no longer leverage the deploy name for other similarly named activities,
# like deploying an application.
#
# Likewise, if I flip this to 'pack', then it prohibits me from being able to have
# multiple actions belong under the 'pack' command tree, which I need in this case.
# I'd actually prefer it being !pack deploy XXX and !pack delete XXX
#
# Both are desired.
name: "delete"
action_ref: "deploy.delete"

# Having this as the !help text would be stellar
description: "Delete StackStorm packs from system"

formats:
  - "pack {{pack}}"

This is a problem, because I really want something like:

!pack deploy XXX
!pack delete XXX
!pack info XXX

Could not model this as-is
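To illustrate the command tree I'm after, here is a sketch of how formats with a shared `pack` prefix could be matched and dispatched. Only `deploy.delete` comes from the alias above; `deploy.deploy` and `deploy.info` are hypothetical action refs, the bot is assumed to strip the leading `!`, and this is not how the current alias code works.

```python
import re

PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def compile_format(fmt):
    """Turn 'pack delete {{pack}}' into a regex with one named group per placeholder."""
    pattern = ""
    position = 0
    for match in PLACEHOLDER.finditer(fmt):
        pattern += re.escape(fmt[position:match.start()])
        pattern += r"(?P<%s>\S+)" % match.group(1)
        position = match.end()
    pattern += re.escape(fmt[position:])
    return re.compile("^" + pattern + "$")


# Format -> action_ref. Only deploy.delete appears in the alias above;
# the other two refs are hypothetical.
ALIASES = {
    "pack deploy {{pack}}": "deploy.deploy",
    "pack delete {{pack}}": "deploy.delete",
    "pack info {{pack}}": "deploy.info",
}


def match_command(text):
    """Return (action_ref, params) for the first matching format, else None."""
    for fmt, action_ref in ALIASES.items():
        match = compile_format(fmt).match(text)
        if match:
            return action_ref, match.groupdict()
    return None
```

With this shape, several aliases can live under one `pack` namespace while still pointing at different actions.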

2) Right now, everything goes back through the notification channel. See https://stackstorm.slack.com/archives/chatops/p1430855117001806. I really only want get_information.result.stdout to be pushed via the notification channel. How can I be selective on what gets notified?

3) Regex. https://github.com/StackStorm/st2incubator/blob/master/packs/deploy/aliases/pack_deploy.yaml#L6-L8 I started to list out all permutations, but then rage quit a bit and pared it back.

4) Needs Investigation: https://stackstorm.slack.com/archives/chatops/p1430855098001800. force=true doesn't seem to be passing through properly.


manasdk commented 9 years ago

(1) Should be possible; if not, I will fix the code to support both approaches.
(2) Just write the rule accordingly. The trigger contains everything, so the rule should be able to reference only the desired property.
(3) Multiple formats are not supported - I sorta left it there to see if it gets traction.
(4) Sounds like a bug; will investigate.

jfryman commented 9 years ago

2) Write the rule? Please elaborate