elastic / kibana

Your window into the Elastic Stack
https://www.elastic.co/products/kibana
Other
19.81k stars 8.2k forks source link

[Alerts] [Discuss] Alert UIs that depend on their parent plugins crash in Alerts Management #93258

Open Zacqary opened 3 years ago

Zacqary commented 3 years ago

Because alert UIs live inside of the plugin that defines their alert type, it's possible to write an alert UI that will work perfectly inside of its parent app, but crashes when you try to create or edit that alert type from the Alerts Management screen. For example, making use of a plugin dependency that triggers_actions_ui doesn't have.

Right now the only solution to this is to manually test an alert UI in Alerts Management to ensure it works correctly. If we miss this step, it's very easy to deploy a broken alert type that effectively can't be edited after it's created.

Is there any way to make this less prone to developer error? Can we automate this test somehow? Handle it with typechecking?

elasticmachine commented 3 years ago

Pinging @elastic/kibana-alerting-services (Team:Alerting Services)

gmmorris commented 3 years ago

Is there any way to make this less prone to developer error? Can we automate this test somehow?

It feels like, perhaps, we shouldn't have all Alerts enabled by default in Alerts Management. This won't relieve developers from having to verify their type works in Alerts Management, but as they'd have to do so explicitly, it's more likely we'll catch it.

The reason we can't automatically test each Alert Type is that we have no way of automating the fields that are specific to each alert type. We could perhaps make this easier for Alert Type implementors by requiring them (somehow) to provide a test fixture which fills out the fields in their Alert Type, and then we'd use those to create an Alert of each enabled type in our own e2e tests. 🤔

Another thought it to have an automated test that selects each and every type, and ensures it has at least rendered some fields there, but as we don't know what each type should render, this is hard for us to do reliably. This is also made harder by the fact that we've been asked to try and reduce the amount and duration of e2e UI tests, as it significantly impacts the CI time of each PR. I truly wish this wasn't the case... but I don't know what effort is needed to make it less costly.

This requires more thought and research, but I actually think it's very valuable and worth the effort.

Handle it with typechecking?

At platform level it's hard to do this kind of thing (we'd have to change a lot of code owned by Core team), if at all possible. It would also only be possible to catch type level things - many of the errors we've seen are only apparent in runtime.

pmuellr commented 3 years ago

We could perhaps make this easier for Alert Type implementors by requiring them (somehow) to provide a test fixture which fills out the fields in their Alert Type, and then we'd use those to create an Alert of each enabled type in our own e2e tests. 🤔

I like this idea. If we have to support create and edit, then presumably we can test create with no per-rule-type data, but for editing we'd want the rule-types to pre-create an alert that we would then edit.

It does feel like that "pre-create" part would get stale very quickly. Once written, when would it ever be updated to match the current alert params? Type checking would help, but as Gidi notes, still plenty of room for runtime errors to slip in here that the old/stale test might not catch - especially if new (optional) params are added over time.

pmuellr commented 3 years ago

It feels like, perhaps, we shouldn't have all Alerts enabled by default in Alerts Management.

I thought we already had a way to do this? Hopefully we can re-use that existing mechanism (if I'm right and such a capability exists) instead of creating a new/similar one.

That doesn't really help with this case anyway, since I believe the intent is to allow it to be editable from the management UI, and it just didn't happen to work out, technically.

I suppose we should probably suggest that rule-type authors actually test this scenario themselves, in our docs, and by example in our samples, existing rule-type code, etc.