Azure / platform-chaos

A node sdk for building services capable of injecting chaos into PaaS offerings. ⚙️ 🌩
MIT License

investigate scheduler approaches #17

Closed bengreenier closed 6 years ago

bengreenier commented 6 years ago

We want a way to schedule chaotic events via a hosted service, removing the dependency on manual intervention that currently exists with https://github.com/Azure/platform-chaos-cli.

This issue tracks the work of identifying whether an existing solution can be coupled with the work we've already done to provide this feature.

Current possibilities include:

bengreenier commented 6 years ago

depends on #16 - as we'll definitely want to call out to that api from whatever tool or solution we define for scheduling

mindlessroman commented 6 years ago

Starting thoughts

Azure Scheduler

Logic Apps

Azure Functions

Azure Batch

Jenkins

bengreenier commented 6 years ago

thanks for including those @mindlessroman! I'd be interested to hear what you recommend we use after doing some of that research.

mindlessroman commented 6 years ago

Discussed this a lot more in person during standup - the current front runner is Azure Scheduler. But some more questions/thoughts were raised:

How does each platform handle providing a level of randomness?

What has a good/the best UI for a partner/user to configure?

Scheduler - what does the max frequency entail?

mindlessroman commented 6 years ago

Scheduler

Randomness:

Trying to instantiate some chaos within a scheduled window would likely require some more coding gymnastics - it doesn't seem to be built in. Probably not impossible 😅 More thinking would need to be done, but one thought: use an Azure Function (or something similar) to generate a time in a specified window, then feed that into the scheduler.
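A minimal sketch of the "generate a time in a specified window" idea, independent of where it would be hosted. The function name `randomTimeInWindow` and its parameters are hypothetical and not part of platform-chaos or Azure Scheduler; the sketch only shows the time selection, not how the result is handed to the scheduler.

```javascript
// Hypothetical helper: pick a uniformly random instant in [windowStart, windowEnd).
// `random` is injectable so the choice can be made deterministic in tests.
function randomTimeInWindow(windowStart, windowEnd, random = Math.random) {
  const startMs = windowStart.getTime();
  const endMs = windowEnd.getTime();
  if (!(endMs > startMs)) {
    throw new RangeError('windowEnd must be later than windowStart');
  }
  // Math.random() returns a value in [0, 1), so the result never reaches windowEnd.
  return new Date(startMs + Math.floor(random() * (endMs - startMs)));
}

// Example: a one-hour window. The printed time would become the start time
// of a one-time scheduled job.
const start = new Date('2018-09-04T17:00:00Z');
const end = new Date('2018-09-04T18:00:00Z');
console.log(randomTimeInWindow(start, end).toISOString());
```

Keeping the random source injectable is a design choice for testing; a real deployment would likely just use the default.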

UI

Creating a quick scheduler was fairly simple and there are some user-configurable fields. [Screenshots: Action settings, Authentication, Retry policy, Error action settings]

UI for timing (one-time v recurring) is pretty self-explanatory. [Screenshots: Schedule]

History

The history doesn't mention who triggered a job, but it is kept per scheduled task (in this example, hktestingscheduler). [image] [image]

Frequency

The docs mention a limit on frequency: trying to manually set a frequency higher than once a minute returns a 409 error code.

Conclusion

Scheduler may be a better choice for chaos that occurs at predictable, settable times, i.e. chaotic actions with 'set' timing rather than chaotic timing, so to speak. We may still be able to hack together a solution for more time-chaos.

mindlessroman commented 6 years ago

Logic Apps

Randomness

Riffing on the RSS checker example - you could potentially set up an API that generates a time in the specified window, have a logic app check in every X amount of time, and have it trigger the chaotic event when the generated time and the current time match, +/- 1 minute. Otherwise this seems to be in the same area of "nothing out of the box" for chaotically timed things. A Logic App switch statement is available but may be a less ideal solution. [image]
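The comparison each polling pass would make can be sketched as below. This is an illustration under assumptions, not Logic Apps code: `isDue` and its parameters are hypothetical names, and the +/- 1 minute tolerance is taken from the description above.

```javascript
// Hypothetical check run on each polling pass: is the current time within
// the tolerance (default +/- 1 minute) of the generated target time?
function isDue(targetTime, now = new Date(), toleranceMs = 60 * 1000) {
  return Math.abs(now.getTime() - targetTime.getTime()) <= toleranceMs;
}

const generated = new Date('2018-09-04T17:30:00Z');
console.log(isDue(generated, new Date('2018-09-04T17:30:45Z'))); // true: 45 s after the target
console.log(isDue(generated, new Date('2018-09-04T17:35:00Z'))); // false: 5 min after the target
```

One consequence of this approach: the polling interval has to be shorter than twice the tolerance (2 minutes here), otherwise a pass could skip over the matching window entirely and the event would never fire.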

UI

The UI is super visual - I believe this would translate well to a user since it's so configurable. The branching of what to do and when takes a lot of coding off the user's plate. It relies on value fields the user fills in around conditions framed largely in plain English. (There is also a JSON-style code editor, if the user wants to use it.) [image] A step on the "true" path in the example Logic Apps designer, which supports filling in the key-value pairs through a table or as JSON-style key-value associations. [image]

History

You can use the logging analytics. It isn't user specific, but it tracks a number of other details. The tracked properties mentioned here strike me as a useful feature.
If a user wanted to customize logging a bit more - maybe writing to a SQL DB - they could construct a stored procedure to add that. [image]

Conclusion

Seems to be a flexible option for the chaotic timing. Good if the plan is to have more complex decision paths based on what's returned from a chaotic event. (Below, a not very complicated decision tree of nested conditions.) [image] It also has a scope option, which may be useful to have in the toolbox. [image]

mindlessroman commented 6 years ago

At this point, none of these seems to offer a single solution to the problem we're hoping to address, so we'll probably need a blended solution that includes a few of them. Logic Apps definitely wins on UI - very configurable and layman-readable - and is the leading candidate among the Azure-provided options for something that is timed chaotically. Scheduler would be a good candidate for predictably timed chaos, but with a slightly less layman-readable UI. Its UI is still very configurable, but it lacks the visual engagement/flow that Logic Apps has.

Azure Functions would likely need to be used in our solution for generating a "random" time in a specified window.

mindlessroman commented 6 years ago

We had another chat last week. Using Logic Apps may be over-engineering the problem - the leading solution is to invoke the chaos via the CLI and take care of the randomly generated time there. Scheduler itself doesn't care how a job is scheduled, so we should randomize at the layer ahead of Scheduler.
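A rough sketch of what "randomize at the layer ahead of Scheduler" could look like. Everything named here is an assumption for illustration: `scheduleChaosJob`, the `createJob` callback, and the `{ startTime }` shape are hypothetical and do not reflect the platform-chaos-cli or Azure Scheduler APIs. The point is only that the random choice happens before the scheduler is involved, so the scheduler receives an ordinary fixed-time job.

```javascript
// Hypothetical layer in front of the scheduler: choose the time here, then
// pass a plain one-time job to `createJob`, which stands in for whatever
// actually talks to the scheduler.
function scheduleChaosJob(windowStart, windowEnd, createJob, random = Math.random) {
  const spanMs = windowEnd.getTime() - windowStart.getTime();
  if (!(spanMs > 0)) {
    throw new RangeError('windowEnd must be later than windowStart');
  }
  const startTime = new Date(windowStart.getTime() + Math.floor(random() * spanMs));
  // The scheduler side only ever sees a concrete timestamp.
  return createJob({ startTime: startTime.toISOString() });
}

// Example with a stand-in createJob that just logs and echoes the job.
const submitted = scheduleChaosJob(
  new Date('2018-09-10T09:00:00Z'),
  new Date('2018-09-10T17:00:00Z'),
  (job) => {
    console.log('would schedule one-time job at', job.startTime);
    return job;
  }
);
```

Passing `createJob` in as a callback keeps the randomization separate from the scheduling backend, so the backend could be swapped without touching the timing logic.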