pnasrat opened this issue 1 year ago (status: Open)
I certainly love this idea (I am biased, I know 😉), and I would like to hear what others on the team think about it, so I'm summoning the whole @2i2c-org/engineering team to give initial feedback on the idea that @pnasrat brought up!
Context
Runbooks are documented procedures for troubleshooting and/or operational tasks such as support tasks.
Some of these procedures are currently documented in the infrastructure guide, but they rely on copying and pasting existing deployer commands or writing new ones. As new monitoring and alerting are added, the initial runbooks may evolve as we become familiar with classes of problems.
https://hackernoon.com/simplify-devops-with-jupyter-notebook-c700fb6b503c
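To make the idea more concrete, a runbook step that is copied and pasted into a terminal today could instead live in a notebook cell that both documents the step and runs it. The following is a minimal sketch, assuming a Python kernel. `run_step` is a hypothetical helper (it is not part of the deployer), and the placeholder command only stands in for a real deployer or `kubectl` invocation.

```python
import shlex
import subprocess
import sys


def run_step(description: str, cmd: list[str], check: bool = True) -> subprocess.CompletedProcess:
    """Print what a runbook step does, run its command, and show the output.

    Hypothetical helper for illustration only; not part of the deployer.
    """
    print(f"## {description}")
    print(f"$ {shlex.join(cmd)}")
    # capture_output + text=True returns stdout/stderr as str so the notebook can display them
    result = subprocess.run(cmd, capture_output=True, text=True)
    print(result.stdout, end="")
    if result.stderr:
        print(result.stderr, end="", file=sys.stderr)
    # Fail loudly so whoever is following the runbook does not move on past a broken step
    if check and result.returncode != 0:
        raise RuntimeError(f"step failed with exit code {result.returncode}: {description}")
    return result


if __name__ == "__main__":
    # Placeholder command so the sketch runs anywhere; a real runbook cell
    # would call the deployer or kubectl here instead.
    run_step("Check that the Python interpreter is available", [sys.executable, "--version"])
```

In a notebook, a cell might call `run_step("List pods in the hub namespace", ["kubectl", "get", "pods", "-n", "<hub-namespace>"])`, where `<hub-namespace>` is a placeholder. The command and its output stay next to the prose that explains why the step matters.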
Potential benefits
Gives SREs a consistent environment, running in a cluster, from which to do production debugging (e.g. if someone's work laptop is unavailable, they could potentially fix things from a personal laptop with just a web browser, without having to set up the deployer, etc.)
Builds sets of runnable playbooks that can be converted into automation if needed
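One way to picture "runnable playbooks that can be converted into automation" is a runbook defined as an ordered list of named checks. Each check can be run by hand in its own notebook cell during an incident, or all of them can be run unattended by a script. This is a minimal sketch under that assumption. `Runbook`, `Step`, and the example checks are hypothetical names, and the check bodies are placeholders rather than real cluster queries.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Step:
    name: str
    check: Callable[[], bool]  # returns True if this step passed


@dataclass
class Runbook:
    title: str
    steps: list[Step] = field(default_factory=list)

    def step(self, name: str):
        """Decorator that registers a function as the next step of this runbook."""
        def register(fn: Callable[[], bool]) -> Callable[[], bool]:
            self.steps.append(Step(name, fn))
            return fn  # returned unchanged, so it can still be called alone in a cell
        return register

    def run_all(self) -> list[tuple[str, bool]]:
        """Run every step in order and stop at the first failure (the automation path)."""
        results = []
        for s in self.steps:
            ok = bool(s.check())
            results.append((s.name, ok))
            if not ok:
                break
        return results


# Hypothetical example runbook; the check bodies are placeholders.
hub_unreachable = Runbook("Hub is unreachable")


@hub_unreachable.step("DNS resolves")
def dns_resolves() -> bool:
    return True  # placeholder: a real step would query DNS


@hub_unreachable.step("Proxy pod is running")
def proxy_running() -> bool:
    return False  # placeholder: a real step would query the cluster


if __name__ == "__main__":
    print(hub_unreachable.run_all())
    # → [('DNS resolves', True), ('Proxy pod is running', False)]
```

The decorator returns each function unchanged, so during interactive debugging `proxy_running()` can be re-run alone in a cell. A scheduled job or alert handler could call `run_all()` on the same definitions.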
Notes
This could run either in a central 2i2c-org cluster that accesses remote credentials using the deployer, or potentially per hub.
Proposal
This is a story for a feasibility investigation and an evaluation by the team, to see whether such an approach is feasible for us.
Out of scope: runbook creation
Some implementations of executable runbooks are:
Nurtch/rubix as used by GitLab
Related: https://damianavila.github.io/blog/posts/binder-%2B-nikola-%2B-jupyter-%2B-github-blogging-resourceless.html
While Google Cloud Shell, AWS CloudShell, and Azure Cloud Shell exist, creating our own on top of JupyterHub encourages us to use our own infrastructure for debugging.
Limitations:
We still need to be able to debug from SRE workstations in cases where an outage prevents a hub from running.
Updates and actions
No response