chaoseng / wg-chaoseng

Chaos Engineering Working Group
Apache License 2.0

Draft chaos engineering definition/whitepaper #3

Open caniszczyk opened 6 years ago

seeker89 commented 6 years ago

Keen to help with that!

Lawouach commented 6 years ago

Happy to support the effort too.

3rdman commented 6 years ago

Me too :)

mattforni commented 6 years ago

Ping

caniszczyk commented 6 years ago

The best bet currently is to contribute to the proposal here, which sketches out an outline of what could become a whitepaper/landscape:

https://docs.google.com/document/d/1BeeJZIyReCFNLJQrZjwA4KMlUJelxFFEv3IwED16lHE/edit?ts=5ace0eab#heading=h.k8f5ndt8affu

Here are my ideas for a draft outline. I would love feedback, since I'm still new to this space:

ramin commented 6 years ago

ping

Lawouach commented 6 years ago

@caniszczyk That document is likely getting hard to navigate and make sense of. I'm happy to move it to this repo so we can start using GH issues instead.

While GH is not a document-collaboration tool, I guess that if we clearly mark each section in the proposal, we could simply refer to each section from GH issues for discussion.

seeker89 commented 6 years ago

+1 to moving to GitHub

(Replying by email on Mon, 21 May 2018, 21:47, to Sylvain Hellegouarch's comment above.)

-- Mikolaj Pawlikowski

Lawouach commented 6 years ago

Regarding the outline @caniszczyk, it's a good starting point. I might add a section on chaos engineering in relation to other disciplines/practices: security, CI/CD... basically, where does CE fit in the toolchain? But maybe this is already covered by the "CE in Cloud Native Systems" section?

3rdman commented 6 years ago

I agree with @Lawouach and @seeker89, the Google doc got crowded fast :)

We could just do a bit of Markdown on individual sections and then generate something, e.g. a PDF, when needed.

caniszczyk commented 6 years ago

On everyone's suggestion, I converted what we had in the gdoc to here:

https://github.com/chaoseng/wg-chaoseng/blob/master/WHITEPAPER.md

It needs a lot of work but now we can start iterating via pull requests.

cc: @chaoseng/maintainers

joaoasrosa commented 6 years ago

@caniszczyk +1

Lawouach commented 6 years ago

Hey all,

Here is a strawman structure for the whitepaper. Hopefully it will help the discussion :)


Chaos Engineering Whitepaper v0.1

What is Chaos Engineering?

Short History

Principles

Objective: Harness and Improve System Resilience

Benefits for Cloud Native Systems

Relation to Existing Software and Operational Practices

Use Cases

Practicing Chaos Engineering

Chaos Engineering Flow

Define a Baseline

State the Hypothesis to Confirm/Refute

Determine a Perturbation to Perform

Chaos Engineering Perturbations

Degrade Network Conditions

Vary Computing Resources

Stress to the Limits

Simulate Data Loss

Change ACL Permissions

Provoke a Security Breach

Chaos Engineering Automation

Continuous Chaos Engineering

Chaos Engineering Reporting

Report Findings

veggiemonk commented 6 years ago

Hi @Lawouach

Thank you for taking the time to organize things a bit. Where does the landscape fit in this structure? Could it be put in a separate document?

Lawouach commented 6 years ago

Hey @veggiemonk. Thanks! It looks like nothing now, but finding the right phrasing took me half a day the other day. Formalizing is hard :D

It depends on how we organize the whitepaper. Either we list a few examples in each section (for instance, under "Degrade Network Conditions" we could mention Gremlin, Pumba, Muxy...) so that each topic sits close to its potential vendors.

Or we continue with a long list of vendors at the bottom of the paper.

veggiemonk commented 6 years ago

Hi @Lawouach, I totally understand that's hard work! 🙏

For now, the landscape doesn't need to be too formal because the list isn't that long. As a suggestion, let's keep it at the end.

What do you think?

I don't know if the whitepaper is the right place for this, but what about renaming the section "Chaos Engineering Flow" to "How to start Chaos Engineering"? As a first step, we could add "Set up monitoring"; as a second step, "Warn users/developers about it"?

It seems pretty basic, but without these steps it can be hard or even dangerous to do CE. Maybe it is too simple for this paper.

What are your views on that?

Lawouach commented 6 years ago

Interesting, I like the guidelines approach indeed.

There is certainly room for a section around the theory, as per the principles. But a "how to get started" one would be very welcome indeed!

russmiles commented 6 years ago

How to get started + links to the product landscape and the getting-started points there would be awesome.

veggiemonk commented 6 years ago

OK, let's see what kind of resources we can gather in there.

ramin commented 6 years ago

A section on case studies and papers in the field was also something we discussed in the last meeting. Maybe as a final 'Further Reading' section?

@Lawouach thank you so much for getting this started!

What do people think about starting a branch with @Lawouach's structure as a README that we can open PRs against with sections filled in? A merged PR would count as approval, we could then go deeper on the specific content of each section, and link each PR in this issue.

Lawouach commented 6 years ago

I think I will refine it, taking into account the comments that were made. Give me a moment :)

Lawouach commented 6 years ago

Chaos Engineering Whitepaper v0.1

What is Chaos Engineering?

Short History

Principles

Discuss the steady state, experiment, etc. Just to set the "theory"?

Why Practice Chaos Engineering?

Harness and Improve System Resilience

If Chaos Engineering isn't the goal per se, what is? Resiliency? Reliability?

Benefits for Cloud Native Systems

Software and Operational Practices In Production

Make clear that whereas testing and CI/CD are mostly upstream practices, Chaos Engineering is very much downstream and acts against a live system. Would that make sense?

Use Cases

The current use cases are a good starting point, but should we detail them, to a depth similar to what we find in the serverless whitepaper?

Practicing Chaos Engineering

Getting Started With Chaos Engineering

Is my system ready to endure Chaos Engineering?

Should we hint at the minimal level you need to be at before getting started? I mean, what if your system is barely resilient as it is?

Do I need to get started in production?

While we may want this, starting in prod may not fit "getting started scenarios".

Communicate with the Organization

This is where we need to continue the discussion and figure out how far we want to, and can, go with the patterns.

Should we talk gamedays for instance? Observability?

The following phases may or may not be useful. I think it would be valuable if we could describe what it means to deal with chaos in those various cases, but is it the right place?

Chaos Engineering Perturbations

Degrade Network Conditions

Vary Computing Resources

Stress to the Limits

Simulate Data Loss

Change ACL Permissions

Provoke a Security Breach

Assume the Application Fails to Restart

Chaos Engineering Automation

Continuous Chaos Engineering

Chaos Engineering Reporting

Report Findings

Landscape

veggiemonk commented 6 years ago

That looks good! Thanks @Lawouach for the hard work!

I think a PR is in order for us to move forward.

mattforni commented 6 years ago

@chaoseng/maintainers (CC @caniszczyk) so just out of curiosity what is the plan on iterating on this document now? I had a few minutes this afternoon and wanted to add some of my thoughts here, but it's a bit difficult to know where to start.

I'm happy to just take some time, make some edits and submit a PR for consideration, but didn't want to ruffle any feathers or step on any toes. Would it be beneficial to assign topics to individuals to comment on? Just thinking out loud here.

Lawouach commented 6 years ago

Hey @mattforni, I'd say it's totally fine to submit PRs against the document.

On my side, I used this issue because it felt quicker to get started, but I do wonder whether that would scale to a whole document :D

veggiemonk commented 6 years ago

PRs are the way to move forward! ⏩

caniszczyk commented 6 years ago

PRs please :)

(Replying by email on Thu, Jun 28, 2018 at 8:54 AM, to Julien Bisconti's comment above.)

-- Cheers,

Chris Aniszczyk http://aniszczyk.org +1 512 961 6719

Lawouach commented 6 years ago

Started on my train of thought: https://github.com/chaoseng/wg-chaoseng/pull/41