aws-samples / aws-fault-injection-simulator-workshop


RFC: recording tools #17

Open rudpot opened 3 years ago

rudpot commented 3 years ago

Niclas Gothberg 10:10 AM Hi, I had a question today about whether there are any standards, best practices, and/or tools for documenting your test cases, hypotheses, etc. for your chaos experiments, and for documenting the outcome of the experiment and the iterations. Has anyone seen anything on that topic?

rudpot commented 3 years ago

Since we don't really have a work tracking solution, maybe a git-based flow where experiment results are checked into git side by side with a hypothesis doc?

Thinking a little further with CI/CD: maybe also consider a path to create custom metrics for performance/regression testing?
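For the custom metrics idea, a minimal sketch (assuming a boto3-based step in the pipeline; the `ChaosExperiments` namespace, metric name, and dimension are made up for illustration) could look something like this:

```python
# Hypothetical sketch: publish a custom metric from an experiment/CI run so later
# runs can be compared for regressions. Namespace, metric name, and dimension are
# placeholders, not part of the workshop.
import boto3

cloudwatch = boto3.client("cloudwatch")

def record_experiment_metric(experiment_id: str, p99_latency_ms: float) -> None:
    """Publish one data point tagged with the experiment id."""
    cloudwatch.put_metric_data(
        Namespace="ChaosExperiments",  # assumed namespace
        MetricData=[
            {
                "MetricName": "P99LatencyMs",
                "Dimensions": [{"Name": "ExperimentId", "Value": experiment_id}],
                "Value": p99_latency_ms,
                "Unit": "Milliseconds",
            }
        ],
    )
```

Plotting that metric across runs (or alarming on it) would give the performance/regression signal without inventing a separate reporting format.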

mjkubba commented 2 years ago

CI/CD will work here, but to me this is more of an event-driven activity: experiment -> EventBridge -> Lambda -> CodeCommit (git) or S3. The logic for going with EventBridge vs. CI/CD: independently of how you kick off the experiment (API, GUI, CLI, etc.), there is no need to add CI/CD to that; EventBridge can pick up when an experiment sends a signal and execute on that. Now I need to validate what kind of events experiments send to EventBridge.
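A rough sketch of the Lambda step in that path, assuming FIS does publish experiment state-change events to EventBridge (still to be validated, as noted above) and using placeholder bucket and field names:

```python
# Sketch of the Lambda in experiment -> EventBridge -> Lambda -> S3.
# Assumes FIS emits experiment state-change events to EventBridge; the bucket
# name and the event detail field names are assumptions until the actual event
# shape is confirmed.
import json
import boto3

s3 = boto3.client("s3")
BUCKET = "my-chaos-experiment-results"  # hypothetical bucket

def handler(event, context):
    detail = event.get("detail", {})
    # Field name is a guess at what the FIS event carries.
    experiment_id = detail.get("experiment-id", "unknown")
    key = f"experiments/{experiment_id}/{event.get('time', 'unknown')}.json"
    # Persist the raw event; the same key layout could mirror the git path
    # where the hypothesis doc lives, or the write could go to CodeCommit instead.
    s3.put_object(Bucket=BUCKET, Key=key, Body=json.dumps(event).encode("utf-8"))
    return {"written": key}
```

The EventBridge rule that triggers it would match on the FIS event source/detail-type once we confirm what those actually are.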

rudpot commented 2 years ago

> CI/CD will work here, but to me this is more of an event-driven activity: experiment -> EventBridge -> Lambda -> CodeCommit (git) or S3. The logic for going with EventBridge vs. CI/CD: independently of how you kick off the experiment (API, GUI, CLI, etc.), there is no need to add CI/CD to that; EventBridge can pick up when an experiment sends a signal and execute on that. Now I need to validate what kind of events experiments send to EventBridge.

Note that the original ask here was bigger than what's in #56. Where #56 only focuses on persisting the FIS experiment state / logs, #17 would include a way to document what the intent was in the first place (what system we are testing, what the architecture is, what the hypothesis is, what metrics we need to prove/disprove the hypothesis, and what the success/failure thresholds are). A simple implementation of #17 could use the EventBridge path suggested (is there a trigger?) in combination with a README.md template. If we are going to use this type of automation, it could also include scraping relevant CloudTrail / CloudWatch metrics into the same commit.
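A hedged sketch of the metrics-scraping part, with placeholder namespace/metric/dimensions standing in for whatever metrics the hypothesis README actually references:

```python
# Sketch: pull the metrics backing the hypothesis for the experiment window so
# they can be committed alongside the README (e.g. as results/metrics.json).
# Namespace, metric name, and dimensions are placeholders; real dimensions
# (e.g. the specific load balancer) would need to be filled in.
import datetime
import json
import boto3

cloudwatch = boto3.client("cloudwatch")

def collect_metrics(start: datetime.datetime, end: datetime.datetime) -> str:
    response = cloudwatch.get_metric_data(
        MetricDataQueries=[
            {
                "Id": "errors",
                "MetricStat": {
                    "Metric": {
                        "Namespace": "AWS/ApplicationELB",          # assumed metric
                        "MetricName": "HTTPCode_Target_5XX_Count",
                        "Dimensions": [],                           # fill in for real targets
                    },
                    "Period": 60,
                    "Stat": "Sum",
                },
            }
        ],
        StartTime=start,
        EndTime=end,
    )
    # Keep just timestamps/values for the commit.
    results = [
        {
            "id": r["Id"],
            "timestamps": [t.isoformat() for t in r["Timestamps"]],
            "values": r["Values"],
        }
        for r in response["MetricDataResults"]
    ]
    return json.dumps(results, indent=2)
```

The same Lambda (or a follow-on step) could then commit this JSON next to the hypothesis README so intent, thresholds, and observed data all land in one place per experiment run.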