NHSDigital / software-engineering-quality-framework

🏎️ Shared best-practice guidance & tools to support software engineering teams

Expand the Performance Testing page with more advice #305

Open dlavender4 opened 1 year ago

dlavender4 commented 1 year ago

Notes I made for the Galleri team. A lot of this is applicable to everyone:

**Integration points**

- A good place to start is looking at which integration points we have between components (UI > DB, MESH > DB, results > lambda > Comms, etc.).
- We will need to show we've performance tested every key integration point (a minimal load-test sketch follows below).
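As an illustration only, a load test against a single integration point could be a few lines of Locust; the host, path and payload below are hypothetical placeholders, not the real Galleri APIs.

```python
# Minimal Locust sketch for exercising one integration point.
# The host, path and payload are hypothetical placeholders.
from locust import HttpUser, task, between


class ResultsIngestUser(HttpUser):
    host = "https://perf-test.example.nhs.uk"  # assumption: a non-prod test endpoint
    wait_time = between(1, 5)  # seconds between requests per simulated user

    @task
    def post_result(self):
        # Each simulated user repeatedly posts a synthetic results message
        self.client.post(
            "/results",
            json={"participant_id": "FAKE-0001", "result_code": "NEG"},
        )
```

Run headless with e.g. `locust -f results_ingest_test.py --headless -u 100 -r 10 -t 10m` to ramp to 100 simulated users for 10 minutes.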

**Data**

- Then look at datasets. We need large datasets at various points, and differently structured datasets for different points. For example:
  - demographics data coming from CaaS
  - appointment data coming from GRAIL
  - results coming from GRAIL - potentially arriving in large bursts at once
  - appointment-booked messages coming from GRAIL, with us then sending out Comms to Participants
  - etc.
- So we're not talking about one killer set of test data here: we need lots of different test data for different points.
- The good part is that not all of that test data needs to be "life-like"; it depends on the test you're trying to do. For demographics, if the test is JUST checking whether the receiving lambda can cope, it can be the same record 11M times - nothing clever needed. But if it's testing that the screens can search over 11M records, that DOES need life-like data, or the searches won't behave in the same way (see the data-generation sketch below).
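A minimal sketch of both flavours of test data, assuming Python and the Faker library (mentioned under Environments below); the field names are made up for illustration and won't match the real CaaS demographics schema.

```python
# Sketch: generating "dumb" vs life-like test records with Faker.
# Field names are illustrative only, not the real demographics schema.
from faker import Faker

fake = Faker("en_GB")


def identical_records(n):
    """Same record n times - enough to test whether a receiving lambda copes with volume."""
    record = {"family_name": "TEST", "given_name": "PATIENT", "postcode": "LS1 4AP"}
    return [record for _ in range(n)]


def life_like_records(n):
    """Varied records - needed when testing search/index behaviour over e.g. 11M rows."""
    return [
        {
            "family_name": fake.last_name(),
            "given_name": fake.first_name(),
            "date_of_birth": fake.date_of_birth(minimum_age=50, maximum_age=77).isoformat(),
            "postcode": fake.postcode(),
        }
        for _ in range(n)
    ]
```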

Monitoring • "Performance" testing serverless almost becomes a game of finding cost points, rather than actually checking performance. • Please monitor costs during tests! We know AWS can scale to ridiculous amounts. But how much will that cost the Programme? • I imagine we will want to artificial rate limiting caps at various points in AWS to protect us from spending too much.
• We need to monitor while running these tests o CloudWatch o X-ray o Splunk, maybe o Cost Explorer o Etc. • Maybe start with what you want to see. Lambda load, database cache hits, etc. And work backwards to what tools can show us that.
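As a sketch of the "monitor costs during tests" point, the snippet below pulls per-service spend for a test window from the Cost Explorer API via boto3. Note Cost Explorer data can lag by up to a day, so CloudWatch metrics and budget alarms are better for in-flight monitoring; the dates here are placeholders.

```python
# Sketch: pull per-service spend for a test window from AWS Cost Explorer.
# Cost Explorer data can lag by up to ~24h, so treat this as a post-run check.
import boto3

ce = boto3.client("ce")

response = ce.get_cost_and_usage(
    TimePeriod={"Start": "2024-01-01", "End": "2024-01-02"},  # placeholder test window
    Granularity="DAILY",
    Metrics=["UnblendedCost"],
    GroupBy=[{"Type": "DIMENSION", "Key": "SERVICE"}],
)

for day in response["ResultsByTime"]:
    for group in day["Groups"]:
        service = group["Keys"][0]
        amount = float(group["Metrics"]["UnblendedCost"]["Amount"])
        print(f"{day['TimePeriod']['Start']} {service}: ${amount:.2f}")
```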

**Mocking**

- Think about which downstream services we can mock for specific tests, e.g. receiving results.
- We could mock Comms Manager, or use a real one. We could use a real MESH but point it at a fake mailbox, or mock MESH entirely. All are good options, and the choice will depend on what the test is trying to prove (a minimal stub sketch follows below).
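As one option, a downstream service like Comms Manager can be stubbed with a few lines of standard-library Python; the port and response shape here are invented for illustration and are not the real Comms Manager contract.

```python
# Sketch: a throwaway stub for a downstream service (e.g. Comms Manager).
# The port and response body are illustrative, not the real service's contract.
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


class CommsManagerStub(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        self.rfile.read(length)  # accept and discard the message body
        self.send_response(202)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(json.dumps({"status": "accepted"}).encode())

    def log_message(self, *args):
        pass  # keep the stub quiet under load


if __name__ == "__main__":
    HTTPServer(("0.0.0.0", 8080), CommsManagerStub).serve_forever()
```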

**Environments**

- No PID in non-prod - can't stress that enough.
- So we want to performance test (at various points) before we put anything into the prod accounts, which means we will need fake datasets. We could generate them using something like Faker, or anonymise real ones (unlikely, given we don't have any yet).
- In terms of environments, it's easy enough to spin one up with Terraform - but think about making it easy to spin down too. Automate as much as possible: ideally each performance test would be fully scripted - spin up the environment, run the scripted tests, tear it down - with AWS monitored throughout (see the sketch below).
- There is some advice here around using APDEX: https://github.com/NHSDigital/software-engineering-quality-framework/blob/main/practices/performance-testing.md
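A minimal sketch of the scripted spin-up/test/tear-down loop, assuming Terraform for the environment and Locust for the load test; the Terraform variable and test file names are placeholders.

```python
# Sketch: scripted performance-test run - spin up, test, always tear down.
# The Terraform variable and Locust file names are placeholders.
import subprocess


def run(cmd):
    subprocess.run(cmd, check=True)


run(["terraform", "init"])
run(["terraform", "apply", "-auto-approve", "-var", "environment=perf"])
try:
    run([
        "locust", "-f", "results_ingest_test.py",
        "--headless", "-u", "500", "-r", "50", "-t", "30m",
    ])
finally:
    # Tear the environment down even if the test run fails
    run(["terraform", "destroy", "-auto-approve", "-var", "environment=perf"])
```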

**PID**

- I would advise NOT performance testing using PID, even in prod. If we did, we'd need sign-off from the Programme, and a huge risk mitigated with SMEs (IG, clinical, engineering, cyber, etc.) in order to use people's real data in that way.
- For example, we'd need to be 100% sure all downstreams are switched off. Worst case, you accidentally send a real person a fake cancer result - that would be hideous.

**Types of test**

- We'll need to look at:
  - Can we handle peak load at each integration point?
  - Can we handle spikes? We will need data from the Programme around expected load, spike times, spike durations, etc. - or we can guess, but caveat it.
  - Soak testing: can we handle peak load for a longer duration than expected?
  - Destructive testing: knowing the breaking point of each integration is really important, and it would be nice to have those documented. Again, it's not just performance - remember those cost limits. A graph of load vs cost would be great: the Programme can plot their acceptable cost point and we can add limits and throttles based on that.
- A sketch of how spike and soak profiles can be scripted follows below.
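For spike and soak profiles, Locust's `LoadTestShape` lets you script the load over time; the stage durations and user counts below are invented examples, not Programme figures, and the endpoint is the same placeholder as earlier.

```python
# Sketch: scripted spike + soak profile using Locust's LoadTestShape.
# Durations and user counts are invented examples, not Programme figures.
from locust import HttpUser, LoadTestShape, between, task


class ResultsUser(HttpUser):
    host = "https://perf-test.example.nhs.uk"  # placeholder
    wait_time = between(1, 3)

    @task
    def post_result(self):
        self.client.post("/results", json={"participant_id": "FAKE-0001"})


class SpikeThenSoakShape(LoadTestShape):
    # (end_time_seconds, users, spawn_rate)
    stages = [
        (300, 50, 10),    # ramp to a steady background load
        (360, 500, 100),  # one-minute spike, e.g. a results batch arriving
        (2160, 50, 10),   # soak back at steady load for 30 minutes
    ]

    def tick(self):
        run_time = self.get_run_time()
        for end_time, users, spawn_rate in self.stages:
            if run_time < end_time:
                return users, spawn_rate
        return None  # stop the test
```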

stefaniuk commented 1 year ago

> **Mocking**
>
> - Think about which downstream services we can mock for specific tests, e.g. receiving results.
> - We could mock Comms Manager, or use a real one. We could use a real MESH but point it at a fake mailbox, or mock MESH entirely. All are good options, and the choice will depend on what the test is trying to prove.

All of the above sounds very familiar from another programme. My recommendation would be to mock and/or stub all the third-party integration points (including MESH). Also, implement contract tests for all of them. This will pay off many times over later, enabling you to introduce features and other changes safely and at speed (a simplified contract-check sketch follows below).
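A full contract-testing setup would normally use a tool such as Pact, but as a simplified sketch of the idea, the test below checks that a stubbed results message still satisfies the schema the consumer relies on; the message shape and field names are hypothetical, not a real GRAIL/MESH contract.

```python
# Simplified sketch of a consumer-side contract check using jsonschema.
# The message shape and field names are hypothetical, not a real integration contract.
import jsonschema  # pip install jsonschema

RESULT_MESSAGE_SCHEMA = {
    "type": "object",
    "required": ["participant_id", "result_code", "issued_at"],
    "properties": {
        "participant_id": {"type": "string"},
        "result_code": {"type": "string"},
        "issued_at": {"type": "string"},
    },
}

# In a full contract-testing setup the same contract is also verified against the provider,
# so the stub and the real service cannot silently drift apart.
STUB_RESULTS_MESSAGE = {
    "participant_id": "FAKE-0001",
    "result_code": "NEG",
    "issued_at": "2024-01-01T09:00:00Z",
}


def test_stub_matches_consumer_contract():
    jsonschema.validate(instance=STUB_RESULTS_MESSAGE, schema=RESULT_MESSAGE_SCHEMA)
```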