OpenConext / Stepup-Project

Managing issues for Stepup-* projects

Discuss performance test coverage #514

Open phavekes opened 2 days ago

phavekes commented 2 days ago

This issue is imported from Pivotal - Originally created at Apr 24, 2020 by bstrooband

In order to prevent performance issues in the future I think we should be able to test Stepup performance with large sets of data. Therefore we need to define what and how we want to test and what we consider a representative test set.

I don't want to get into (implementation) details too much, to keep the discussion open. I think the main goal should be to investigate the requirements of possible performance tests and the environment to be used. We have to take into account that a decision could also affect automated deploy tests in the future, which I personally would like to have fully automated before releasing as well, so we catch issues before delivery.

There are, however, some challenges we need to discuss.

phavekes commented 2 days ago
@michielkodde / @pmeulen / @phavekes 

Would you join the discussion? (bstrooband - Apr 24, 2020)

phavekes commented 2 days ago

Performance testing is a broad subject. The two questions that interest me most are:

  1. Did a new release introduce a performance issue?
  2. How will performance be with X users, Y logins/min, etc.? The goal is to predict when we are going to run into limits so we can solve them.

For #2, looking at the behaviour of the production system works well for now; we keep a large performance reserve to be able to handle outages and performance peaks. To address #1 we need testing before doing a release, which means mimicking production load.

From experience we know that projection / database related operations in Stepup are sensitive to issues caused by execution time or memory usage; when configuration changes cannot be applied, this is almost always the cause. For testing this, being able to load the database and event_stream with numbers of users, institutions and configurations similar to production should allow us to validate performance during testing. We might need to run the application with debug disabled to get valid results.
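As an illustration of that kind of check, a minimal sketch (Python) could time a projection replay against a production-sized event_stream with debug disabled and record peak memory of the child process. The console command name below is an assumption, not necessarily the actual Stepup-Middleware command:

```python
#!/usr/bin/env python3
"""Sketch: time a projection replay against a production-sized event_stream
and report wall-clock time and peak memory of the child process.
The replay command below is an assumed placeholder, not the verified Stepup CLI."""
import resource
import subprocess
import time

# Assumed placeholder for the Middleware replay command, run with debug disabled.
REPLAY_CMD = ["php", "bin/console", "middleware:event:replay", "--env=prod", "--no-debug"]

start = time.perf_counter()
result = subprocess.run(REPLAY_CMD, capture_output=True, text=True)
elapsed = time.perf_counter() - start

# ru_maxrss of finished child processes is reported in kilobytes on Linux.
peak_kb = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss

print(f"exit code : {result.returncode}")
print(f"wall time : {elapsed:.1f} s")
print(f"peak RSS  : {peak_kb / 1024:.0f} MB")
```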

Regarding login performance: fortunately the gateway itself is simple performance-wise. It never handles large amounts of data, and its database interaction is very simple.

A first thing to focus on is the relative performance of all the middleware operations with different numbers of users, SPs, institutions and configuration events, i.e. test with 10, 100, 1000, 10000, 100000 and see how performance scales. (Pieter van der Meulen - Apr 24, 2020)
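To make that concrete, a rough sketch of such a scaling run could look like the following (Python; both the seeding script and the timed console command are placeholders for whatever bootstrap tooling and Middleware operation are actually used). The interesting output is how the duration grows from one step to the next:

```python
#!/usr/bin/env python3
"""Sketch: run the same Middleware operation against growing data sets and
report how its duration scales. Both commands below are placeholders."""
import subprocess
import time

SIZES = [10, 100, 1_000, 10_000, 100_000]

def timed(cmd):
    """Run a command and return its wall-clock duration in seconds."""
    start = time.perf_counter()
    subprocess.run(cmd, check=True, capture_output=True)
    return time.perf_counter() - start

previous = None
for size in SIZES:
    # Placeholder: seed users, SPs, institutions and config events up to `size`.
    timed(["./seed-test-data.sh", str(size)])

    # Placeholder: the operation to profile, e.g. pushing the institution
    # configuration through the Middleware console, with debug disabled.
    duration = timed(["php", "bin/console", "middleware:some-operation",
                      "--env=prod", "--no-debug"])

    ratio = f"{duration / previous:.1f}x previous" if previous else "baseline"
    print(f"{size:>7} identities: {duration:7.1f} s ({ratio})")
    previous = duration
```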

phavekes commented 2 days ago

Like Bas stated, as a developer I'm very interested in performance-related findings during development.

How to test

I think the new toolset that allows bootstrapping users and tokens of any type will help in that regard. That would let us run manual tests against a representative dataset. Certainly doable, but it might quickly become a tedious and time-consuming business.

Running some of these tests in a CI environment would have my personal preference. Given my past experience of running Gateway tests on GitHub Actions, doing something similar for Middleware should be possible. We would have to investigate whether loading the user, token and institution test set would pose any timeout issues; I think this will quickly become the bottleneck for these kinds of tests.
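One way to get a first answer on that would be to time the data load itself in the CI job and fail fast when it exceeds a budget, instead of silently running into the job timeout. A minimal sketch (Python; the loader script name and the budget value are assumptions):

```python
#!/usr/bin/env python3
"""Sketch: time the test-set load in CI and fail fast when it exceeds a budget.
The loader script and the budget value are assumed placeholders."""
import subprocess
import sys
import time

LOAD_BUDGET_SECONDS = 20 * 60  # assumed budget, well below the job time limit

start = time.perf_counter()
subprocess.run(["./load-test-set.sh"], check=True)  # placeholder loader script
elapsed = time.perf_counter() - start

print(f"test set loaded in {elapsed / 60:.1f} min")
if elapsed > LOAD_BUDGET_SECONDS:
    sys.exit("loading the user/token/institution test set exceeded the CI budget")
```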

We could consider working with a base image that is pre-loaded with test data, but this would entail somebody being responsible for updating that image and test set. This should also be something that can be automated using GitHub Actions (on release / tag).

What to test

  1. Relevant console actions (like pushing to the institution-configuration endpoint, but also whitelisting, SP institution config, ...)
  2. Other API interactions with Middleware originating from SS or RA. These actions could be curled onto the Middleware to prevent having to spin up those environments. The caveat of that option is that any API changes need to be updated in the test data, whereas a recent RA or SS checkout would already have implemented those API changes.
  3. Run an event replay, and see if we can find any changes between projections afterwards (see the sketch after this list)
  4. Run any other QA tests in the same run
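For item 3, a minimal sketch of such a projection comparison could checksum the projection tables before and after the replay (Python, using MySQL's CHECKSUM TABLE via the mysql client; the database name, table list and replay command are assumptions):

```python
#!/usr/bin/env python3
"""Sketch: checksum Middleware projection tables before and after an event
replay and report tables whose contents changed.
Database name, table names and the replay command are assumed placeholders."""
import subprocess

DB = "middleware"  # assumed database name
PROJECTION_TABLES = ["ra_listing", "second_factor", "whitelist_entry"]  # assumed

def checksums():
    """Return {table: checksum} using MySQL's CHECKSUM TABLE."""
    result = {}
    for table in PROJECTION_TABLES:
        out = subprocess.run(
            ["mysql", "-N", "-e", f"CHECKSUM TABLE {table}", DB],
            check=True, capture_output=True, text=True,
        ).stdout
        # Output line looks like "<db>.<table>\t<checksum>".
        result[table] = out.split()[-1]
    return result

before = checksums()
# Assumed placeholder for the Middleware event replay console command.
subprocess.run(["php", "bin/console", "middleware:event:replay",
                "--env=prod", "--no-debug"], check=True)
after = checksums()

for table in PROJECTION_TABLES:
    status = "unchanged" if before[table] == after[table] else "CHANGED"
    print(f"{table}: {status}")
```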

Test size

I think the proposal of @pmeulen is very interesting. Running successive tests against a growing test set would give valuable insights. On the other hand, it would be a long-running test, more suitable for pre-release (or nightly builds).

Utopia

I'd love to see a CI environment where we can run tests against a fully operational Stepup stack. In such an env, running the 'deploy' Behat tests would be possible, making testing for regressions much easier. We now run these tests manually, but this is rather tedious, as you need to check every component for being in the correct state for testing.

Bas and I have often brainstormed over some coffee on how to reach that goal. Our current approach would be to have a Stepup base image containing at least GW, MW, SS and RA. That image would need to be rebuilt regularly. Using a monorepo might help in that area. (Michiel Kodde - Apr 28, 2020)

phavekes commented 2 days ago

The usage limits for GitHub Actions are way more forgiving than Travis' limits, even for the free tier. The limit you'll probably hit first is the concurrent jobs maximum of 5. The time limit is 6 hours per job and 72 hours for the complete workflow.

https://help.github.com/en/actions/getting-started-with-github-actions/about-github-actions#usage-limits

We could use the tests from Stepup-build and add performance tests to those. The only challenge is that we need the whole Stepup stack in order to be representative, so a monorepo with only the core components would be a good solution, which would also be valuable for regular development. Then we could configure the CI/CD pipeline to run the tests when a tag is set (so before releasing).

Maybe we could start off with the following steps, and decide during and after each step if and how we would like to continue.

  1. Monorepo POC
    • History must be preserved!
    • Test creation of read-only components from the monorepo
    • Do we have other requirements?
  2. Integrate Behat in monorepo POC
    • What do we need?
      • db
      • test idp
      • mail catcher?
      • ...
  3. Refactor shortcuts made during the POC and make the setup production-ready.

    (bstrooband - May 7, 2020)