Amsterdam-Music-Lab / MUSCLE

An application to easily set up and run online listening experiments for music research.
https://www.amsterdammusiclab.nl/
MIT License

Discussion: Should we try and decrease the E2E suite duration and if so, how? #1025

Open drikusroor opened 4 months ago

drikusroor commented 4 months ago

**Is your feature request related to a problem? Please describe.**
Right now I'm working on automating the E2E tests (#1010). Testing two experiments already takes around 10 minutes, largely because of the length of the audio files, some of which are longer than 10 seconds. If we want to cover all experiments, our E2E suite might take an hour or longer.

**Describe the solution you'd like**
Perhaps it would be an idea to decrease the duration of the E2E suite in an intelligent way. Currently, the pre-existing (not yet automated) E2E tests cover 3 experiments. Ultimately, these tests verify several things:

  1. The frontend application is loaded
  2. The frontend can communicate with the backend
  3. The audio files can be downloaded & played
  4. Basic, experiment-agnostic logic (sessions, next round, final round, storing data, etc.) works as expected
  5. The experiment is configured correctly, with the correct audio files
  6. The experiment's rules are configured correctly and the frontend can work with them

I was thinking... Perhaps we can test some experiments (like the tests already written) full monty style, from start to finish, no compromises. That way we can be sure that 1, 2, 3, and 4 work as expected. We can then test 5 and 6 in a different way, using the same E2E suite, but without having to play back full experiments with audio files of more than 10 seconds.

> 5. The experiment is configured correctly, with the correct audio files

We can test this using a custom validation method, similar to or exactly like the ones from #978 and/or #995. Maybe we can build on these methods and check that they don't return an error message or error status code. We can then be sure that the configured audio files at least exist in the uploads folder (I haven't thought of a solution for the experiments that use external files, though). We can also be sure that the CSV / configured sections, with their names, groups, tags, and so on, pass the validation method of the experiment's rules class. As these validation methods run in an instant, we don't have to wait for minutes until the experiments have been played from start to finish.
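
A rough sketch of what such a backend check could look like, assuming a hypothetical `validate_playlist` hook on the rules class and illustrative experiment slugs (the actual model and method names from #978 / #995 may differ):

```python
# Hypothetical sketch: validate each experiment's playlist configuration
# instead of playing the whole experiment back. The import path, the slugs
# and the validate_playlist() hook are assumptions, not the exact API
# introduced in #978 / #995.
import pytest

from experiment.models import Experiment  # assumed model location


@pytest.mark.django_db
@pytest.mark.parametrize("slug", ["eurovision", "categorization"])  # illustrative slugs
def test_playlists_pass_rules_validation(slug):
    experiment = Experiment.objects.get(slug=slug)
    rules = experiment.get_rules()  # assumed helper returning the rules instance
    for playlist in experiment.playlists.all():
        errors = rules.validate_playlist(playlist)  # assumed validation hook
        # An empty result means the sections, names, groups and tags check out,
        # without waiting for any audio playback.
        assert not errors, f"Playlist validation failed for '{slug}': {errors}"
```

Because a check like this only touches the database and the rules class, it would run in well under a second per experiment.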

> 6. The experiment's rules are configured correctly and the frontend can work with them

Additionally, I thought of maybe adding a special feature / test flag that lets us run an experiment in the normal way, except that it serves an audio file of roughly 0.1 seconds instead of the configured audio files. This allows the E2E test to quickly click through the experiment, saving us loads of time and compute. The experiment's rules and behavior are then still tested fully from start to end.
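
As a rough illustration of such a flag, the backend could swap in a bundled short silence clip wherever it resolves a section's audio URL. The setting name, the file path and the `absolute_url()` method below are assumptions, not existing MUSCLE code:

```python
# Hypothetical sketch of a test-only audio substitution. E2E_SHORT_AUDIO,
# the silence file path and section.absolute_url() are illustrative names.
from django.conf import settings

SHORT_SILENCE_URL = "/static/test/silence_100ms.wav"  # bundled ~0.1 s clip (assumed path)


def resolve_audio_url(section) -> str:
    """Return the section's real audio URL, or a 0.1 s stand-in when the E2E flag is set."""
    if getattr(settings, "E2E_SHORT_AUDIO", False):
        return SHORT_SILENCE_URL
    return section.absolute_url()  # assumed existing method on the Section model
```

The rules logic and the frontend playback path stay exactly the same; only the playback duration changes, so a full run-through shrinks from minutes to seconds.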

**Describe alternatives you've considered**

BeritJanssen commented 3 months ago

I like the solution you propose with shorter audio files, but I wonder how much development time it would take to prepare that. Plus, we'd still not be testing the actual situation on the server. I was wondering whether we could have some "stub" tests which only go through a couple of trials, and randomly / systematically run one or two experiments from start to finish, so that over a couple of runs we'd still E2E-test all experiments eventually. That way, we'd probably still detect problems faster than relying on manual testing from the team or user reports, although we might not always be able to intercept problems before users run into them. (The stub tests would guarantee that users can at least get some way into the experiments, though.)
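
One way to make the "randomly / systematically" part reproducible would be to rotate the full-run experiment(s) deterministically per CI run, so that every experiment still gets a full pass every few runs. A sketch with illustrative experiment slugs (`GITHUB_RUN_NUMBER` is the standard GitHub Actions variable; everything else is an assumption):

```python
# Hypothetical sketch: deterministically rotate which experiment(s) get a
# full start-to-finish run, while the rest only get stub runs.
import os
from datetime import date

ALL_EXPERIMENTS = ["eurovision", "categorization", "hooked", "rhythm_discrimination"]


def full_run_experiments(per_run: int = 1) -> list[str]:
    """Pick the experiment(s) that get a full E2E run on this CI run."""
    # Prefer the CI run number; fall back to the date for local runs.
    run_index = int(os.environ.get("GITHUB_RUN_NUMBER", date.today().toordinal()))
    start = (run_index * per_run) % len(ALL_EXPERIMENTS)
    return [ALL_EXPERIMENTS[(start + i) % len(ALL_EXPERIMENTS)] for i in range(per_run)]


def stub_run_experiments(per_run: int = 1) -> list[str]:
    """Everything else only runs the first couple of trials."""
    full = set(full_run_experiments(per_run))
    return [slug for slug in ALL_EXPERIMENTS if slug not in full]
```

With four experiments and one full run per CI run, every experiment would get a full pass every four runs, while each run still touches every experiment through its stub test.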

I agree that point 5. can be much more efficiently dealt with through playlist validation on the backend. Most problems when setting up an experiment for the first time arise from typos in the description of sections.

drikusroor commented 3 months ago

> Plus, we'd still not be testing the actual situation on the server. I was wondering whether we could have some "stub" tests which only go through a couple of trials, and randomly / systematically run one or two experiments from start to finish

Yeah, the last part is also what I meant by full monty style. :-) Test a couple of experiments without compromises or shortcuts, and test the other experiments with shortcuts, be it shorter audio fragments or, like you proposed, just a part of the experiment.