Amsterdam-Music-Lab / MUSCLE

An application to easily set up and run online listening experiments for music research.
https://www.amsterdammusiclab.nl/
MIT License

Discussion: Should we try and decrease the E2E suite duration and if so, how? #1025

Open drikusroor opened 4 months ago

drikusroor commented 4 months ago

**Is your feature request related to a problem? Please describe.**
Right now I'm working on automating the E2E tests (#1010). Testing two experiments already takes around 10 minutes, largely because of the length of the audio files, some of which are longer than 10 seconds. If we want to cover all experiments, our E2E suite might take an hour or longer.

**Describe the solution you'd like**
Perhaps it would be an idea to decrease the duration of the E2E suite in an intelligent way. Currently, the pre-existing (not yet automated) E2E tests cover 3 experiments. Ultimately, these tests verify several things:

  1. The frontend application is loaded
  2. The frontend can communicate with the backend
  3. The audio files can be downloaded & played
  4. Basic, experiment-agnostic logic (sessions, next round, final round, storing data, etc.) works as expected
  5. The experiment is configured correctly, with the correct audio files
  6. The experiment's rules are configured correctly and the frontend can work with them

I was thinking... Perhaps we can test some experiments (like the tests already written) full monty style, from start to finish, no compromises. That way we can be sure that 1, 2, 3, and 4 work as expected. We can then test 5 and 6 in a different way, using the same E2E suite, but without having to play back full experiments with audio files of more than 10 seconds.

> 5. The experiment is configured correctly, with the correct audio files

We can test this using a custom validation method, similar to or exactly like the ones from #978 and/or #995. Maybe we can build on these methods and check that they don't return an error message or error status code. We can then be sure that the configured audio files at least exist in the uploads folder (I haven't thought of a solution for the experiments that use external files, though). We can also be sure that the CSV / configured sections, with their names, groups, tags, and so on, pass the validation method of the experiment's rules class. As these validation methods run in an instant, we don't have to wait for minutes until the experiments have been played from start to finish.
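
A rough sketch of what such a backend check could look like, assuming a hypothetical `validate_playlist` hook on the rules class and illustrative experiment slugs (the actual model and method names from #978 / #995 may differ):

```python
# Hypothetical sketch: validate each experiment's playlist configuration
# instead of playing the whole experiment back. The import path, the slugs
# and the validate_playlist() hook are assumptions, not the exact API
# introduced in #978 / #995.
import pytest

from experiment.models import Experiment  # assumed model location


@pytest.mark.django_db
@pytest.mark.parametrize("slug", ["eurovision", "categorization"])  # illustrative slugs
def test_playlists_pass_rules_validation(slug):
    experiment = Experiment.objects.get(slug=slug)
    rules = experiment.get_rules()  # assumed helper returning the rules instance
    for playlist in experiment.playlists.all():
        errors = rules.validate_playlist(playlist)  # assumed validation hook
        # An empty result means the sections, names, groups and tags check out,
        # without waiting for any audio playback.
        assert not errors, f"Playlist validation failed for '{slug}': {errors}"
```

Because a check like this only touches the database and the rules class, it would run in well under a second per experiment.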

> 6. The experiment's rules are configured correctly and the frontend can work with them

Additionally, I thought of maybe adding a special feature / test flag that lets us run an experiment in the normal way, except that it serves an audio file of roughly 0.1 seconds instead of the configured audio files. This allows the E2E test to quickly click through the experiment, saving us loads of time and compute. The experiment's rules and behavior are then still tested fully from start to end.
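
As a rough illustration of such a flag, the backend could swap in a bundled short silence clip wherever it resolves a section's audio URL. The setting name, the file path and the `absolute_url()` method below are assumptions, not existing MUSCLE code:

```python
# Hypothetical sketch of a test-only audio substitution. E2E_SHORT_AUDIO,
# the silence file path and section.absolute_url() are illustrative names.
from django.conf import settings

SHORT_SILENCE_URL = "/static/test/silence_100ms.wav"  # bundled ~0.1 s clip (assumed path)


def resolve_audio_url(section) -> str:
    """Return the section's real audio URL, or a 0.1 s stand-in when the E2E flag is set."""
    if getattr(settings, "E2E_SHORT_AUDIO", False):
        return SHORT_SILENCE_URL
    return section.absolute_url()  # assumed existing method on the Section model
```

The rules logic and the frontend playback path stay exactly the same; only the playback duration changes, so a full run-through shrinks from minutes to seconds.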

**Describe alternatives you've considered**

BeritJanssen commented 3 months ago

I like the solution you propose with shorter audio files, but I wonder how much development time it would take to prepare that. Plus, we'd still not be testing the actual situation on the server. I was wondering whether we could have some "stub" tests which only go through a couple of trials, and randomly / systematically run one or two experiments from start to finish, so that over a couple of runs we'd still E2E-test all experiments eventually. That way, we'd probably still detect problems faster than relying on manual testing from the team or user reports, although we might not always be able to intercept problems before users run into them. (The stub tests would guarantee that users can at least get some way into the experiments, though.)
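
One way to make the "randomly / systematically" part reproducible would be to rotate the full-run experiment(s) deterministically per CI run, so that every experiment still gets a full pass every few runs. A sketch with illustrative experiment slugs (`GITHUB_RUN_NUMBER` is the standard GitHub Actions variable; everything else is an assumption):

```python
# Hypothetical sketch: deterministically rotate which experiment(s) get a
# full start-to-finish run, while the rest only get stub runs.
import os
from datetime import date

ALL_EXPERIMENTS = ["eurovision", "categorization", "hooked", "rhythm_discrimination"]


def full_run_experiments(per_run: int = 1) -> list[str]:
    """Pick the experiment(s) that get a full E2E run on this CI run."""
    # Prefer the CI run number; fall back to the date for local runs.
    run_index = int(os.environ.get("GITHUB_RUN_NUMBER", date.today().toordinal()))
    start = (run_index * per_run) % len(ALL_EXPERIMENTS)
    return [ALL_EXPERIMENTS[(start + i) % len(ALL_EXPERIMENTS)] for i in range(per_run)]


def stub_run_experiments(per_run: int = 1) -> list[str]:
    """Everything else only runs the first couple of trials."""
    full = set(full_run_experiments(per_run))
    return [slug for slug in ALL_EXPERIMENTS if slug not in full]
```

With four experiments and one full run per CI run, every experiment would get a full pass every four runs, while each run still touches every experiment through its stub test.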

I agree that point 5. can be much more efficiently dealt with through playlist validation on the backend. Most problems when setting up an experiment for the first time arise from typos in the description of sections.

drikusroor commented 3 months ago

> Plus, we'd still not be testing the actual situation on the server. I was wondering whether we could have some "stub" tests which only go through a couple of trials, and randomly / systematically run one or two experiments from start to finish

Yeah, the last part is also what I meant by full monty style. :-) Test a couple of experiments without compromises or shortcuts, and test the other experiments with shortcuts, be it shorter audio fragments or, like you proposed, just a part of the experiment.