Speed-up user tests

IgnacioHeredia commented 1 month ago

User tests are taking ages to run (in this example, almost 5 min). We should try to speed them up (parallelizing, more resources, ...?).

@vykozlov says that, in old tox versions, one could reuse the same environment for all the different tests (bandit, flake, etc.). But that is no longer possible in new tox versions, where the environments have to be rebuilt each time.

alvarolopez commented 1 month ago

The problem is not tox but the SQA tooling, which takes ages to execute. Since it is based on docker-compose and its stages are dynamically generated, we cannot speed things up by running tests in parallel.

I see some options here:

Execute the user tests separately from the platform tests (i.e. do not wait for tests to complete).
Execute them in parallel. We will gain some time, but this will still block the rest of the tests that do deliveries (e.g. Docker, OSCAR or Zenodo).
Drop the SQA tooling and simply run the tests via the Jenkinsfile.
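
As a rough illustration of the "run in parallel" option, here is a minimal Python sketch that starts several check commands at once and collects their exit codes. It assumes the checks are launched directly (for example from a plain Jenkinsfile step) rather than through the SQA tooling, which according to the comment above cannot parallelize its stages. The function names `run_check` and `run_checks_in_parallel` are hypothetical and not part of any existing tool.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_check(name, command):
    """Run one check as a subprocess and return (name, exit code)."""
    completed = subprocess.run(command, capture_output=True, text=True)
    return name, completed.returncode


def run_checks_in_parallel(checks):
    """Run every command in `checks` concurrently.

    `checks` maps a label to an argument list; the result maps each
    label to the exit code of its command.
    """
    # One worker thread per check; each thread only waits on a subprocess.
    with ThreadPoolExecutor(max_workers=max(1, len(checks))) as pool:
        futures = [
            pool.submit(run_check, name, command)
            for name, command in checks.items()
        ]
        return dict(future.result() for future in futures)


# Stand-in commands so the sketch runs anywhere; in CI these would be the
# real check commands (an assumption, e.g. ["tox", "-e", "qc.sty"]).
demo_checks = {
    "passing": [sys.executable, "-c", "pass"],
    "failing": [sys.executable, "-c", "import sys; sys.exit(3)"],
}
```

Calling `run_checks_in_parallel(demo_checks)` returns one exit code per label. The total wall time is then roughly that of the slowest check, but each check still installs its own dependencies, so this alone does not remove the repeated installation cost.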

vykozlov commented 1 month ago

To my understanding, it is mainly the tox tool:

In tox 3 it was possible to re-use the same Python virtual environment across envs by specifying a shared envdir. Reply from the tox people: "This was never actually supported, worked only by chance because our env detection logic wasn't good enough. We don't plan to support this behavior." https://github.com/tox-dev/tox/issues/2788#issuecomment-1367437092
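
To make the shared-envdir setup concrete, the following Python sketch generates a tox.ini in which every test environment points at the same `envdir`. The env names come from this thread; the commands, the `shared` directory name and the helper `build_shared_envdir_config` are assumptions for illustration only. As the quoted reply says, this only ever worked by chance in tox 3, and tox 4 recreates the environment anyway.

```python
import configparser
import io


def build_shared_envdir_config(env_commands, shared_dir="{toxworkdir}/shared"):
    """Return tox.ini text where all test envs use the same envdir.

    `env_commands` maps a tox env name to the command it should run.
    """
    # interpolation=None keeps tox substitutions such as {toxworkdir} verbatim.
    config = configparser.ConfigParser(interpolation=None)
    config["tox"] = {"envlist": ", ".join(env_commands)}
    for env_name, command in env_commands.items():
        config[f"testenv:{env_name}"] = {
            "envdir": shared_dir,            # same virtualenv for every env
            "deps": "-rrequirements.txt",    # assumed to list the tools too
            "commands": command,
        }
    buffer = io.StringIO()
    config.write(buffer)
    return buffer.getvalue()


# Placeholder commands (assumptions), one per env named in this thread.
example_ini = build_shared_envdir_config({
    "qc.sty": "flake8",
    "qc.cov": "pytest --cov",
    "qc.sec": "bandit -r .",
})
```

Printing `example_ini` shows three `[testenv:...]` sections that differ only in their `commands` line, which is the configuration that let tox 3 install the dependencies once.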

docker-compose is pretty much the same speed as a plain Docker agent, since we use pre-built Docker images. But for every test ("qc.sty", "qc.cov", "qc.sec") tox rebuilds the virtual environment, i.e. it re-installs all Python dependencies, e.g. from "requirements.txt".

And no, skipping the user tests does not make sense, imo. What is the point of building a Docker image and delivering it if the tests later fail, i.e. the image is faulty?

vykozlov commented 1 month ago

By the way, don't forget that in order to run user tests, one has to install the user application's dependencies. The accepted practice from DEEPHDC is to start from the corresponding Python Docker image (previously the Jenkins Python node). This means that one also has to install e.g. tensorflow or pytorch, deepaas and all relevant packages. This certainly takes time, but the installation procedure is part of testing. Therefore, taking a couple of minutes to run just qc.sty is to be expected, not "ages to run". The problem is again tox 4: even if one configures the same "envdir" for different tests, the virtual env is still recreated, e.g. tox writes: qc.cov: recreate env because env type changed from {'name': 'qc.sty', 'type': 'VirtualEnvRunner'} to {'name': 'qc.cov', 'type': 'VirtualEnvRunner'}

vykozlov commented 1 month ago

The current work-around is to pin tox<4, which we have to do in the CI/CD images (tox itself only allows setting minversion). Therefore, it is better to control the CI/CD images, e.g. fork https://github.com/indigo-dc/ci-images for the Python images and pin the tox version there to <4. See your updated example: user tests are down from 5 min to 3 min. Most of the time is spent installing dependencies, but now only once.
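
Since tox.ini can only express a minimum version, the upper bound has to be enforced where tox is installed, for instance with `pip install "tox<4"` in the image. A small guard like the Python sketch below could additionally check this at the start of a CI job. The function names are hypothetical, and the parsing assumes a plain dotted version string such as `3.28.0`.

```python
from importlib import metadata


def major_version(version_string):
    """Return the leading integer of a dotted version string."""
    return int(version_string.split(".")[0])


def tox_is_below_4(version_string):
    """True if the given tox version is older than 4.0."""
    return major_version(version_string) < 4


def installed_tox_version():
    """Return the installed tox version, or None if tox is not installed."""
    try:
        return metadata.version("tox")
    except metadata.PackageNotFoundError:
        return None
```

A CI step could call `installed_tox_version()` and fail early when the result is `None` or when `tox_is_below_4` returns `False`, instead of silently paying for rebuilt environments.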

ai4os / ai4os-hub-qa

Speed-up user tests #10