canonical / operator

Pure Python framework for writing Juju charms
Apache License 2.0
247 stars 119 forks source link

Optimise Scenario #1434

Open PietroPasotti opened 1 month ago

PietroPasotti commented 1 month ago

The unit testing suite for traefik is probably the largest scenario test battery around. Today I ran it and I realized it took over a minute and a half to complete, so I decided to profile it.

This is the result: image

to produce the profile, run with this tox env:

[testenv:profile-scenario-tests]
description = Profile scenario unit test battery
deps =
    pytest
    pytest-profiling
    ops[testing]>=2.17
    -r{toxinidir}/requirements.txt
commands =
    pytest -v --tb native {[vars]tst_path}/scenario --profile --profile-svg --log-cli-level=INFO -s {posargs}

There are some obvious candidates for optimization:

profiling scenario's own test suite yields a very similar picture: image

tonyandrewmeyer commented 1 month ago
  • using an in-memory mock for the sqlite db instead of using the real thing could shave off a good chunk of time spent in pointless IO

This indeed seems like an obvious win. Presumably we can just pass in :memory: instead of the unit state filename - trivial to do, and we're not testing sqlite so no downsides.

  • A ridiculous amount of time is spent in State.__new__. Can we do something about that?
  • A single list comprehension in state.py:143 takes 2 seconds of our time: can we lazify some of the code perhaps?

Both of these are the positional/keyword argument processing. I wonder if we can find a tidy way to have newer Python use the built-in support and only fall back to this on old Python.

  • how come mocking juju-log takes so long?

I wonder if this is also the same issue: lots of log lines and each one involves creating a JujuLogLine object.

benhoyt commented 1 month ago

Presumably we can just pass in :memory: instead of the unit state filename - trivial to do, and we're not testing sqlite so no downsides.

Yeah, this is probably the single biggest win. I made this change locally when running tox -e scenario on the traefik-k8s-operator tests. My times are different, but I get about 35s without the change, and 20s with the change, so a nice improvement! FWIW, it looks like Harness already uses :memory:.

Happy for us to spend a few hours of rainy-day time looking at the other bottlenecks too. (Let's just be careful not to get too carried away on the long tail...)

james-garner-canonical commented 3 weeks ago

Alex Batisse mentioned that Scenario tests take much longer and the culprit is YAML? Maybe they just need the compiled extension (cyaml?)

Batalex commented 3 weeks ago

Alex Batisse mentioned that Scenario tests take much longer and the culprit is YAML? Maybe they just need the compiled extension (cyaml?)

I checked, and I do have libyaml bundled with/used by pyyaml. When I profiled my tests, I noticed that the tests were spending a long time serializing yaml in scenario/runtime:Runtime._virtual_charm_root.

Mind you, this is but a quick and dirty experiment, but here is what I tried:

This yielded a ~15% speed increase for my test file

tonyandrewmeyer commented 1 week ago

Some initial times (best of 3), using traefik-k8s (branch scenario-7-migraine :joy:) -e scenario:

tonyandrewmeyer commented 1 week ago

I checked, and I do have libyaml bundled with/used by pyyaml. When I profiled my tests, I noticed that the tests were spending a long time serializing yaml in scenario/runtime:Runtime._virtual_charm_root.

@Batalex what repo (and branch, if appropriate) are you testing against? I'll use it as a second benchmark so that we're checking against multiple types of tests.

Batalex commented 1 week ago

@Batalex what repo (and branch, if appropriate) are you testing against? I'll use it as a second benchmark so that we're checking against multiple types of tests.

I think I was using the latest release (7.0.5) because I directly edited the files in the virtual env between two sessions during the sprint. I don't have this venv around any more, but the project itself has ops-scenario 7.0.5 in its dependencies lock file.

tonyandrewmeyer commented 1 week ago

I think I was using the latest release (7.0.5)

Hey @Batalex, sorry for being unclear. I meant what test suite are you running (one of the data platform repos I assume), so I can run against that one too with changes to make sure that they do improve things.

Batalex commented 1 week ago

Ah, sorry, I was quite slow-witted yesterday. I was testing on https://github.com/canonical/kafka-k8s-operator (branch feat/dpe-5591-status, which should be merged into main by today). IIRC, I was specifically running tox -e unit -- -s tests/unit/test_charm.py