canonical / mir-ci

Mir CI helpers

Benchmarking for the DisplayServer and its spawned application #40

Closed mattkae closed 10 months ago

mattkae commented 10 months ago

Benchmarking

What's new?

How to test

  1. git checkout feature/benchmarks
  2. cd mir-ci/mir-ci
  3. pip install -e ..
  4. pytest --junitxml=junit.xml test_apps_can_run.py
  5. When that finishes, open up junit.xml and view the test results

What do the test results look like in junit.xml?

Like this:

...
    <testcase classname="test_apps_can_run.TestAppsCanRun" name="test_app_can_run[mir_demo_server-qterminal]" file="test_apps_can_run.py" line="12" time="3.153">
      <properties>
        <property name="compositor_cpu_time_microseconds" value="114179"></property>
        <property name="compositor_max_mem_bytes" value="58060800"></property>
        <property name="compositor_avg_mem_bytes" value="56726775.172413796"></property>
        <property name="client_cpu_time_microseconds" value="584842"></property>
        <property name="client_max_mem_bytes" value="37314560"></property>
        <property name="client_avg_mem_bytes" value="32306846.896551725"></property>
      </properties>
    </testcase>

...
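The properties in that excerpt are plain name/value attributes on each testcase, so a consumer can read them back with the standard library. A minimal sketch (not part of this PR; property names taken from the example above):

```python
import xml.etree.ElementTree as ET


def read_benchmark_properties(junit_xml: str) -> dict:
    """Collect <property> name/value pairs from every <testcase> in a JUnit report."""
    root = ET.fromstring(junit_xml)
    results = {}
    for testcase in root.iter("testcase"):
        props = {
            prop.get("name"): float(prop.get("value"))
            for prop in testcase.iter("property")
        }
        if props:
            results[testcase.get("name")] = props
    return results
```

This makes it easy to, say, diff benchmark numbers between two CI runs without depending on how pytest laid out the rest of the report.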
codecov[bot] commented 10 months ago

Codecov Report

Merging #40 (590a9a2) into main (ac0afd0) will increase coverage by 15.63%. The diff coverage is 94.97%.

@@             Coverage Diff             @@
##             main      #40       +/-   ##
===========================================
+ Coverage   46.63%   62.26%   +15.63%     
===========================================
  Files           8       10        +2     
  Lines         491      652      +161     
  Branches       56       76       +20     
===========================================
+ Hits          229      406      +177     
+ Misses        249      226       -23     
- Partials       13       20        +7     
| Files | Coverage Δ |
|---|---|
| mir-ci/mir_ci/apps.py | 92.30% <100.00%> (+4.80%) :arrow_up: |
| mir-ci/mir_ci/conftest.py | 42.24% <85.71%> (+1.01%) :arrow_up: |
| mir-ci/mir_ci/display_server.py | 76.47% <92.30%> (+43.13%) :arrow_up: |
| mir-ci/mir_ci/benchmarker.py | 97.05% <97.05%> (ø) |
| mir-ci/mir_ci/cgroups.py | 96.66% <96.66%> (ø) |
| mir-ci/mir_ci/program.py | 89.58% <86.95%> (+3.68%) :arrow_up: |


Saviq commented 10 months ago
> 1. Give your user permissions to modify `/sys/fs/cgroup` (or be root)

I think we should get the path from the test runner / environment, in which we'd create a subdirectory and start from there.
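That suggestion might look something like the following sketch. `MIR_CI_CGROUP_ROOT` and the helper name are hypothetical, not part of this PR:

```python
import os
from pathlib import Path


def make_test_cgroup(test_name: str) -> Path:
    """Create a per-test cgroup directory under a root handed to us by the runner.

    MIR_CI_CGROUP_ROOT is an assumed environment variable: the test runner would
    point it at a delegated cgroup directory the test user can already write to,
    and we only ever create subdirectories beneath it.
    """
    root = Path(os.environ.get("MIR_CI_CGROUP_ROOT", "/sys/fs/cgroup"))
    cgroup = root / test_name
    cgroup.mkdir(parents=True, exist_ok=True)
    return cgroup
```

The benefit is that no test code ever needs elevated permissions itself; the runner (or a systemd-style delegation) decides where writes are allowed.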

> What do the test results look like in junit.xml?

I think we can skip the PID there. Is there another number than CPU percent that would be more meaningful across devices? I know you can only ever compare apples to apples, but…

mattkae commented 10 months ago
> > 1. Give your user permissions to modify `/sys/fs/cgroup` (or be root)
>
> I think we should get the path from the test runner / environment, in which we'd create a subdirectory and start from there.
>
> > What do the test results look like in junit.xml?
>
> I think we can skip the PID there. Is there another number than CPU percent that would be more meaningful across devices? I know you can only ever compare apples to apples, but…

  1. Yeah we could create a path as root and then give the user permissions to read/write to that path!
  2. We can skip PID, I agree
  3. We could show cycles perhaps? Or maybe avg cycles per second?
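For the CPU number, one device-agnostic option is to report total CPU time and let consumers derive a utilisation percentage from the wall-clock test time; both figures are already in the junit.xml example above. A sketch (not part of this PR):

```python
def avg_cpu_percent(cpu_time_us: float, wall_time_s: float) -> float:
    """Average CPU utilisation over the run: CPU time consumed / wall time.

    100.0 means one core fully busy for the whole test; values above 100
    mean work was spread across multiple cores.
    """
    return cpu_time_us / (wall_time_s * 1_000_000) * 100.0
```

With the compositor figures from the example (114179 µs of CPU over a 3.153 s test) this comes out to roughly 3.6%.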
mattkae commented 10 months ago

I would like to see if we can do well enough without polling. Do cgroup stats not give us enough numbers to calculate averages?

If we're going with cgroups after all, the added complexity of on_started and polling feels wasteful, WDYT? May be needed later for GPU testing, though, so maybe we shouldn't be throwing it away.

There's enough print()s all around that maybe it's time to get a logger going? Alternatively, there's the warnings module.
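Swapping the `print()`s for a logger is a small change; a sketch of the usual pattern (logger name and helper are illustrative, not from this PR):

```python
import logging

# Module-level logger; the "mir_ci" name is illustrative.
logger = logging.getLogger("mir_ci")


def configure_logging(verbose: bool = False) -> None:
    """One-time setup so benchmark chatter goes through logging, not print()."""
    logging.basicConfig(
        level=logging.DEBUG if verbose else logging.INFO,
        format="%(asctime)s %(name)s %(levelname)s: %(message)s",
    )
```

Call sites then use `logger.info(...)` / `logger.debug(...)`, and noisy diagnostics can be silenced or redirected per-module without touching the tests.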

mattkae commented 10 months ago

@Saviq Do you have any opinion on what CPU benchmark we should report? Right now, I'm showing "average CPU percentage", but that might not be ideal. I have access to the total CPU usage in microseconds

mattkae commented 10 months ago
> The tricky part is getting average memory usage. AFAIK, cgroups just reports stats on the current memory usage (and the max). Without polling, we wouldn't have any idea about the average

There's a lot of numbers in e.g. memory.stat, none of them is cumulative? If there is something, we could divide over the test duration?
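To my knowledge the cgroup v2 memory counters (`memory.current`, and `memory.peak` on newer kernels) are instantaneous rather than cumulative byte-seconds, so if nothing cumulative turns up, polling stays necessary. The poller doesn't have to keep every sample, though; a running mean is enough. A sketch (not from this PR):

```python
class RunningAverage:
    """Incremental mean, so a memory poller needn't store every sample."""

    def __init__(self) -> None:
        self.count = 0
        self.mean = 0.0

    def add(self, sample: float) -> None:
        # Welford-style incremental update of the mean.
        self.count += 1
        self.mean += (sample - self.mean) / self.count
```

Each poll tick would read `memory.current` and feed it to `add()`; at test end, `mean` is the reported average.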

> @Saviq Do you have any opinion on what CPU benchmark we should report? Right now, I'm showing "average CPU percentage", but that might not be ideal. I have access to the total CPU usage in microseconds

Yeah that would be more resistant to hardware changes I think.
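Total CPU time comes straight out of cgroup v2's `cpu.stat`, which does carry a cumulative `usage_usec` counter; a small parsing sketch (helper name is illustrative, not from this PR):

```python
def parse_cpu_usage_usec(cpu_stat_text: str) -> int:
    """Extract the cumulative usage_usec counter from cgroup v2 cpu.stat text."""
    for line in cpu_stat_text.splitlines():
        key, _, value = line.partition(" ")
        if key == "usage_usec":
            return int(value)
    raise KeyError("usage_usec not found in cpu.stat")
```

Reading the counter once at test start and once at test end, then reporting the difference, gives the total-microseconds figure without any polling.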