Refactor measurements evaluations

olichtne commented 11 months ago

Description

This implements some much-needed refactoring of MeasurementResults and the BaselineEvaluator, which reduces and unifies the relevant code.

Tests

Tested locally with custom code adjustments that use the current result as the baseline result ("current result -> baseline result") and set "threshold = 1", using the following recipe:

#!/usr/bin/env python3

from lnst.Recipes.ENRT.SimpleNetworkRecipe import SimpleNetworkRecipe
from lnst.Controller.ContainerPoolManager import ContainerPoolManager
from lnst.Controller.MachineMapper import ContainerMapper
from lnst.Controller import Controller
from lnst.Controller.RunSummaryFormatters import HumanReadableRunSummaryFormatter
from lnst.Controller.RecipeResults import ResultLevel
import logging

from lnst.RecipeCommon.Perf.Evaluators.BaselineCPUAverageEvaluator import BaselineCPUAverageEvaluator
from lnst.RecipeCommon.Perf.Evaluators.BaselineEvaluator import BaselineEvaluator

class SimpleNetworkRecipe(SimpleNetworkRecipe):
    @property
    def net_perf_evaluators(self):
        parent = list(super().net_perf_evaluators)
        parent.append(
            BaselineEvaluator()
        )
        return parent

    @property
    def cpu_perf_evaluators(self):
        parent = list(super().cpu_perf_evaluators)
        parent.append(
            BaselineCPUAverageEvaluator(
                evaluation_filter={
                    "host1": ["cpu"],
                    "host2": ["cpu"],
                }
            )
        )
        return parent

recipe = SimpleNetworkRecipe(
    perf_tests=['tcp_stream'],
    perf_duration=5,
    perf_iterations=2,
    perf_msg_sizes=[1400],
    ip_versions=['ipv4'],
    dev_intr_cpu=[0],
    perf_tool_cpu=[0],
    ping_count=1,
    offload_combinations=[
        {'gro': 'on'},
    ],
    do_linuxperf_measurement=False
)

ctl = Controller(
    debug=True,
    poolMgr=ContainerPoolManager,
    mapper=ContainerMapper,
    podman_uri="unix:///run/podman/podman.sock",
    image="lnst"
)
try:
    ctl.run(recipe)
except Exception as e:
    # Report the failure but keep going so the run summaries below are still logged.
    print(e)

summary_fmt = HumanReadableRunSummaryFormatter(level=ResultLevel.IMPORTANT)
for run in recipe.runs:
    logging.info(summary_fmt.format_run(run))

After running this, the evaluation result descriptions look like this:

    PASS 53_TestResult:
        Baseline evaluation of
        host host1 cpu 'cpu' utilization: 130.86 +-16.58 time units per second
        New utilization average is 0.00% higher from the baseline. Allowed difference: 1.0%
    PASS 54_TestResult:
        Baseline evaluation of
        host host2 cpu 'cpu' utilization: 130.93 +-16.77 time units per second
        New utilization average is 0.00% higher from the baseline. Allowed difference: 1.0%
    PASS 55_TestResult:
        Nonzero evaluation of flow:
        Flow(
            type=tcp_stream,
            generator=Host(machine_id=host1),
            generator_bind=192.168.101.1,
            generator_nic=Device(machine=host1, id=eth0, name=eth1, ifindex=3),
            generator_port=12000,
            receiver=Host(machine_id=host2),
            receiver_bind=192.168.101.2,
            receiver_nic=Device(machine=host2, id=eth0, name=eth1, ifindex=3),
            receiver_port=12000,
            msg_size=1400,
            duration=5,
            parallel_streams=1,
            generator_cpupin=[0],
            receiver_cpupin=[0],
            aggregated_flow=False
            warmup_duration=0,
        )
        PASS: generator_results reported non-zero throughput
        PASS: receiver_results reported non-zero throughput
    PASS 56_TestResult:
        Baseline evaluation of
        Flow(
            type=tcp_stream,
            generator=Host(machine_id=host1),
            generator_bind=192.168.101.1,
            generator_nic=Device(machine=host1, id=eth0, name=eth1, ifindex=3),
            generator_port=12000,
            receiver=Host(machine_id=host2),
            receiver_bind=192.168.101.2,
            receiver_nic=Device(machine=host2, id=eth0, name=eth1, ifindex=3),
            receiver_port=12000,
            msg_size=1400,
            duration=5,
            parallel_streams=1,
            generator_cpupin=[0],
            receiver_cpupin=[0],
            aggregated_flow=False
            warmup_duration=0,
        )
        Generator measured throughput: 1221659418.30 +-59245322.92(4.85%) bits per second.
        Generator process CPU data: 44.93 +-0.03 cpu_percent per second.
        Receiver measured throughput: 1213241304.82 +-58536978.08(4.82%) bits per second.
        Receiver process CPU data: 24.82 +-0.23 cpu_percent per second.
        New generator_results average is 0.00% higher from the baseline. Allowed difference: 1.0%
        New generator_cpu_stats average is 0.00% higher from the baseline. Allowed difference: 1.0%
        New receiver_results average is 0.00% higher from the baseline. Allowed difference: 1.0%
        New receiver_cpu_stats average is 0.00% higher from the baseline. Allowed difference: 1.0%
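
For readers unfamiliar with the comparison lines above, here is a minimal sketch of how a percentage difference from a baseline could be checked against an allowed threshold. This is not the LNST implementation: `compare_to_baseline` and its return format are hypothetical, the message only mimics the output above, and a non-zero baseline average is assumed.

```python
def compare_to_baseline(name, current_avg, baseline_avg, threshold_pct):
    """Hypothetical helper: return (passed, description) for one metric.

    Assumes baseline_avg is non-zero.
    """
    # Relative difference of the new average from the baseline, in percent.
    difference = (current_avg - baseline_avg) / baseline_avg * 100
    direction = "higher" if difference >= 0 else "lower"
    # The metric passes when the difference stays within the allowed threshold.
    passed = abs(difference) <= threshold_pct
    description = (
        f"New {name} average is {abs(difference):.2f}% {direction} "
        f"from the baseline. Allowed difference: {threshold_pct}%"
    )
    return passed, description


# Using the current result as its own baseline, as in the test setup above,
# always yields a 0.00% difference.
passed, text = compare_to_baseline(
    "generator_results", 1221659418.30, 1221659418.30, 1.0
)
print(passed, text)
```

With the current result used as its own baseline, every metric trivially passes, which is why all comparisons above report 0.00%.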

Additionally, if you remove the "cpu evaluation filters", this is what we get:

    PASS 53_TestResult:
        Baseline evaluation of
        host host1 cpu 'cpu' utilization: 193.62 +-8.60 time units per second
        New utilization average is 0.00% higher from the baseline. Allowed difference: 1.0%
        host host1 cpu 'cpu0' utilization: 90.81 +-4.04 time units per second
        New utilization average is 0.00% higher from the baseline. Allowed difference: 1.0%
        host host1 cpu 'cpu1' utilization: 12.30 +-0.35 time units per second
        New utilization average is 0.00% higher from the baseline. Allowed difference: 1.0%
        host host1 cpu 'cpu2' utilization: 13.36 +-1.68 time units per second
        New utilization average is 0.00% higher from the baseline. Allowed difference: 1.0%
        host host1 cpu 'cpu3' utilization: 17.52 +-1.86 time units per second
        New utilization average is 0.00% higher from the baseline. Allowed difference: 1.0%
        host host1 cpu 'cpu4' utilization: 14.69 +-1.02 time units per second
        New utilization average is 0.00% higher from the baseline. Allowed difference: 1.0%
        host host1 cpu 'cpu5' utilization: 13.88 +-1.66 time units per second
        New utilization average is 0.00% higher from the baseline. Allowed difference: 1.0%
        host host1 cpu 'cpu6' utilization: 13.26 +-0.42 time units per second
        New utilization average is 0.00% higher from the baseline. Allowed difference: 1.0%
        host host1 cpu 'cpu7' utilization: 17.86 +-2.49 time units per second
        New utilization average is 0.00% higher from the baseline. Allowed difference: 1.0%
    PASS 54_TestResult:
        Baseline evaluation of
        host host2 cpu 'cpu' utilization: 194.44 +-8.06 time units per second
        New utilization average is 0.00% higher from the baseline. Allowed difference: 1.0%
        host host2 cpu 'cpu0' utilization: 90.59 +-3.71 time units per second
        New utilization average is 0.00% higher from the baseline. Allowed difference: 1.0%
        host host2 cpu 'cpu1' utilization: 12.67 +-0.27 time units per second
        New utilization average is 0.00% higher from the baseline. Allowed difference: 1.0%
        host host2 cpu 'cpu2' utilization: 13.48 +-1.68 time units per second
        New utilization average is 0.00% higher from the baseline. Allowed difference: 1.0%
        host host2 cpu 'cpu3' utilization: 17.81 +-1.82 time units per second
        New utilization average is 0.00% higher from the baseline. Allowed difference: 1.0%
        host host2 cpu 'cpu4' utilization: 14.77 +-0.92 time units per second
        New utilization average is 0.00% higher from the baseline. Allowed difference: 1.0%
        host host2 cpu 'cpu5' utilization: 14.06 +-1.61 time units per second
        New utilization average is 0.00% higher from the baseline. Allowed difference: 1.0%
        host host2 cpu 'cpu6' utilization: 13.28 +-0.34 time units per second
        New utilization average is 0.00% higher from the baseline. Allowed difference: 1.0%
        host host2 cpu 'cpu7' utilization: 17.85 +-2.42 time units per second
        New utilization average is 0.00% higher from the baseline. Allowed difference: 1.0%

So the output is an interleaving of "result.description" and "comparison.description" lines.
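
A minimal sketch of that interleaving, assuming each result and each comparison exposes a `description` string. The `Described` class and `interleave_descriptions` function are hypothetical stand-ins, not names from the LNST code base.

```python
from dataclasses import dataclass


@dataclass
class Described:
    """Hypothetical stand-in for any object with a description attribute."""
    description: str


def interleave_descriptions(results, comparisons):
    """Pair each result with its comparison and emit their descriptions in turn."""
    lines = []
    for result, comparison in zip(results, comparisons):
        lines.append(result.description)
        lines.append(comparison.description)
    return "\n".join(lines)


results = [
    Described("host host1 cpu 'cpu0' utilization: 90.81 +-4.04 time units per second"),
    Described("host host1 cpu 'cpu1' utilization: 12.30 +-0.35 time units per second"),
]
comparisons = [
    Described("New utilization average is 0.00% higher from the baseline. Allowed difference: 1.0%"),
    Described("New utilization average is 0.00% higher from the baseline. Allowed difference: 1.0%"),
]
print(interleave_descriptions(results, comparisons))
```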

jtluka commented 11 months ago

I have just one question about the output.

I see that there is:

        host host1 cpu 'cpu' utilization: 130.86 +-16.58 time units per second
        New utilization average is 0.00% higher from the baseline. Allowed difference: 1.0%

and

        Generator measured throughput: 1221659418.30 +-59245322.92(4.85%) bits per second.
        Generator process CPU data: 44.93 +-0.03 cpu_percent per second.
        Receiver measured throughput: 1213241304.82 +-58536978.08(4.82%) bits per second.
        Receiver process CPU data: 24.82 +-0.23 cpu_percent per second.
        New generator_results average is 0.00% higher from the baseline. Allowed difference: 1.0%
        New generator_cpu_stats average is 0.00% higher from the baseline. Allowed difference: 1.0%
        New receiver_results average is 0.00% higher from the baseline. Allowed difference: 1.0%
        New receiver_cpu_stats average is 0.00% higher from the baseline. Allowed difference: 1.0%

I'm wondering if we could change the output so that the individual metric names are the same in both the metric measurement description and the metric evaluation description, e.g.:

        Generator measured throughput (generator_results): 1221659418.30 +-59245322.92(4.85%) bits per second.
        Generator process CPU data (generator_cpu_stats): 44.93 +-0.03 cpu_percent per second.
        Receiver measured throughput (receiver_results): 1213241304.82 +-58536978.08(4.82%) bits per second.
        Receiver process CPU data (receiver_cpu_stats): 24.82 +-0.23 cpu_percent per second.
        New generator_results average is 0.00% higher from the baseline. Allowed difference: 1.0%
        New generator_cpu_stats average is 0.00% higher from the baseline. Allowed difference: 1.0%
        New receiver_results average is 0.00% higher from the baseline. Allowed difference: 1.0%
        New receiver_cpu_stats average is 0.00% higher from the baseline. Allowed difference: 1.0%

or ideally (I'm aware that this would require translating metric names to human-readable names):

        Generator measured throughput (generator_results): 1221659418.30 +-59245322.92(4.85%) bits per second.
        ...
        Generator measured throughput average is 0.00% higher from the baseline. Allowed difference: 1.0%

It is probably out of scope for this MR and could be done separately.
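
A rough sketch of what such a translation could look like. The `METRIC_LABELS` table and both helper functions are hypothetical and not part of LNST; the labels themselves are copied from the output quoted above.

```python
# Hypothetical mapping from internal metric names to human-readable labels.
METRIC_LABELS = {
    "generator_results": "Generator measured throughput",
    "generator_cpu_stats": "Generator process CPU data",
    "receiver_results": "Receiver measured throughput",
    "receiver_cpu_stats": "Receiver process CPU data",
}


def measurement_line(metric, value_text):
    """Measurement description that shows both the label and the metric name."""
    label = METRIC_LABELS.get(metric, metric)
    return f"{label} ({metric}): {value_text}"


def evaluation_line(metric, difference_pct, threshold_pct):
    """Evaluation description that reuses the same human-readable label."""
    # Fall back to the raw metric name when no label is known.
    label = METRIC_LABELS.get(metric, metric)
    direction = "higher" if difference_pct >= 0 else "lower"
    return (
        f"{label} average is {abs(difference_pct):.2f}% {direction} "
        f"from the baseline. Allowed difference: {threshold_pct}%"
    )


print(measurement_line(
    "generator_results",
    "1221659418.30 +-59245322.92(4.85%) bits per second.",
))
print(evaluation_line("generator_results", 0.0, 1.0))
```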

olichtne commented 11 months ago

    Generator measured throughput (generator_results): 1221659418.30 +-59245322.92(4.85%) bits per second.

This should be simple, so I'll do at least that in this PR.

olichtne commented 9 months ago

Added one more commit that makes the OvSDPDKPvPRecipe and VhostNetPvPRecipe use host ids consistent with the rest of the ENRT recipes.

Tested the OvSDPDKRecipe in J:8895512. We don't currently run the VhostNetPvPRecipe, so I didn't test that.

LNST-project / lnst

Refactor measurements evaluations #351
