Possible regression with asv 0.5.1 using parameterized benchmark when some but not all skipped

Firstly, thanks for the fantastic library.

I am getting strange results when running benchmarks for which some parameter combinations are skipped and some not. This was OK for asv 0.4.2, but not for 0.5.1. I have a simple reproducer.

Test code:

class Simple:
    params = ([False, True])
    param_names = ["ok"]

    def time_failure(self, ok):
        if ok:
            x = 34.2**4.2
        else:
            raise NotImplementedError

Using a venv in Ubuntu MATE 20.10:

>>> pip list
Package       Version
------------- -------
asv           0.5.1  
distlib       0.3.4  
filelock      3.4.2  
pip           20.0.2 
pkg-resources 0.0.0  
platformdirs  2.5.0  
setuptools    44.0.0 
six           1.16.0 
virtualenv    20.13.1
>>> asv run -b Simple
Creating environments
<snip>
[100.00%] ··· step_detect.Simple.time_failure                         1/2 failed
[100.00%] ··· ======= =========
                 ok             
              ------- ---------
               False    failed 
                True   197±2ns 
              ======= =========
>>> asv show 2e59
Commit: 2e59 <master>

step_detect.Simple.time_failure [onion/virtualenv-py3.8-six]
Traceback (most recent call last):
  File "/home/iant/.venv/asv/bin/asv", line 8, in <module>
    sys.exit(main())
  File "/home/iant/.venv/asv/lib/python3.8/site-packages/asv/main.py", line 38, in main
    result = args.func(args)
  File "/home/iant/.venv/asv/lib/python3.8/site-packages/asv/commands/__init__.py", line 49, in run_from_args
    return cls.run_from_conf_args(conf, args)
  File "/home/iant/.venv/asv/lib/python3.8/site-packages/asv/commands/show.py", line 49, in run_from_conf_args
    return cls.run(
  File "/home/iant/.venv/asv/lib/python3.8/site-packages/asv/commands/show.py", line 93, in run
    cls._print_results(conf, commit, result_iter,
  File "/home/iant/.venv/asv/lib/python3.8/site-packages/asv/commands/show.py", line 169, in _print_results
    cls._print_benchmark(machine, result, benchmarks[name],
  File "/home/iant/.venv/asv/lib/python3.8/site-packages/asv/commands/show.py", line 179, in _print_benchmark
    info, details = format_benchmark_result(result, benchmark)
  File "/home/iant/.venv/asv/lib/python3.8/site-packages/asv/results.py", line 937, in format_benchmark_result
    display_result = [(v, statistics.get_err(v, s) if s is not None else None)
  File "/home/iant/.venv/asv/lib/python3.8/site-packages/asv/results.py", line 937, in <listcomp>
    display_result = [(v, statistics.get_err(v, s) if s is not None else None)
  File "/home/iant/.venv/asv/lib/python3.8/site-packages/asv/statistics.py", line 80, in get_err
    a, b = stats['q_25'], stats['q_75']
KeyError: 'q_25'

If I skip all or none of the parameter combinations it is OK.

If I use ASV 0.4.2 instead then it works fine:

>>> pip uninstall asv
<snip>
>>> pip install asv==0.4.2
<snip>
>>> rm -rf .asv
>>> asv run -b Simple
<snip>
>>> asv show 2e59
Commit: 2e59b18a <master>

step_detect.Simple.time_failure [onion/virtualenv-py3.8-six]
  1/2 failed
  ======= ===========
     ok               
  ------- -----------
   False     failed  
    True   184±0.3ns 
  ======= ===========
  started: 2022-02-13 13:22:16, duration: 253ms

Thanks for reporting @ianthomas23. Did you have a look to see if the results files with both versions are different? Or did you try a 0.4 result file with asv show in 0.5? Would be good to know if the problem is in the results, or in the show command.

Here are the results using asv 0.4.2:

{"results": {"step_detect.Simple.time_failure": {"result": [null, 8.792673099932355e-08], "stats": [null, {"ci_99": [8.506566552892052e-08, 9.189114907855378e-08], "q_25": 8.53922904482327e-08, "q_75": 9.078514433935673e-08, "min": 8.506566552892052e-08, "max": 9.189114907855378e-08, "mean": 8.811884849472156e-08, "std": 2.778221072923389e-09, "repeat": 10, "number": 121129}], "params": [["False", "True"]]}}, "params": {"arch": "x86_64", "cpu": "11th Gen Intel(R) Core(TM) i7-11800H @ 2.30GHz", "machine": "pyrus", "num_cpu": "8", "os": "Linux 5.13.0-27-generic", "ram": "32617612", "python": "3.9", "six": ""}, "requirements": {"six": ""}, "commit_hash": "2e59b18a2a356f1033ffff3a80850188511545f5", "date": 1644573251000, "env_name": "virtualenv-py3.9-six", "python": "3.9", "profiles": {}, "started_at": {"step_detect.Simple.time_failure": 1644854467588}, "ended_at": {"step_detect.Simple.time_failure": 1644854467842}, "benchmark_version": {"step_detect.Simple.time_failure": "bb0ff3891d22f11207ec22b88ea4bad21a42caf6b876852363e57d0162bd47ee"}, "version": 1}

and using asv 0.5.1:

{"commit_hash": "2e59b18a2a356f1033ffff3a80850188511545f5", "env_name": "virtualenv-py3.9-six", "date": 1644573251000, "params": {"arch": "x86_64", "cpu": "11th Gen Intel(R) Core(TM) i7-11800H @ 2.30GHz", "machine": "pyrus", "num_cpu": "8", "os": "Linux 5.13.0-27-generic", "ram": "32617612", "python": "3.9", "six": ""}, "python": "3.9", "requirements": {"six": ""}, "env_vars": {}, "result_columns": ["result", "params", "version", "started_at", "duration", "stats_ci_99_a", "stats_ci_99_b", "stats_q_25", "stats_q_75", "stats_number", "stats_repeat", "samples", "profile"], "results": {"step_detect.Simple.time_failure": [[null, 9.111695118065545e-08], [["False", "True"]], "bb0ff3891d22f11207ec22b88ea4bad21a42caf6b876852363e57d0162bd47ee", 1644854394423, 0.51277, [null, 8.7812e-08], [null, 9.6041e-08], [null, 8.9699e-08], [null, 9.3508e-08], [null, 116570], [null, 10]]}, "durations": {}, "version": 2}

I can't open the old results with new asv, or vice versa:

2e59b18a-virtualenv-py3.9-six.json is stored in a format that is newer than what this version of asv understands.  Update asv to use this file.

2e59b18a-virtualenv-py3.9-six.json is stored in an old file format.  Run `asv update` to update it.

If I follow the instructions in the latter case and run asv update on the old results, they become

{"commit_hash": "2e59b18a2a356f1033ffff3a80850188511545f5", "date": 1644573251000, "env_name": "virtualenv-py3.9-six", "params": {"arch": "x86_64", "cpu": "11th Gen Intel(R) Core(TM) i7-11800H @ 2.30GHz", "machine": "pyrus", "num_cpu": "8", "os": "Linux 5.13.0-27-generic", "ram": "32617612", "python": "3.9", "six": ""}, "python": "3.9", "requirements": {"six": ""}, "env_vars": {}, "result_columns": ["result", "params", "version", "started_at", "duration", "stats_ci_99_a", "stats_ci_99_b", "stats_q_25", "stats_q_75", "stats_number", "stats_repeat", "samples", "profile"], "results": {"step_detect.Simple.time_failure": [[null, 8.792673099932355e-08], [["False", "True"]], "bb0ff3891d22f11207ec22b88ea4bad21a42caf6b876852363e57d0162bd47ee", 1644854467588, null, [null, 8.5066e-08], [null, 9.1891e-08], [null, 8.5392e-08], [null, 9.0785e-08], [null, 121129], [null, 10]]}, "durations": {}, "version": 2}

which gives me what I am starting to call "the usual 0.5.1 error" of KeyError: 'q_25'.

Thanks for all the info. Feels like that key has been renamed, but not updated everywhere. I'll have a look. @LucyJimenez if you also want to have a look when you finish what you're working on, that would be great.

This behavior was deprecated at some point. The documentation states that:

If setup raises a NotImplementedError, the benchmark is marked as skipped.

So individual benchmarks are not supposed to raise. In any case this would be rather inefficient, since setup() and the entire benchmark will run until the error is raised.
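
For reference, here is a minimal sketch of the documented setup-based skip, adapted from the reproducer in this issue. The class name SimpleSetupSkip and the method name time_power are made up for illustration; only the convention of raising NotImplementedError in setup comes from the documentation quoted above.

```python
class SimpleSetupSkip:
    # Same parameterization as the reproducer in this issue.
    params = [False, True]
    param_names = ["ok"]

    def setup(self, ok):
        # Raising here (rather than in the benchmark body) is the
        # documented way to have this parameter marked as skipped.
        if not ok:
            raise NotImplementedError("ok=False is not supported")

    def time_power(self, ok):
        # Only reached for parameters that setup did not reject.
        x = 34.2**4.2
```

With this layout the benchmark body never needs to raise, although setup itself still runs once for the skipped parameter.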

From 0.6 onwards (#1307) it is possible to skip parameter sets efficiently (without running setup), with the @skip_for_params decorator:

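from asv_runner.benchmarks.mark import skip_for_params

class Simple:
    params = ([False, True])
    param_names = ["ok"]

    # Skip the ok=False parameter set before setup or the benchmark runs.
    @skip_for_params([(False, )])
    def time_failure(self, ok):
        if ok:
            x = 34.2**4.2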

Raising from inside the benchmark itself has also been restored, but with a new error class:

from asv_runner.benchmarks.mark import SkipNotImplemented

class SimpleSlow:
    params = ([False, True])
    param_names = ["ok"]

    def time_failure(self, ok):
        if ok:
            x = 34.2**4.2
        else:
            raise SkipNotImplemented(f"{ok} is skipped")

Thanks for fixing this!

I am returning to this, and although you have presented a better approach for skipping some parameter combinations of a benchmark, the original error is still present in asv 0.6.1 as you cannot use asv show <commit> on such a partially-skipped benchmark.

To reproduce, I am using the SimpleSlow benchmark of 2 comments above. My results file is as follows:

{"commit_hash": "ee28c18160d7dd40a3a93f7f55db12f0e192642f", "env_name": "virtualenv-py3.10-matplotlib-numpy", "date": 1692734960000, "params": {"arch": "x86_64", "cpu": "Intel(R) Core(TM) i7-10750H CPU @ 2.60GHz", "machine": "onion", "num_cpu": "6", "os": "Linux 5.11.0-37-generic", "ram": "16164140", "python": "3.10", "matplotlib": "", "numpy": ""}, "python": "3.10", "requirements": {"matplotlib": "", "numpy": ""}, "env_vars": {}, "result_columns": ["result", "params", "version", "started_at", "duration", "stats_ci_99_a", "stats_ci_99_b", "stats_q_25", "stats_q_75", "stats_number", "stats_repeat", "samples", "profile"], "results": {"simple.SimpleSlow.time_failure": [[NaN, 9.610402479447959e-08], [["False", "True"]], "4ada089b3193a62c7538022a33878c654961a5964f08feb7de98e6846ea3177f", 1699115189427, 0.39397, [null, 8.6757e-08], [null, 1.9107e-07], [null, 8.9592e-08], [null, 1.0716e-07], [null, 121000], [null, 10]]}, "durations": {}, "version": 2}

There is one NaN and a few null for the skipped parameter.
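
To make the column layout easier to read, here is a rough sketch that pairs result_columns with the results entry above and regroups the stats_* columns into one dict per parameter. The helper regroup_stats is hypothetical and is not asv's actual loading code; it only illustrates how a parameter whose stats are all null could end up with an empty dict.

```python
import json
import math

result_columns = ["result", "params", "version", "started_at", "duration",
                  "stats_ci_99_a", "stats_ci_99_b", "stats_q_25", "stats_q_75",
                  "stats_number", "stats_repeat", "samples", "profile"]

# The "simple.SimpleSlow.time_failure" entry from the results file above.
# Python's json module accepts the non-standard NaN literal by default.
row = json.loads(
    '[[NaN, 9.610402479447959e-08], [["False", "True"]], '
    '"4ada089b3193a62c7538022a33878c654961a5964f08feb7de98e6846ea3177f", '
    '1699115189427, 0.39397, [null, 8.6757e-08], [null, 1.9107e-07], '
    '[null, 8.9592e-08], [null, 1.0716e-07], [null, 121000], [null, 10]]'
)

# Trailing columns (samples, profile) are absent from the row, so zip stops early.
entry = dict(zip(result_columns, row))


def regroup_stats(entry):
    """Hypothetical helper: column-wise stats_* lists -> one dict per parameter."""
    per_param = [{} for _ in entry["result"]]
    for column, values in entry.items():
        if not column.startswith("stats_") or values is None:
            continue
        key = column[len("stats_"):]
        for stats, value in zip(per_param, values):
            if value is not None:
                stats[key] = value
    return per_param


stats = regroup_stats(entry)
print(math.isnan(entry["result"][0]))  # the skipped parameter's result is NaN
print(stats[0])                        # no stats survive for the skipped parameter
print(sorted(stats[1]))                # keys available for the parameter that ran
```

Under this assumption the skipped parameter has a NaN result and an empty stats dict, while the parameter that ran has q_25 and q_75 available.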

If I run asv show ee28 I get

$ asv show ee28
Commit: ee28 <asv_0.6>

simple.SimpleSlow.time_failure [onion/virtualenv-py3.10-matplotlib-numpy]
Traceback (most recent call last):
  File "/home/iant/.venv/test/bin/asv", line 8, in <module>
    sys.exit(main())
  File "/home/iant/.venv/test/lib/python3.10/site-packages/asv/main.py", line 29, in main
    result = args.func(args)
  File "/home/iant/.venv/test/lib/python3.10/site-packages/asv/commands/__init__.py", line 49, in run_from_args
    return cls.run_from_conf_args(conf, args)
  File "/home/iant/.venv/test/lib/python3.10/site-packages/asv/commands/show.py", line 45, in run_from_conf_args
    return cls.run(
  File "/home/iant/.venv/test/lib/python3.10/site-packages/asv/commands/show.py", line 89, in run
    cls._print_results(conf, commit, result_iter,
  File "/home/iant/.venv/test/lib/python3.10/site-packages/asv/commands/show.py", line 164, in _print_results
    cls._print_benchmark(machine, result, benchmarks[name],
  File "/home/iant/.venv/test/lib/python3.10/site-packages/asv/commands/show.py", line 172, in _print_benchmark
    info, details = format_benchmark_result(result, benchmark)
  File "/home/iant/.venv/test/lib/python3.10/site-packages/asv/results.py", line 935, in format_benchmark_result
    display_result = [(v, get_err(v, s) if s is not None else None)
  File "/home/iant/.venv/test/lib/python3.10/site-packages/asv/results.py", line 935, in <listcomp>
    display_result = [(v, get_err(v, s) if s is not None else None)
  File "/home/iant/.venv/test/lib/python3.10/site-packages/asv_runner/statistics.py", line 23, in get_err
    a, b = stats["q_25"], stats["q_75"]
KeyError: 'q_25'

If I debug the asv code here https://github.com/airspeed-velocity/asv/blob/b8cbf81ccb808158102692a6fc42b8a9b2f2861e/asv/results.py#L935-L936 I see that for the skipped parameter v is nan and s is {}, i.e. an empty dict. As a workaround I have simply changed the line from

        display_result = [(v, get_err(v, s) if s is not None else None)

to

        display_result = [(v, get_err(v, s) if s else None)

and then it works fine for me. Of course, this may not be the correct solution but this is as far as my asv knowledge takes me.
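
A small standalone sketch of why the two guards differ, assuming the values I saw in the debugger (v is nan and s is {} for the skipped parameter). The get_err below is a simplified stand-in built around the line shown in the traceback; its return value is my assumption, and format_strict and format_truthy are names invented for this example.

```python
import math


def get_err(result, stats):
    # Stand-in for asv_runner.statistics.get_err: the lookup matches the
    # traceback; the half-interquartile-range return value is an assumption.
    a, b = stats["q_25"], stats["q_75"]
    return (b - a) / 2


# Values for the two parameters: ok=False (skipped) and ok=True.
results = [math.nan, 9.610402479447959e-08]
stats = [{}, {"q_25": 8.9592e-08, "q_75": 1.0716e-07}]


def format_strict(results, stats):
    # Current guard: an empty dict is "not None", so get_err is still called.
    return [(v, get_err(v, s) if s is not None else None)
            for v, s in zip(results, stats)]


def format_truthy(results, stats):
    # Workaround guard: both None and {} are falsy, so get_err is skipped.
    return [(v, get_err(v, s) if s else None)
            for v, s in zip(results, stats)]


try:
    format_strict(results, stats)
except KeyError as exc:
    print("strict guard raises KeyError:", exc)

display = format_truthy(results, stats)
print(display[0][1])  # None for the skipped parameter
```

The truthiness check treats a missing stats entry and an empty one the same way, which is all the workaround relies on.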

I am happy to create a new issue for this if you wish, as this issue has already been closed.

Actually this is a pragmatic fix which I think is suitable for a PR if you'd have some time to submit one (otherwise I can get to it sometime over the weekend as well).

airspeed-velocity / asv

Possible regression with asv 0.5.1 using parameterized benchmark when some but not all skipped #1028