Shopify / yjit-bench

Set of benchmarks for the YJIT CRuby JIT compiler and other Ruby implementations.

Include failed benchmarks in output and continue #272

Closed. rwstauner closed this 9 months ago.

rwstauner commented 10 months ago

Running

./run_benchmarks.rb --once fluentd nqueens ruby-lsp setivar -e "head::$HOME/.rubies/ruby-dev/bin/ruby;yjit::$HOME/.rubies/ruby-dev/bin/ruby --enable-yjit;prism::$HOME/.rubies/ruby-dev/bin/ruby --parser=prism"

where fluentd and ruby-lsp fail (on some executables) will now generate data for all the successful benchmarks and produce output like the sample below, which reprints any failure output at the end and exits non-zero.

As for the ./data/output_* files:

itr #1: 38ms
RSS: 13.6MiB
MAXRSS: 14512.0MiB

Error output for fluentd (head):
/Users/rwstauner/src/github.com/Shopify/yjit-bench/benchmarks/fluentd/benchmark.rb:6:in `<main>': yjit (RuntimeError)

Error output for fluentd (prism):
/Users/rwstauner/.rubies/ruby-dev/bin/ruby: warning: The compiler based on the Prism parser is currently experimental and compatibility with the compiler based on parse.y is not yet complete. Please report any issues you find on the `ruby/prism` issue tracker.
/Users/rwstauner/src/github.com/Shopify/yjit-bench/benchmarks/fluentd/benchmark.rb:5:in `<compiled>': yjit (RuntimeError)

Error output for ruby-lsp (prism):
/Users/rwstauner/.rubies/ruby-dev/bin/ruby: warning: The compiler based on the Prism parser is currently experimental and compatibility with the compiler based on parse.y is not yet complete. Please report any issues you find on the `ruby/prism` issue tracker.
/Users/rwstauner/.gem/ruby/dev/gems/sorbet-runtime-0.5.11212/lib/types/private/methods/signature.rb:70:in `initialize': undefined method `any?' for nil (NoMethodError)
        from /Users/rwstauner/.gem/ruby/dev/gems/sorbet-runtime-0.5.11212/lib/types/private/methods/_methods.rb:373:in `new'
        from /Users/rwstauner/.gem/ruby/dev/gems/sorbet-runtime-0.5.11212/lib/types/private/methods/_methods.rb:373:in `build_sig'
        from /Users/rwstauner/.gem/ruby/dev/gems/sorbet-runtime-0.5.11212/lib/types/private/methods/_methods.rb:342:in `run_sig'
        from /Users/rwstauner/.gem/ruby/dev/gems/sorbet-runtime-0.5.11212/lib/types/private/methods/_methods.rb:242:in `block in _on_method_added'
        from /Users/rwstauner/.gem/ruby/dev/gems/sorbet-runtime-0.5.11212/lib/types/private/methods/_methods.rb:443:in `run_sig_block_for_key'
        from /Users/rwstauner/.gem/ruby/dev/gems/sorbet-runtime-0.5.11212/lib/types/private/methods/_methods.rb:423:in `maybe_run_sig_block_for_key'
        from /Users/rwstauner/.gem/ruby/dev/gems/sorbet-runtime-0.5.11212/lib/types/private/methods/_methods.rb:252:in `block in _on_method_added'
        from /Users/rwstauner/.gem/ruby/dev/gems/sorbet-runtime-0.5.11212/lib/types/props/_props.rb:34:in `new'
        from /Users/rwstauner/.gem/ruby/dev/gems/sorbet-runtime-0.5.11212/lib/types/props/_props.rb:34:in `decorator'
        from /Users/rwstauner/.gem/ruby/dev/gems/sorbet-runtime-0.5.11212/lib/types/props/_props.rb:148:in `included'
        from /Users/rwstauner/.gem/ruby/dev/gems/sorbet-runtime-0.5.11212/lib/types/private/mixins/mixins.rb:8:in `included'
        from /Users/rwstauner/.gem/ruby/dev/gems/sorbet-runtime-0.5.11212/lib/types/props/optional.rb:4:in `include'
        from /Users/rwstauner/.gem/ruby/dev/gems/sorbet-runtime-0.5.11212/lib/types/props/optional.rb:4:in `<module:Optional>'
        from /Users/rwstauner/.gem/ruby/dev/gems/sorbet-runtime-0.5.11212/lib/types/props/optional.rb:3:in `<compiled>'
        from /Users/rwstauner/.gem/ruby/dev/gems/sorbet-runtime-0.5.11212/lib/sorbet-runtime.rb:102:in `require_relative'
        from /Users/rwstauner/.gem/ruby/dev/gems/sorbet-runtime-0.5.11212/lib/sorbet-runtime.rb:102:in `<compiled>'
        from /Users/rwstauner/.rubies/ruby-dev/lib/ruby/3.4.0+0/bundled_gems.rb:73:in `require'
        from /Users/rwstauner/.rubies/ruby-dev/lib/ruby/3.4.0+0/bundled_gems.rb:73:in `block (2 levels) in replace_require'
        from /Users/rwstauner/.gem/ruby/dev/gems/ruby-lsp-0.13.4/lib/ruby_lsp/internal.rb:8:in `<compiled>'
        from /Users/rwstauner/.rubies/ruby-dev/lib/ruby/3.4.0+0/bundled_gems.rb:73:in `require'
        from /Users/rwstauner/.rubies/ruby-dev/lib/ruby/3.4.0+0/bundled_gems.rb:73:in `block (2 levels) in replace_require'
        from /Users/rwstauner/src/github.com/Shopify/yjit-bench/benchmarks/ruby-lsp/benchmark.rb:10:in `<compiled>'

Total time spent benchmarking: 5s
Failed benchmarks: 3

head: ruby 3.4.0dev (2024-01-18T15:35:46Z master 00814fd672) [arm64-darwin23]
yjit: ruby 3.4.0dev (2024-01-18T15:35:46Z master 00814fd672) +YJIT [arm64-darwin23]
prism: ruby 3.4.0dev (2024-01-18T15:35:46Z master 00814fd672) +PRISM [arm64-darwin23]

--------  ---------  ----------  ---------  ----------  ----------  ----------  ------------  -------------  ---------  ----------
bench     head (ms)  stddev (%)  yjit (ms)  stddev (%)  prism (ms)  stddev (%)  yjit 1st itr  prism 1st itr  head/yjit  head/prism
fluentd   N/A        N/A         N/A        N/A         N/A         N/A         N/A           N/A            N/A        N/A
ruby-lsp  N/A        N/A         N/A        N/A         N/A         N/A         N/A           N/A            N/A        N/A
nqueens   166.4      0.0         166.8      0.0         166.3       0.0         1.00          1.00           1.00       1.00
setivar   38.5       0.0         38.5       0.0         38.4        0.0         1.00          1.00           1.00       1.00
--------  ---------  ----------  ---------  ----------  ----------  ----------  ------------  -------------  ---------  ----------
Legend:
- yjit 1st itr: ratio of head/yjit time for the first benchmarking iteration.
- head/yjit: ratio of head/yjit time. Higher is better for yjit. Above 1 represents a speedup.
- prism 1st itr: ratio of head/prism time for the first benchmarking iteration.
- head/prism: ratio of head/prism time. Higher is better for prism. Above 1 represents a speedup.

Output:
./data/output_158.csv

Failed benchmarks:
  head: fluentd
  prism: fluentd, ruby-lsp
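
For reference, here is a minimal sketch of the approach described in this PR, using hypothetical helper names and a hard-coded benchmark/executable list rather than the real run_benchmarks.rb plumbing: run every benchmark/executable pair, collect stderr from failures instead of aborting, reprint the error output at the end, and exit non-zero if anything failed.

```ruby
# Hypothetical sketch, not the actual run_benchmarks.rb code.
require "open3"

benchmarks  = %w[fluentd nqueens ruby-lsp setivar]
executables = { "head" => %w[ruby], "yjit" => %w[ruby --enable-yjit] }

failures = {}
benchmarks.each do |bench|
  executables.each do |name, cmd|
    # Assumes the yjit-bench layout of benchmarks/<name>/benchmark.rb.
    _out, err, status = Open3.capture3(*cmd, "benchmarks/#{bench}/benchmark.rb")
    failures[[bench, name]] = err unless status.success?
  end
end

# Reprint any failure output at the end, then exit non-zero.
failures.each do |(bench, name), err|
  puts "Error output for #{bench} (#{name}):"
  puts err
end
puts "Failed benchmarks: #{failures.size}"
exit(1) unless failures.empty?
```
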
maximecb commented 10 months ago

I would prefer if it had N/A for the entire row of failed benchmarks.

NaN usually indicates that some numerical error happened.

rwstauner commented 10 months ago

Thanks, I appreciate that feedback!

I had originally written a solution that created the data more intentionally inside the loop (which I think is what we'll eventually need to do). I lost those changes, so I quickly reimplemented something I could push to preserve the WIP. This version ended up being much simpler to start with: instead of rewriting the data generation, it just fakes the data into something that can get past the calculations (which is what produces the NaNs).
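
For illustration only (the PR doesn't say which placeholder values it uses): the ratio columns come from dividing timings, and in Ruby a 0.0/0.0 float division quietly yields NaN instead of raising, which is how faked data can surface as NaN in the table.

```ruby
# Illustration: dividing zeroed-out fake timings yields NaN, not an error.
head_ms = 0.0 # assumed placeholder timing for a failed benchmark
yjit_ms = 0.0
ratio = head_ms / yjit_ms
puts ratio      # => NaN
puts ratio.nan? # => true
```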

I expect more work will be needed so that anything that reads these CSVs can handle whatever format we use for the new empty data, so I'll need to do a lot of exploring.

maximecb commented 10 months ago

Could we maybe do something simple, like not storing any data for failing benchmarks? Then at the end, when reporting the data, if a benchmark was in the list to be benchmarked but has no associated data, you just report N/A for all of its values.
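
A minimal sketch of that idea, with hypothetical names and hard-coded data: failed benchmarks simply get no entry in the results, and the reporting step falls back to "N/A" for any benchmark in the run list without data.

```ruby
# Hypothetical sketch of the reporting fallback described above.
bench_list  = %w[fluentd ruby-lsp nqueens setivar]
executables = %w[head yjit]

# Failed benchmarks have no entry at all.
results = {
  "nqueens" => { "head" => 166.4, "yjit" => 166.8 },
  "setivar" => { "head" => 38.5,  "yjit" => 38.5  },
}

bench_list.each do |bench|
  row  = executables.map { |exe| results.dig(bench, exe)&.to_s || "N/A" }
  line = ([bench] + row).join("  ")
  puts line
end
```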

rwstauner commented 10 months ago

Yeah, that's a good idea! That would be simpler than altering that loop.

I also wondered if there was any useful data we could include about the failure. I think the message would be too large. Not sure if the error class might be useful. Still, I'd have to see how anything downstream would handle the text in the data.

maximecb commented 10 months ago

Maybe write the stderr output in a log file?
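
A quick sketch of that option, assuming Open3 and an illustrative log file name (neither is specified in the thread):

```ruby
# Hypothetical sketch: keep a failed benchmark's stderr in a log file.
require "open3"

_out, err, status = Open3.capture3("ruby", "benchmarks/fluentd/benchmark.rb")
File.write("data/errors_fluentd_head.log", err) unless status.success?
```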

rwstauner commented 10 months ago

Well, yeah, if there isn't value in jamming the failure into the data report, we could certainly just add extra output to the console. That would be a quick way to start, at least! OK, I'll pursue that, and we can decide if we want to do anything else with it down the road.

rwstauner commented 10 months ago

You can see it working in the failed head build: https://github.com/Shopify/yjit-bench/actions/runs/7617594824/job/20746942610?pr=272#step:5:1376 (fixed with #274)

Which raises a question: Would we want these CI tests to fail fast? Or run as much as they can?

k0kubun commented 10 months ago

> Would we want these CI tests to fail fast? Or run as much as they can?

I would personally like this CI to run everything. If it failed early, you wouldn't know what other tests could fail, which could ultimately waste your time when the job fails again after you fix the immediate failure. It runs only one warmup iteration and one benchmark iteration, and 12 minutes doesn't seem too long for a CI job.

maximecb commented 10 months ago

I think it would be a lot more noticeable if the two failed benchmarks appeared in the table, maybe as the first rows, with N/A for all values. Makes it hard not to notice.

rwstauner commented 10 months ago

Injected leading rows, filled with N/A, into the table for any benchmarks that failed.
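
Roughly what that looks like, with hypothetical names and hard-coded rows: the failed benchmarks are prepended to the table as rows where every value is "N/A", so they show up right at the top.

```ruby
# Hypothetical sketch: put failed benchmarks first, filled with "N/A".
value_columns  = ["head (ms)", "stddev (%)", "yjit (ms)", "stddev (%)"]
failed_benches = %w[fluentd ruby-lsp]
ok_rows = [
  ["nqueens", "166.4", "0.0", "166.8", "0.0"],
  ["setivar", "38.5",  "0.0", "38.5",  "0.0"],
]

na_rows = failed_benches.map { |bench| [bench] + ["N/A"] * value_columns.size }
(na_rows + ok_rows).each { |row| puts row.join("  ") }
```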