I would prefer if it had N/A for the entire row of failed benchmarks.
NaN usually indicates that some numerical error happened.
Thanks, I appreciate that feedback!
I had originally written up a solution that was more intentionally creating the data in the loop (which I think is what we'll need to do eventually). I lost my changes and tried to quickly reimplement it so that I could push something to preserve the WIP. This version ended up being much simpler to start with: instead of rewriting the data generation, it just fakes the data into something that can get past the calculations (which is what produces the NaNs).
I expect more work will be needed in anything that reads these CSVs to handle whatever format we end up using for the new empty data, so I'll need to do a lot of exploring.
We could maybe do something simple, like not storing any data for failing benchmarks. Then at the end, when reporting the data, if a benchmark was in the list to be benchmarked but has no associated data, we just report N/A for all the values?
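Something roughly like this, just to illustrate the idea (a Ruby sketch with made-up names and data, not the actual yjit-bench code):

```ruby
# Illustrative sketch only (hypothetical names and data): results holds only
# benchmarks that ran successfully; anything requested but missing gets an
# all-N/A row at report time.
requested = ["railsbench", "fluentd", "ruby-lsp"]
results   = { "railsbench" => ["1.23", "0.98"] }
headers   = ["bench", "interp (s)", "yjit (s)"]

rows = requested.map do |name|
  data = results[name]
  data ? [name, *data] : [name, *Array.new(headers.size - 1, "N/A")]
end

puts headers.join("\t")
rows.each { |row| puts row.join("\t") }
```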
Yeah, that's a good idea! That would be simpler than altering that loop.
I also wondered if there was any useful data we could include about the failure. I think the message would be too large. Not sure if the error class might be useful. Still, I'd have to see how anything downstream would handle the text in the data.
Maybe write the stderr output to a log file?
Well, yeah, if there isn't value in jamming the failure into the data report, we could certainly just add extra output to the console. That would be a quick way to start, at least! OK, I'll pursue that and we can decide if we want to do anything else with it down the road.
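Roughly what I have in mind for the console side (a simplified sketch with hypothetical commands and paths, not the real harness code):

```ruby
# Simplified sketch (hypothetical commands/paths): capture each benchmark's
# stderr, reprint every failure at the end of the run, and exit non-zero.
require "open3"

benchmark_commands = {
  "fluentd"  => %w[ruby benchmarks/fluentd/benchmark.rb],
  "ruby-lsp" => %w[ruby benchmarks/ruby-lsp/benchmark.rb],
}

failures = {}

benchmark_commands.each do |name, cmd|
  _stdout, stderr, status = Open3.capture3(*cmd)
  failures[name] = stderr unless status.success?
end

unless failures.empty?
  puts "\nThe following benchmarks failed:"
  failures.each do |name, err|
    puts "--- #{name} ---"
    puts err
  end
  exit 1
end
```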
You can see it working in the failed head build:
https://github.com/Shopify/yjit-bench/actions/runs/7617594824/job/20746942610?pr=272#step:5:1376
(fixed with #274)
Which raises a question: Would we want these CI tests to fail fast? Or run as much as they can?
> Would we want these CI tests to fail fast? Or run as much as they can?
I would personally like this CI to run everything. If it failed early, you wouldn't know what other tests could fail, which could ultimately waste your time by failing again after you fix the immediate failure. It runs only one warmup iteration and one benchmark iteration, and 12 minutes doesn't seem too long for a CI job.
I think it would be a lot more noticeable if the two failed benchmarks appeared in the table, maybe as the first rows, with N/A for all values. Makes it hard not to notice.
Injected leading rows into the table, filled with N/A, for any benchmarks that failed.
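Conceptually it's something like this (a simplified sketch, not the actual diff):

```ruby
# Simplified sketch (not the actual diff): failed benchmarks become leading
# rows of N/A so they stand out at the top of the table.
failed_benchmarks = ["fluentd", "ruby-lsp"]
table_rows        = [["railsbench", "1.23", "0.98"]]

na_rows = failed_benchmarks.map do |name|
  [name, *Array.new(table_rows.first.size - 1, "N/A")]
end
table_rows = na_rows + table_rows

table_rows.each { |row| puts row.join("\t") }
```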
A run where `fluentd` and `ruby-lsp` fail (on some executables) will still generate data for all the successful benchmarks and produce output like this, which reprints any failure output at the end and exits non-zero.

As for the `./data/output_*` files:
- The `.txt` file contains the same output as the console (including the N/A rows).
- The `.csv` file contains only rows for benchmarks that passed for all executables.
- The `.json` file shows the data that passed (in this case "fluentd" appears under "yjit" but not under the other two).