NVIDIA / nvbench

CUDA Kernel Benchmarking Library
Apache License 2.0
474 stars 63 forks source link

Change nvbench_compare statuses #178

Open bernhardmgruber opened 2 months ago

bernhardmgruber commented 2 months ago

Currently, when I compare a baseline with new result JSON using nvbench_compare.py base.json new.json, I get output like the following:

ValueType Ref Time Ref Noise Cmp Time Cmp Noise Diff %Diff Status
I8 12.699 ms 8.71% 12.481 ms 10.17% -217.438 us -1.71% PASS
I16 12.514 ms 8.06% 12.513 ms 5.98% -0.956 us -0.01% PASS
I32 12.899 ms 10.22% 12.091 ms 2.15% -808.662 us -6.27% FAIL
I64 12.073 ms 1.19% 12.051 ms 0.81% -21.432 us -0.18% PASS
__int128 12.188 ms 4.55% 12.833 ms 2.56% 644.844 us 5.29% FAIL

With PASS colored green, and `FAIL colored red. It would be more useful if the the Status column could not only highlight significant changes, but show me whether the timings improved or regressed. Therefore, I propose an output like this (only status column changed):

ValueType Ref Time Ref Noise Cmp Time Cmp Noise Diff %Diff Status
I8 12.699 ms 8.71% 12.481 ms 10.17% -217.438 us -1.71% PASS
I16 12.514 ms 8.06% 12.513 ms 5.98% -0.956 us -0.01% PASS
I32 12.899 ms 10.22% 12.091 ms 2.15% -808.662 us -6.27% FAST
I64 12.073 ms 1.19% 12.051 ms 0.81% -21.432 us -0.18% PASS
__int128 12.188 ms 4.55% 12.833 ms 2.56% 644.844 us 5.29% SLOW

With PASS uncolored, FAST colored green. and SLOW colored red.

You can of course pick any better word for status, or different colors. But there is value in differentiating whether the time became faster or slower.

bernhardmgruber commented 1 month ago

Another question is whether these new statuses should appear by default or you would want to have a CLI switch between the PASS/FAIL and the PASS/SLOW/FAST.

alliepiper commented 1 month ago

I support this. The comparison script is still very much a work in progress and has much room for improvement. Clarifying whether the test was faster or slower would be a nice QoL change.

One suggestion: I don't think PASS fits with FAST and SLOW anymore. Maybe SAME for when the new samples are within the tolerance?

A related improvement would be to make the coloring more sensible, adding --color and --no-color flags to explicitly set the color state, and detect whether the output device supports color by default. IIRC we always use color, and this gets annoying when piping to a file, etc.