NVIDIA / nvbench

CUDA Kernel Benchmarking Library
Apache License 2.0

Invalid JSON and CSV output when running with skipped tests #103

Closed GregoryKimball closed 1 year ago

GregoryKimball commented 1 year ago

Some of the libcudf benchmarks using nvbench generate invalid JSON and CSV files, even though the console output looks normal. One example is JOIN_NVBENCH, which shows the correct "Samples" value in the JSON and CSV output, but all of the timing data is set to invalid values such as 0e-20 and 0e-18. I'm not certain that benchmark skipping is the trigger for this behavior, and I will update this issue when I collect more evidence.

Repro: Build libcudf benchmarks and run the following command.

root@123:/cudf/cpp/build/benchmarks# ./JOIN_NVBENCH --devices 0 --json output.json --csv output.csv --benchmark 0

console:

...
## inner_join_32bit

### [0] NVIDIA A100 80GB PCIe

| Key Type | Payload Type | Nullable | Build Table Size | Probe Table Size | Samples |  CPU Time  | Noise  |  GPU Time  | Noise  |
|----------|--------------|----------|------------------|------------------|---------|------------|--------|------------|--------|
|      I32 |          I32 |        0 |           100000 |           100000 |   3472x | 152.501 us |  5.74% | 144.658 us |  1.86% |
|      I32 |          I32 |        0 |           100000 |           400000 |   3200x | 174.260 us | 43.67% | 166.360 us | 43.41% |
|      I32 |          I32 |        0 |         10000000 |         10000000 |     90x |   5.593 ms |  0.17% |   5.585 ms |  0.09% |
|      I32 |          I32 |        0 |         10000000 |         40000000 |   1123x |  13.322 ms |  1.45% |  13.314 ms |  1.45% |
|      I32 |          I32 |        0 |         10000000 |        100000000 |     88x |  28.546 ms |  0.50% |  28.538 ms |  0.50% |
|      I32 |          I32 |        0 |         80000000 |        100000000 |     11x |  51.744 ms |  0.03% |  51.735 ms |  0.02% |
|      I32 |          I32 |        0 |        100000000 |        100000000 |     11x |  58.123 ms |  0.07% |  58.115 ms |  0.07% |
|      I32 |          I32 |        0 |         10000000 |        240000000 |     11x |  64.582 ms |  0.03% |  64.574 ms |  0.03% |
|      I32 |          I32 |        0 |         80000000 |        240000000 |     11x |  87.544 ms |  0.02% |  87.536 ms |  0.01% |
|      I32 |          I32 |        0 |        100000000 |        240000000 |     11x |  94.204 ms |  0.02% |  94.196 ms |  0.01% |
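For comparison with the CSV columns, which are reported in seconds, the console timings above can be converted with a small helper. This is only an illustrative sketch: `to_seconds` and `UNIT_SCALE` are hypothetical names and are not part of nvbench.

```python
# Hypothetical helper (not part of nvbench): convert a console timing such as
# "152.501 us" into seconds so it can be compared with the CSV columns.
UNIT_SCALE = {"s": 1.0, "ms": 1e-3, "us": 1e-6, "ns": 1e-9}


def to_seconds(text):
    """Parse '<value> <unit>' and return the value in seconds."""
    value, unit = text.split()
    return float(value) * UNIT_SCALE[unit]


# First and third rows of the console table above.
print(f"{to_seconds('152.501 us'):.6e}")  # 1.525010e-04
print(f"{to_seconds('5.593 ms'):.6e}")    # 5.593000e-03
```

A correct CSV row for the first configuration would therefore be expected to carry a CPU time near 1.5e-04 seconds rather than a zero mantissa.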

"output.csv"

Benchmark,Device,Device Name,Key Type,Payload Type,Nullable,Build Table Size,Probe Table Size,Skipped,Samples,CPU Time (sec),Noise,GPU Time (sec),Noise
inner_join_32bit,0,NVIDIA A100 80GB PCIe,I32,I32,0,100000,100000,No,3472,0e-20,0e-18,0e-20,0e-18
inner_join_32bit,0,NVIDIA A100 80GB PCIe,I32,I32,0,10000000,100000,Yes,,,,,
inner_join_32bit,0,NVIDIA A100 80GB PCIe,I32,I32,0,80000000,100000,Yes,,,,,
inner_join_32bit,0,NVIDIA A100 80GB PCIe,I32,I32,0,100000000,100000,Yes,,,,,
inner_join_32bit,0,NVIDIA A100 80GB PCIe,I32,I32,0,100000,400000,No,3200,0e-20,0e-17,0e-20,0e-17
inner_join_32bit,0,NVIDIA A100 80GB PCIe,I32,I32,0,10000000,400000,Yes,,,,,
inner_join_32bit,0,NVIDIA A100 80GB PCIe,I32,I32,0,80000000,400000,Yes,,,,,
inner_join_32bit,0,NVIDIA A100 80GB PCIe,I32,I32,0,100000000,400000,Yes,,,,,
inner_join_32bit,0,NVIDIA A100 80GB PCIe,I32,I32,0,100000,10000000,Yes,,,,,
inner_join_32bit,0,NVIDIA A100 80GB PCIe,I32,I32,0,10000000,10000000,No,90,0e-19,0e-19,0e-19,0e-19
inner_join_32bit,0,NVIDIA A100 80GB PCIe,I32,I32,0,80000000,10000000,Yes,,,,,
inner_join_32bit,0,NVIDIA A100 80GB PCIe,I32,I32,0,100000000,10000000,Yes,,,,,
inner_join_32bit,0,NVIDIA A100 80GB PCIe,I32,I32,0,100000,40000000,Yes,,,,,
inner_join_32bit,0,NVIDIA A100 80GB PCIe,I32,I32,0,10000000,40000000,No,1123,0e-18,0e-18,0e-18,0e-18
inner_join_32bit,0,NVIDIA A100 80GB PCIe,I32,I32,0,80000000,40000000,Yes,,,,,
inner_join_32bit,0,NVIDIA A100 80GB PCIe,I32,I32,0,100000000,40000000,Yes,,,,,
inner_join_32bit,0,NVIDIA A100 80GB PCIe,I32,I32,0,100000,100000000,Yes,,,,,
inner_join_32bit,0,NVIDIA A100 80GB PCIe,I32,I32,0,10000000,100000000,No,88,0e-18,0e-19,0e-18,0e-19
inner_join_32bit,0,NVIDIA A100 80GB PCIe,I32,I32,0,80000000,100000000,No,11,0e-18,0e-20,0e-18,0e-20
inner_join_32bit,0,NVIDIA A100 80GB PCIe,I32,I32,0,100000000,100000000,No,11,0e-18,0e-19,0e-18,0e-19
inner_join_32bit,0,NVIDIA A100 80GB PCIe,I32,I32,0,100000,240000000,Yes,,,,,
inner_join_32bit,0,NVIDIA A100 80GB PCIe,I32,I32,0,10000000,240000000,No,11,0e-17,0e-20,0e-17,0e-20
inner_join_32bit,0,NVIDIA A100 80GB PCIe,I32,I32,0,80000000,240000000,No,11,0e-17,0e-20,0e-17,0e-20
inner_join_32bit,0,NVIDIA A100 80GB PCIe,I32,I32,0,100000000,240000000,No,11,0e-17,0e-20,0e-17,0e-20
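The broken rows above can be detected mechanically. The following is a minimal sketch, assuming the column layout shown in the header of this report; `suspicious_rows` is a hypothetical name and not an nvbench utility. It flags rows that were not skipped but whose timing fields all parse to zero.

```python
import csv
import io

# Column positions are taken from the CSV header in this report. The header
# contains two columns named "Noise", so rows are read by index rather than
# with csv.DictReader, which would collapse the duplicate names.
SKIPPED, FIRST_TIMING = 8, 10


def suspicious_rows(csv_text):
    """Return 1-based data-row numbers that ran but report all-zero timings."""
    reader = csv.reader(io.StringIO(csv_text))
    next(reader)  # skip the header line
    flagged = []
    for row_number, row in enumerate(reader, start=1):
        if not row or row[SKIPPED] != "No":
            continue  # blank line or skipped configuration: nothing to check
        timings = [float(value) for value in row[FIRST_TIMING:]]
        if all(t == 0.0 for t in timings):
            flagged.append(row_number)
    return flagged


sample = (
    "Benchmark,Device,Device Name,Key Type,Payload Type,Nullable,"
    "Build Table Size,Probe Table Size,Skipped,Samples,"
    "CPU Time (sec),Noise,GPU Time (sec),Noise\n"
    "inner_join_32bit,0,NVIDIA A100 80GB PCIe,I32,I32,0,100000,100000,"
    "No,3472,0e-20,0e-18,0e-20,0e-18\n"
    "inner_join_32bit,0,NVIDIA A100 80GB PCIe,I32,I32,0,10000000,100000,"
    "Yes,,,,,\n"
)
print(suspicious_rows(sample))  # [1]
```

Values such as `0e-20` parse as exactly 0.0, so a plain equality check is enough here; no tolerance is needed.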

robertmaynard commented 1 year ago

My first thought was that this was a locale issue, but the scale of the scientific notation is wrong for that.

Since the correct values are printed to the screen, it stands to reason that the issue is in fmt or csv_printer. However, I have been unable to reproduce this locally with fmt 7 or fmt 9.

robertmaynard commented 1 year ago

I have been able to reproduce this issue by locally building the container that @GregoryKimball was running.

I am still investigating which differences between the two builds cause the failure.