ProjectPhysX / OpenCL-Benchmark

A small OpenCL benchmark program to measure peak GPU/CPU performance.

What do the fractions mean? #13

Closed (sumseq closed this issue 1 month ago)

sumseq commented 1 month ago

It would be nice to have a bit more documentation. What do the fractions next to the result numbers mean? Is that simply the closest "nice" fraction of "peak flops" that the result reached? If so, that is useful for GPUs, since it identifies, for instance, the ratio of DP cores to FP32 cores, but it may be confusing for CPU runs.
It sometimes makes it seem that the benchmark is only running on a subset of the CPU cores (for example, on my 64-core EPYC Rome 7702P it shows "1/64").

ProjectPhysX commented 1 month ago

Hi @sumseq,

Those are the closest possible ratios of (measured performance at the benchmarked precision) / (FP32 peak according to specs).

For example, if OpenCL reports 10 compute units and 1000 MHz clock frequency for an Nvidia Pascal GPU, then it has 10×128 cores × 2 instructions/clock × 1 GHz = 2560 GFlops/s = 2.56 TFlops/s in FP32.
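The arithmetic above can be written as a short Python sketch. The function name `peak_fp32_tflops` and its parameters are hypothetical and not taken from the benchmark's source; the 128 cores per compute unit and 2 instructions per clock are the values from the Pascal example and differ between architectures.

```python
def peak_fp32_tflops(compute_units, cores_per_cu, clock_mhz, ops_per_clock=2):
    """Theoretical FP32 peak in TFlops/s: cores x instructions/clock x clock.

    Hypothetical helper for illustration; cores_per_cu and ops_per_clock
    depend on the architecture and are assumptions supplied by the caller.
    """
    cores = compute_units * cores_per_cu
    flops_per_second = cores * ops_per_clock * clock_mhz * 1e6  # MHz -> Hz
    return flops_per_second / 1e12  # Flops/s -> TFlops/s


# Pascal example from the text: 10 compute units, 128 cores each, 1000 MHz
print(peak_fp32_tflops(10, 128, 1000))
```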

The benchmark then measures the runtime for a fixed number of computations in FP64, divides the number of computed operations by this measured runtime, and finds 0.07 TFlops/s for FP64.
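As a minimal sketch of that step, throughput is the operation count divided by the elapsed time. The name `measured_tflops` and the numbers in the example call are made up for illustration and are not the benchmark's actual workload size.

```python
def measured_tflops(operations, runtime_seconds):
    """Measured throughput in TFlops/s: operations divided by runtime.

    Hypothetical helper; the real benchmark's kernel and timing code
    are not shown here.
    """
    if runtime_seconds <= 0:
        raise ValueError("runtime must be positive")
    return operations / runtime_seconds / 1e12


# Illustrative numbers only: 7e10 floating-point operations in 1 second
print(measured_tflops(7e10, 1.0))
```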

The ratio FP64/(spec FP32) = 0.07/2.56 = 1/36.57 is calculated and rounded to the nearest possible fraction, in this case 1/32.

The FP64/FP32 ratio for any GPU architecture can only be one of 1/64, 1/32, 1/24, 1/16, 1/12, 1/8, 1/4, 1/3, 1/2, 2/3, 1x, 2x, 4x, 8x, 16x, 32x, 64x, and nothing in between. The same is done for all other precisions, always with the FP32 spec value as reference.
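The rounding to one of these fractions could look roughly like the following Python sketch. The list of allowed values is the one given above; the names `ALLOWED`, `closest_ratio` and `label` are hypothetical, and measuring "closest" on a logarithmic scale is an assumption, since the thread does not say how the benchmark defines the distance.

```python
import math
from fractions import Fraction

# Allowed ratios as listed in the thread
ALLOWED = [Fraction(1, 64), Fraction(1, 32), Fraction(1, 24), Fraction(1, 16),
           Fraction(1, 12), Fraction(1, 8), Fraction(1, 4), Fraction(1, 3),
           Fraction(1, 2), Fraction(2, 3), Fraction(1), Fraction(2),
           Fraction(4), Fraction(8), Fraction(16), Fraction(32), Fraction(64)]


def closest_ratio(measured_tflops, spec_fp32_tflops):
    """Return the allowed fraction closest to measured / spec FP32.

    Assumption: distance is taken on a log2 scale; the benchmark itself
    may round differently.
    """
    ratio = measured_tflops / spec_fp32_tflops
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    return min(ALLOWED,
               key=lambda f: abs(math.log2(ratio) - math.log2(float(f))))


def label(f):
    """Format a fraction as in the list above: '1/32' below 1, '2x' otherwise."""
    return f"{f.numerator}/{f.denominator}" if f < 1 else f"{f.numerator}x"


# Example from the thread: 0.07 TFlops/s FP64 against 2.56 TFlops/s FP32 spec
print(label(closest_ratio(0.07, 2.56)))
```

For the example values, 0.07/2.56 lies between 1/64 and 1/32 and is nearer to 1/32 on both a linear and a logarithmic scale, so the choice of distance does not matter here.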

The (measured FP32)/(spec FP32) ratio should always be 1/1, but in practice this is not always the case as sometimes the actual clock speed is vastly different from what OpenCL reports as spec sheet value.

CPUs might struggle with proper AVX2/AVX512 vectorization, bandwidth, caching etc. and measure vastly lower TFlops/s than theoretical spec. The benchmark is really running on all CPU threads.

Kind regards, Moritz

sumseq commented 1 month ago

Thanks!

Could a summary of this be added to the README or maybe another DOC file of some kind?

ProjectPhysX commented 1 month ago

I've added a short description of this in the Readme!