JuliaLinearAlgebra / BLASBenchmarksGPU.jl

Benchmark BLAS libraries on GPUs
https://julialinearalgebra.github.io/BLASBenchmarksGPU.jl/stable/

TFLOPS according to BLASBenchmarksGPU are higher than TFLOPS in the GemmKernels README plot #15

Open DilumAluthge opened 3 years ago

DilumAluthge commented 3 years ago

This issue is a continuation of the conversation started in pull request #11.

DilumAluthge commented 3 years ago

cc: @thomasfaingnaert @maleadt

DilumAluthge commented 3 years ago

Here are some of the differences between how we benchmark and how GemmKernels benchmarks:

  1. GemmKernels uses alpha = rand(Float32) and beta = rand(Float32). We always use alpha = 1 and beta = 0.
  2. GemmKernels benchmarks four layouts: no transpose, transpose A only, transpose B only, and transpose of both A and B. We only benchmark the no-transpose case.
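As a sketch of difference 1, the two alpha/beta conventions can be compared with the five-argument `mul!` from Julia's standard `LinearAlgebra` library (a CPU stand-in for the GPU kernels; the array sizes and random scalars here are illustrative, not the ones either benchmark actually uses):

```julia
using LinearAlgebra

A = rand(Float32, 4, 4)
B = rand(Float32, 4, 4)
C = rand(Float32, 4, 4)

# Convention used here: alpha = 1, beta = 0, i.e. a plain product.
C1 = copy(C)
mul!(C1, A, B, 1.0f0, 0.0f0)   # C1 = A * B; the old contents of C1 are ignored

# GemmKernels' convention: random nonzero alpha and beta, the full GEMM update.
alpha, beta = rand(Float32), rand(Float32)
C2 = copy(C)
mul!(C2, A, B, alpha, beta)    # C2 = alpha * A * B + beta * C
```

With beta != 0 the kernel must also read the existing C, so the two conventions do not perform identical memory traffic, which could plausibly affect measured TFLOPS.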

Could either of those differences be responsible for the discrepancy we're seeing?