JuliaLinearAlgebra / BLASBenchmarksGPU.jl

Benchmark BLAS libraries on GPUs
https://julialinearalgebra.github.io/BLASBenchmarksGPU.jl/stable/

TFLOPS according to BLASBenchmarksGPU are higher than TFLOPS in the GemmKernels README plot #15

Open DilumAluthge opened 3 years ago

DilumAluthge commented 3 years ago

This issue is a continuation of the conversation started in pull request #11.

DilumAluthge commented 3 years ago

cc: @thomasfaingnaert @maleadt

DilumAluthge commented 3 years ago

Here are some of the differences between how we benchmark and how GemmKernels benchmarks:

  1. GemmKernels uses alpha = rand(Float32) and beta = rand(Float32). We always use alpha = 1 and beta = 0.
  2. GemmKernels benchmarks four layouts: no transpose, transpose A only, transpose B only, and transpose of both A and B. We only benchmark the no-transpose case.
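As a sketch of difference 1, the two alpha/beta conventions can be compared with the five-argument `mul!` from Julia's standard `LinearAlgebra` library (a CPU stand-in for the GPU kernels; the array sizes and random scalars here are illustrative, not the ones either benchmark actually uses):

```julia
using LinearAlgebra

A = rand(Float32, 4, 4)
B = rand(Float32, 4, 4)
C = rand(Float32, 4, 4)

# Convention used here: alpha = 1, beta = 0, i.e. a plain product.
C1 = copy(C)
mul!(C1, A, B, 1.0f0, 0.0f0)   # C1 = A * B; the old contents of C1 are ignored

# GemmKernels' convention: random nonzero alpha and beta, the full GEMM update.
alpha, beta = rand(Float32), rand(Float32)
C2 = copy(C)
mul!(C2, A, B, alpha, beta)    # C2 = alpha * A * B + beta * C
```

With beta != 0 the kernel must also read the existing C, so the two conventions do not perform identical memory traffic, which could plausibly affect measured TFLOPS.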

Could either of those differences be responsible for the discrepancy we're seeing?