JuliaLinearAlgebra / BLASBenchmarksGPU.jl

Benchmark BLAS libraries on GPUs
https://julialinearalgebra.github.io/BLASBenchmarksGPU.jl/stable/

add Tullio benchmarks #20

Closed: simeonschaub closed this 3 years ago

simeonschaub commented 3 years ago

My initial testing suggests that these benchmarks will probably be quite unimpressive currently, but I think it would be good to keep track of them, since they should also make for a good KernelAbstractions benchmark.
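For context, a minimal sketch of what such a Tullio GEMM benchmark might look like on the GPU (the function name `tullio_gemm!` and the size `n = 2048` are illustrative, not this package's actual benchmark code; depending on package versions, CUDAKernels.jl may also need to be loaded for Tullio to target the GPU):

```julia
# A minimal sketch of a Tullio GEMM benchmark on the GPU. When the arrays are
# CuArrays and KernelAbstractions is loaded, @tullio generates a
# KernelAbstractions kernel, so this also exercises KernelAbstractions.
using CUDA, KernelAbstractions, Tullio, BenchmarkTools

tullio_gemm!(C, A, B) = @tullio C[i, j] = A[i, k] * B[k, j]

n = 2048  # illustrative problem size
A = CUDA.rand(Float32, n, n)
B = CUDA.rand(Float32, n, n)
C = CUDA.zeros(Float32, n, n)

# CUDA.@sync ensures we time the kernel itself, not just the launch.
@btime CUDA.@sync tullio_gemm!($C, $A, $B)
```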

simeonschaub commented 3 years ago

(It also looked to me like Float32 <- Float32 x Float32 was faster than Float32 <- Float16 x Float16; it would probably be a good idea to benchmark that as well.)
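A hedged sketch of that comparison (again illustrative, not the package's actual benchmark code); writing Float16 inputs into a Float32 output gives the Float32 <- Float16 x Float16 case:

```julia
# Sketch comparing Float32 <- Float32 x Float32 against
# Float32 <- Float16 x Float16 for the same Tullio GEMM.
using CUDA, KernelAbstractions, Tullio, BenchmarkTools

tullio_gemm!(C, A, B) = @tullio C[i, j] = A[i, k] * B[k, j]

n = 2048
C   = CUDA.zeros(Float32, n, n)
A32 = CUDA.rand(Float32, n, n); B32 = CUDA.rand(Float32, n, n)
A16 = CuArray(rand(Float16, n, n)); B16 = CuArray(rand(Float16, n, n))

@btime CUDA.@sync tullio_gemm!($C, $A32, $B32)  # Float32 <- Float32 x Float32
@btime CUDA.@sync tullio_gemm!($C, $A16, $B16)  # Float32 <- Float16 x Float16
```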

DilumAluthge commented 3 years ago

I've updated https://github.com/JuliaLinearAlgebra/BLASBenchmarksGPU.jl/issues/14 accordingly :)

DilumAluthge commented 3 years ago

I'll also cc @mcabbott, who will be interested in the performance of Tullio, and @vchuravy, who will be interested in the performance of KernelAbstractions.

DilumAluthge commented 3 years ago

We now have only two JuliaGPU CI runners, and only one of them has a GPU with Tensor Cores. (GemmKernels requires Tensor Cores.)

So we'll be waiting a while for CI to run :(
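As an aside, a minimal way to check whether the current GPU has Tensor Cores, assuming CUDA.jl (Tensor Cores arrived with the Volta architecture, i.e. compute capability 7.0 and above):

```julia
# Tensor Cores require compute capability >= 7.0 (Volta or newer).
using CUDA

has_tensor_cores() = CUDA.capability(CUDA.device()) >= v"7.0"

@show has_tensor_cores()
```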

DilumAluthge commented 3 years ago

I'm going to merge this without CI because I'm impatient 😂

I'll re-enable the required status checks before I make the next release.

DilumAluthge commented 3 years ago

Thank you @simeonschaub!

DilumAluthge commented 3 years ago

> (It also looked to me like Float32 <- Float32 x Float32 was faster than Float32 <- Float16 x Float16; it would probably be a good idea to benchmark that as well.)

I've added some plots for Matrix{Float32} = Matrix{Float32} × Matrix{Float32} to https://github.com/mcabbott/Tullio.jl/issues/80