Closed: simeonschaub closed this issue 3 years ago.
(It also looked to me like `Float32 <- Float32 x Float32` was faster than `Float32 <- Float16 x Float16`; it would probably be a good idea to benchmark that as well.)
I've updated https://github.com/JuliaLinearAlgebra/BLASBenchmarksGPU.jl/issues/14 accordingly :)
I'll also cc @mcabbott, who will be interested in the performance of Tullio, and @vchuravy, who will be interested in the performance of KernelAbstractions.
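For anyone who wants to reproduce the `Float32` vs `Float16`-input comparison locally, here's a minimal sketch using CUDA.jl and BenchmarkTools.jl. The matrix size and the use of `mul!` are my assumptions, not necessarily what BLASBenchmarksGPU.jl does internally:

```julia
# Minimal sketch, assuming CUDA.jl and BenchmarkTools.jl are installed;
# not the exact harness used by BLASBenchmarksGPU.jl.
using CUDA, BenchmarkTools, LinearAlgebra

N = 4096
A32, B32 = CUDA.rand(Float32, N, N), CUDA.rand(Float32, N, N)
A16, B16 = CUDA.rand(Float16, N, N), CUDA.rand(Float16, N, N)
C = CUDA.zeros(Float32, N, N)

# Float32 <- Float32 x Float32 (plain single-precision GEMM)
@btime CUDA.@sync mul!($C, $A32, $B32)

# Float32 <- Float16 x Float16 (mixed precision; CUDA.jl should dispatch
# this to a mixed-precision GEMM, which can use Tensor Cores on capable
# hardware)
@btime CUDA.@sync mul!($C, $A16, $B16)
```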
We now have only two JuliaGPU CI runners, and only one of them has a GPU with Tensor Cores (which GemmKernels requires).
So we'll be waiting a while for CI to run :(
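(As an aside, one way to check whether a given machine's GPU has Tensor Cores is to look at its compute capability. This is a sketch assuming CUDA.jl, not an API that GemmKernels itself exposes:)

```julia
using CUDA

# Tensor Cores first appeared on Volta, i.e. compute capability 7.0 and up;
# this is a heuristic check based on that fact.
has_tensor_cores(dev = CUDA.device()) = CUDA.capability(dev) >= v"7.0"
```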
I'm going to merge this without CI because I'm impatient 😂
I'll re-enable the required status checks before I make the next release.
Thank you @simeonschaub!
(It also looked to me like `Float32 <- Float32 x Float32` was faster than `Float32 <- Float16 x Float16`; it would probably be a good idea to benchmark that as well.)
I've added some plots for `Matrix{Float32} = Matrix{Float32} × Matrix{Float32}` to https://github.com/mcabbott/Tullio.jl/issues/80.
My initial testing suggests that these will probably be quite unimpressive at the moment, but I think it would be good to keep track of them, as this should also make a good KernelAbstractions benchmark.
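For context, here's a minimal sketch of the kind of Tullio matmul being plotted there. This is my own reconstruction, assuming a Tullio version where loading CUDA and KernelAbstractions enables GPU execution:

```julia
# Sketch of the benchmarked contraction; package loading requirements may
# differ between Tullio versions (some also need CUDAKernels).
using Tullio, CUDA, KernelAbstractions

A = CUDA.rand(Float32, 1024, 1024)
B = CUDA.rand(Float32, 1024, 1024)

# Matrix{Float32} = Matrix{Float32} × Matrix{Float32}, written as an
# einsum-style contraction; Tullio generates a KernelAbstractions kernel
# for CuArray inputs.
@tullio C[i, j] := A[i, k] * B[k, j]
```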