Closed by DilumAluthge 3 years ago
Here's an example plot generated on a Titan V:
Here's the code used to generate it:
using BLASBenchmarksGPU
import CUDA
bench_result = BLASBenchmarksGPU.runbench(:CUDA, Float16, Float16, Float32)
import PyPlot
BLASBenchmarksGPU.plotbench(bench_result, "plot.png")
CUDA.versioninfo()
And here's the output, including all of the TFLOPS values:
julia> using BLASBenchmarksGPU
julia> import CUDA
julia> bench_result = BLASBenchmarksGPU.runbench(:CUDA, Float16, Float16, Float32)
Progress: 100%|███████████████████████████████████████████████████| Time: 0:08:24
Size: 16384
CUBLAS: 62.9 TFLOPS
GemmKernels: 66.19 TFLOPS
Tullio: 0.29 TFLOPS
Benchmark Result of Matrix{Float32}=Matrix{Float16}×Matrix{Float16}
24×4 DataFrame
 Row │ Size   Library      TFLOPS      Time (ns)
     │ Int64  Symbol       Float64     Float64
─────┼─────────────────────────────────────────────────
1 │ 128 CUBLAS 0.148099 28321.0
2 │ 128 GemmKernels 0.0332214 126253.0
3 │ 128 Tullio 0.0540789 77559.0
4 │ 256 CUBLAS 0.642928 52190.0
5 │ 256 GemmKernels 0.268756 124851.0
6 │ 256 Tullio 0.310169 108181.0
7 │ 512 CUBLAS 4.36686 61471.0
8 │ 512 GemmKernels 2.02769 132385.0
9 │ 512 Tullio 0.677953 395950.0
10 │ 1024 CUBLAS 25.168 85326.0
11 │ 1024 GemmKernels 14.2662 150529.0
12 │ 1024 Tullio 0.850125 2.52608e6
13 │ 2048 CUBLAS 59.5005 288735.0
14 │ 2048 GemmKernels 36.4345 471527.0
15 │ 2048 Tullio 0.76592 2.24304e7
16 │ 4096 CUBLAS 84.8186 1.62039e6
17 │ 4096 GemmKernels 57.559 2.38779e6
18 │ 4096 Tullio 0.393097 3.49631e8
19 │ 8192 CUBLAS 90.1617 1.21949e7
20 │ 8192 GemmKernels 59.9127 1.83519e7
21 │ 8192 Tullio 0.324503 3.38829e9
22 │ 16384 CUBLAS 62.9003 1.39842e8
23 │ 16384 GemmKernels 66.1909 1.3289e8
24 │ 16384 Tullio 0.294184 2.98999e10
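As a sanity check on the table, the TFLOPS column follows directly from the Time column using the standard GEMM operation count of 2n³ floating-point operations for an n×n matrix multiply, with times in nanoseconds. A minimal sketch (in Python for illustration; the helper name `gemm_tflops` is my own, not part of BLASBenchmarksGPU):

```python
# Assumption: the Time column is in nanoseconds and GEMM on an n×n
# matrix costs 2*n^3 floating-point operations (the usual convention,
# counting each multiply-add as two operations).

def gemm_tflops(n, time_ns):
    """TFLOPS for an n×n×n matrix multiply that took time_ns nanoseconds."""
    flops = 2 * n**3           # total floating-point operations
    seconds = time_ns * 1e-9   # convert ns to s
    return flops / seconds / 1e12

# Reproduce the CUBLAS row for n = 16384 (Time = 1.39842e8 ns):
print(round(gemm_tflops(16384, 1.39842e8), 1))  # → 62.9
```

This matches the 62.9003 TFLOPS reported for CUBLAS at size 16384, confirming the units of the Time column.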
julia> import PyPlot
julia> BLASBenchmarksGPU.plotbench(bench_result, "plot.png")
julia> CUDA.versioninfo()
CUDA toolkit 11.1.1, artifact installation
CUDA driver 11.2.0
NVIDIA driver 460.27.4
Libraries:
- CUBLAS: 11.3.0
- CURAND: 10.2.2
- CUFFT: 10.3.0
- CUSOLVER: 11.0.1
- CUSPARSE: 11.3.0
- CUPTI: 14.0.0
- NVML: 11.0.0+460.27.4
- CUDNN: 8.0.4 (for CUDA 11.1.0)
- CUTENSOR: 1.2.1 (for CUDA 11.1.0)
Toolchain:
- Julia: 1.6.0-beta1
- LLVM: 11.0.0
- PTX ISA support: 3.2, 4.0, 4.1, 4.2, 4.3, 5.0, 6.0, 6.1, 6.3, 6.4, 6.5, 7.0
- Device support: sm_35, sm_37, sm_50, sm_52, sm_53, sm_60, sm_61, sm_62, sm_70, sm_72, sm_75, sm_80
Environment:
- JULIA_CUDA_VERBOSE: true
1 device:
0: TITAN V (sm_70, 658.500 MiB / 11.784 GiB available)
Fixes #12