Open tom91136 opened 5 months ago
Got this on a GH200; the actual value should be ~3 TB/s:
BabelStream
Version: 5.0
Implementation: Julia; src/CUDAStream.jl
Running kernels 100 times
Precision: double
Array size: 268.4 MB(=0.3 GB)
Total size: 805.3 MB(=0.8 GB)
Using CUDA device: GH200 120GB (CuDevice(0))
Kernel parameters: <<<32768,1024>>>
Init: 4.85317 s (=165.93425 MBytes/sec)
Read: 1.90727 s (=422.2296 MBytes/sec)
Function MBytes/sec Min (sec) Max Average
Copy 2.979008268e60.00018 0.34157 0.00513
Mul 2.890738861e60.00019 0.16855 0.00352
Add 3.301423655e60.00024 0.15224 0.00315
Triad 3.314891033e60.00024 0.18389 0.00335
Dot 1.837655013e60.00029 1.26299 0.01428
The bandwidth (MBytes/sec) and Min columns are printed with no space between them, so 2.979008268e6 followed by 0.00018 reads as 2.979008268e60. The bandwidth column also shouldn't use scientific notation.
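For illustration, here is a minimal sketch of one way the row could be formatted so the columns cannot run together. It assumes Julia's standard `Printf` library. The helper name `format_row` and the column widths are hypothetical and not taken from `src/CUDAStream.jl`.

```julia
using Printf

# Hypothetical helper (not BabelStream's actual code): format one result row.
# %-12.3f prints the bandwidth in fixed-point notation, so no exponent appears,
# and the literal space after each field keeps columns apart even when a value
# is wider than its nominal column width.
function format_row(name::AbstractString, mbytes_per_sec::Real,
                    tmin::Real, tmax::Real, tavg::Real)
    return @sprintf("%-12s%-12.3f %-11.5f %-11.5f %-11.5f",
                    name, mbytes_per_sec, tmin, tmax, tavg)
end

# Header uses the same widths so the labels line up with the values.
println(@sprintf("%-12s%-12s %-11s %-11s %-11s",
                 "Function", "MBytes/sec", "Min (sec)", "Max", "Average"))
println(format_row("Copy", 2.979008268e6, 0.00018, 0.34157, 0.00513))
```

Because the separator is a literal space in the format string rather than column padding, a bandwidth value that overflows its 12-character field pushes the later columns right instead of fusing with them.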