./rocblas-bench -f gemm -r s -m 4000 -n 4000 -k 4000 --lda 4000 --ldb 4000 --ldc 4000 --transposeA N --transposeB T
transA,transB,M,N,K,alpha,lda,beta,ldb,ldc,rocblas-Gflops,us
N,T,4000,4000,4000,1,4000,0,4000,4000, 35250.1, 3631.2
./rocblas-bench -f gemm -r s -m 4000 -n 4000 -k 4000 --lda 4000 --ldb 4000 --ldc 4000 --transposeA N --transposeB T --verify 0
transA,transB,M,N,K,alpha,lda,beta,ldb,ldc,rocblas-Gflops,us,CPU-Gflops,CPU-us,norm_error_1
N,T,4000,4000,4000,1,4000,0,4000,4000, 35287, 3627.4,95.7981,1.33614e+06,2.25557e-06
./rocblas-bench -f gemm -r s -m 4000 -n 4000 -k 4000 --lda 4000 --ldb 4000 --ldc 4000 --transposeA N --transposeB T --verify 1
transA,transB,M,N,K,alpha,lda,beta,ldb,ldc,rocblas-Gflops,us,CPU-Gflops,CPU-us,norm_error_1
N,T,4000,4000,4000,1,4000,0,4000,4000, 35275.3, 3628.6,96.6174,1.32481e+06,2.38491e-06
./rocblas-bench -f gemm -r s -m 4000 -n 4000 -k 4000 --lda 4000 --ldb 4000 --ldc 4000 --transposeA N --transposeB T -v 0
transA,transB,M,N,K,alpha,lda,beta,ldb,ldc,rocblas-Gflops,us,CPU-Gflops,CPU-us,norm_error_1
N,T,4000,4000,4000,1,4000,0,4000,4000, 35298.7, 3626.2,95.8297,1.3357e+06,2.69371e-06
./rocblas-bench -f gemm -r s -m 4000 -n 4000 -k 4000 --lda 4000 --ldb 4000 --ldc 4000 --transposeA N --transposeB T -v 1
transA,transB,M,N,K,alpha,lda,beta,ldb,ldc,rocblas-Gflops,us,CPU-Gflops,CPU-us,norm_error_1
N,T,4000,4000,4000,1,4000,0,4000,4000, 35302.6, 3625.8,96.7612,1.32284e+06,2.23521e-06
Expected behavior
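The output of rocblas-bench with --verify 0 or -v 0 should be the same as the output of rocblas-bench run without the verify argument (i.e., with the default value of the verify argument). However, it matches the output of rocblas-bench with --verify 1 or -v 1: the CPU reference computation still runs, and the CPU-Gflops, CPU-us, and norm_error_1 columns are printed.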
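In the runs above, the visible difference between a verified and an unverified run is the extra CPU-Gflops, CPU-us, and norm_error_1 columns, so the expected behavior can be checked mechanically. The following is a minimal, hypothetical sketch and is not part of rocBLAS. The function names verify_ran and check_verify_flag are illustrative, and the script assumes that the CSV header is the line beginning with transA and that norm_error_1 appears only when verification ran.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of rocBLAS) for checking the --verify / -v behavior.
# Assumption: the norm_error_1 column appears in the CSV header only when the
# CPU verification ran, as in the outputs shown in this report.

# verify_ran HEADER: succeeds if HEADER contains the norm_error_1 column.
verify_ran() {
  case "$1" in
    *norm_error_1*) return 0 ;;
    *) return 1 ;;
  esac
}

# check_verify_flag VALUE HEADER: prints PASS if HEADER agrees with --verify VALUE
# (0 = no CPU verification columns, 1 = CPU verification columns present);
# otherwise prints FAIL and returns 1.
check_verify_flag() {
  local value="$1" header="$2"
  if { [ "$value" = 0 ] && verify_ran "$header"; } ||
     { [ "$value" = 1 ] && ! verify_ran "$header"; }; then
    echo FAIL
    return 1
  fi
  echo PASS
}
```

For example, you could pipe the --verify 0 command above through grep '^transA' and pass the result as HEADER with VALUE 0. On the build described here, this would print FAIL because that header contains norm_error_1.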
To Reproduce
# docker image used:
# compute-artifactory.amd.com:5000/rocm-plus-docker/framework/compute-rocm-rel-6.2:8_ubuntu20.04_py3.9_pytorch_release-2.1_53da8f8
# this was built from scratch with the main branch of rocm/rocblas
./rocblas-bench --version
rocBLAS version: 4.2.0.ba39a399-dirty
rocBLAS-commit-hash: Tensile-commit-hash: dbc2062dced66e4cbee8e0591d76e0a1588a4c70
Environment