ROCm / rocBLAS

Next generation BLAS implementation for ROCm platform
https://rocm.docs.amd.com/projects/rocBLAS/en/latest/

[Bug]: rocblas-bench `--verify` / `-v` option seems to be ignored. #1456

Closed danpetreamd closed 1 month ago

danpetreamd commented 1 month ago

To Reproduce

# docker image used:
# compute-artifactory.amd.com:5000/rocm-plus-docker/framework/compute-rocm-rel-6.2:8_ubuntu20.04_py3.9_pytorch_release-2.1_53da8f8

# this was built from scratch with the main branch of rocm/rocblas
./rocblas-bench --version
rocBLAS version: 4.2.0.ba39a399-dirty

rocBLAS-commit-hash:
Tensile-commit-hash: dbc2062dced66e4cbee8e0591d76e0a1588a4c70

./rocblas-bench -f gemm -r s -m 4000 -n 4000 -k 4000 --lda 4000 --ldb 4000 --ldc 4000 --transposeA N --transposeB T
transA,transB,M,N,K,alpha,lda,beta,ldb,ldc,rocblas-Gflops,us
N,T,4000,4000,4000,1,4000,0,4000,4000, 35250.1, 3631.2

./rocblas-bench -f gemm -r s -m 4000 -n 4000 -k 4000 --lda 4000 --ldb 4000 --ldc 4000 --transposeA N --transposeB T --verify 0
transA,transB,M,N,K,alpha,lda,beta,ldb,ldc,rocblas-Gflops,us,CPU-Gflops,CPU-us,norm_error_1
N,T,4000,4000,4000,1,4000,0,4000,4000, 35287, 3627.4,95.7981,1.33614e+06,2.25557e-06

./rocblas-bench -f gemm -r s -m 4000 -n 4000 -k 4000 --lda 4000 --ldb 4000 --ldc 4000 --transposeA N --transposeB T --verify 1
transA,transB,M,N,K,alpha,lda,beta,ldb,ldc,rocblas-Gflops,us,CPU-Gflops,CPU-us,norm_error_1
N,T,4000,4000,4000,1,4000,0,4000,4000, 35275.3, 3628.6,96.6174,1.32481e+06,2.38491e-06

./rocblas-bench -f gemm -r s -m 4000 -n 4000 -k 4000 --lda 4000 --ldb 4000 --ldc 4000 --transposeA N --transposeB T -v 0
transA,transB,M,N,K,alpha,lda,beta,ldb,ldc,rocblas-Gflops,us,CPU-Gflops,CPU-us,norm_error_1
N,T,4000,4000,4000,1,4000,0,4000,4000, 35298.7, 3626.2,95.8297,1.3357e+06,2.69371e-06

./rocblas-bench -f gemm -r s -m 4000 -n 4000 -k 4000 --lda 4000 --ldb 4000 --ldc 4000 --transposeA N --transposeB T -v 1
transA,transB,M,N,K,alpha,lda,beta,ldb,ldc,rocblas-Gflops,us,CPU-Gflops,CPU-us,norm_error_1
N,T,4000,4000,4000,1,4000,0,4000,4000, 35302.6, 3625.8,96.7612,1.32284e+06,2.23521e-06

Expected behavior

The output of rocblas-bench with --verify 0 or -v 0 should be the same as the output without the verify argument (i.e., the default value for the verify argument). Instead, it matches the output with --verify 1 or -v 1: the CPU-Gflops, CPU-us, and norm_error_1 columns are printed in both cases.
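One way to check this mechanically is to look at the CSV header that rocblas-bench prints. The following is a minimal sketch in Python, not part of rocBLAS: the helper names `parse_bench_output` and `verification_ran` are hypothetical, and it assumes that the `CPU-Gflops`, `CPU-us`, and `norm_error_1` columns appear only when verification runs, which is what the runs in this report suggest. The sample strings are copied from the outputs above.

```python
import csv
import io

# Assumption: these columns are printed only when verification ran,
# based solely on the outputs shown in this report.
VERIFY_COLUMNS = {"CPU-Gflops", "CPU-us", "norm_error_1"}


def parse_bench_output(text):
    """Map column names to values from a header line plus one value line."""
    rows = [
        [cell.strip() for cell in row]
        for row in csv.reader(io.StringIO(text.strip()))
        if row
    ]
    header, values = rows[0], rows[1]
    return dict(zip(header, values))


def verification_ran(text):
    """Return True if any verification-only column is present in the header."""
    return not VERIFY_COLUMNS.isdisjoint(parse_bench_output(text))


# Output of the run without any verify argument.
no_flag = """transA,transB,M,N,K,alpha,lda,beta,ldb,ldc,rocblas-Gflops,us
N,T,4000,4000,4000,1,4000,0,4000,4000, 35250.1, 3631.2"""

# Output of the run with --verify 0.
verify_0 = """transA,transB,M,N,K,alpha,lda,beta,ldb,ldc,rocblas-Gflops,us,CPU-Gflops,CPU-us,norm_error_1
N,T,4000,4000,4000,1,4000,0,4000,4000, 35287, 3627.4,95.7981,1.33614e+06,2.25557e-06"""

print(verification_ran(no_flag))   # False
print(verification_ran(verify_0))  # True
```

If --verify 0 behaved like the default, both calls would print False; with the outputs reported here, the second prints True.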

Environment

| Hardware | description |
|-----|-----|
| CPU | AMD EPYC 7V12 64-Core Processor |
| GPU | AMD Instinct MI250X/MI250 |

| Software | version |
|-----|-----|
| rocm-core | 6.2.0.60200-8~20.04 |
| rocblas | 4.2.0.60200-8~20.04 |

danpetreamd commented 1 month ago

Thank you! 💯