ROCm / hipBLASLt

hipBLASLt is a library that provides general matrix-matrix operations with a flexible API, extending functionality beyond a traditional BLAS library.
https://rocm.docs.amd.com/projects/hipBLASLt/en/latest/index.html
MIT License

Is this a serious program? #251

Open · idreamerhx opened 11 months ago

idreamerhx commented 11 months ago

root@1a89b5aa5fce:/opt/hipBLASLt/build/release# ./clients/staging/hipblaslt-bench -m 2048 -n 2048 -k 2048 --precision f32_r -v 1 --activation_type relu
Query device success: there are 1 devices
Device ID 0 : AMD Radeon VII gfx906:sramecc+:xnack- with 17.2 GB memory, max. SCLK 1801 MHz, max. MCLK 1000 MHz, compute capability 9.0 maxGridDimX 2147483647, sharedMemPerBlock 65.5 KB, maxThreadsPerBlock 1024, warpSize 64
rocblaslt warning: No paths matched /opt/hipBLASLt/build/release/library/../Tensile/library/gfx906co. Make sure that HIPBLASLT_TENSILE_LIBPATH is set correctly.
transA,transB,grouped_gemm,batch_count,M,N,K,alpha,lda,stride_a,beta,ldb,stride_b,ldc,stride_c,ldd,stride_d,d_type,compute_type,activation_type,bias_vector,hipblaslt-Gflops,us,CPU-Gflops,CPU-us,norm_error_1
N,N,0,1,2048,2048,2048,1,2048,4194304,0,2048,4194304,2048,4194304,2048,4194304,f32_r,f32_r,relu,0, 2.72763e+06, 6.3,4.47063,3.84376e+06,1.08487

root@1a89b5aa5fce:/opt/hipBLASLt/build/release# ./clients/staging/hipblaslt-bench -m 1024 -n 1024 -k 1024 --precision f32_r -v 1 --activation_type relu
Query device success: there are 1 devices
Device ID 0 : AMD Radeon VII gfx906:sramecc+:xnack- with 17.2 GB memory, max. SCLK 1801 MHz, max. MCLK 1000 MHz, compute capability 9.0 maxGridDimX 2147483647, sharedMemPerBlock 65.5 KB, maxThreadsPerBlock 1024, warpSize 64
rocblaslt warning: No paths matched /opt/hipBLASLt/build/release/library/../Tensile/library/gfx906co. Make sure that HIPBLASLT_TENSILE_LIBPATH is set correctly.
transA,transB,grouped_gemm,batch_count,M,N,K,alpha,lda,stride_a,beta,ldb,stride_b,ldc,stride_c,ldd,stride_d,d_type,compute_type,activation_type,bias_vector,hipblaslt-Gflops,us,CPU-Gflops,CPU-us,norm_error_1
N,N,0,1,1024,1024,1024,1,1024,1048576,0,1024,1048576,1024,1048576,1024,1048576,f32_r,f32_r,relu,0, 279030, 7.7,4.39526,488829,1.12318

root@1a89b5aa5fce:/opt/hipBLASLt/build/release# ^C
root@1a89b5aa5fce:/opt/hipBLASLt/build/release# ./clients/staging/hipblaslt-bench -m 102^C-n 1024 -k 1024 --precision f32_r -v 1 --activation_type relu
root@1a89b5aa5fce:/opt/hipBLASLt/build/release# ./clients/staging/hipblaslt-bench --precision f32_r -v 1
Query device success: there are 1 devices
Device ID 0 : AMD Radeon VII gfx906:sramecc+:xnack- with 17.2 GB memory, max. SCLK 1801 MHz, max. MCLK 1000 MHz, compute capability 9.0 maxGridDimX 2147483647, sharedMemPerBlock 65.5 KB, maxThreadsPerBlock 1024, warpSize 64
rocblaslt warning: No paths matched /opt/hipBLASLt/build/release/library/../Tensile/library/gfx906co. Make sure that HIPBLASLT_TENSILE_LIBPATH is set correctly.
transA,transB,grouped_gemm,batch_count,M,N,K,alpha,lda,stride_a,beta,ldb,stride_b,ldc,stride_c,ldd,stride_d,d_type,compute_type,activation_type,bias_vector,hipblaslt-Gflops,us,CPU-Gflops,CPU-us,norm_error_1
N,N,0,1,128,128,128,1,128,16384,0,128,16384,128,16384,128,16384,f32_r,f32_r,none,0, 776.723, 5.4,4.06425,1032,1.07202

What, does this GPU have over 200 TFLOPS? The benchmark reports 279030 Gflops.
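For reference, the reported rate can be checked with simple arithmetic (a quick sanity-check sketch; the ~13.4 TFLOPS FP32 peak figure for the Radeon VII is taken from public spec sheets, not from this log):

```cpp
// Sanity check of the 1024x1024x1024 run reported above:
// 279030 Gflops in 7.7 us would be ~279 TFLOPS of FP32, far beyond what a
// Radeon VII can do, which suggests the GEMM kernel never actually ran
// (consistent with the "No paths matched ... gfx906" warning).
#include <cstdio>

int main() {
    const double m = 1024, n = 1024, k = 1024;
    const double us = 7.7;                    // time reported by hipblaslt-bench
    const double flops = 2.0 * m * n * k;     // FLOPs in one GEMM (multiply+add)
    const double tflops = flops / (us * 1e-6) / 1e12;
    std::printf("effective rate:        %.1f TFLOPS\n", tflops);  // ~278.9
    std::printf("Radeon VII FP32 peak: ~13.4 TFLOPS (public spec)\n");
    return 0;
}
```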

idreamerhx commented 11 months ago

root@1a89b5aa5fce:/opt/hipBLASLt/build/release# ./clients/staging/hipblaslt-test
hipBLASLt version: 300

Query device success: there are 1 devices

Device ID 0 : AMD Radeon VII gfx906:sramecc+:xnack- with 17.2 GB memory, max. SCLK 1801 MHz, max. MCLK 1000 MHz, compute capability 9.0 maxGridDimX 2147483647, sharedMemPerBlock 65.5 KB, maxThreadsPerBlock 1024, warpSize 64

info: parsing of test data may take a couple minutes before any test output appears...

[==========] Running 10091 tests from 2 test suites.
[----------] Global test environment set-up.
[----------] 10046 tests from _/matmul_test
[ RUN      ] _/matmul_test.matmul/pre_checkin_matmul_bad_arg_badarg
[       OK ] _/matmul_test.matmul/pre_checkin_matmul_bad_arg_badarg (240 ms)
[ RUN      ] _/matmul_test.matmul/pre_checkin_matmul_bad_arg_bad_argt2
[       OK ] _/matmul_test.matmul/pre_checkin_matmul_bad_arg_bad_argt2 (0 ms)
[ RUN      ] _/matmul_test.matmul/pre_checkin_matmul_bad_arg_bad_argt3
[       OK ] _/matmul_test.matmul/pre_checkin_matmul_bad_arg_bad_argt3 (0 ms)
[ RUN      ] _/matmul_test.matmul/pre_checkin_alpha_beta_zero_NaN_f16_rf16_rf16_rf16_rf32_r_NN_256_128_64_nnan_256_64_nnan_256_256_1

rocblaslt warning: No paths matched /opt/hipBLASLt/build/release/library/../Tensile/library/gfx906co. Make sure that HIPBLASLT_TENSILE_LIBPATH is set correctly.
/opt/hipBLASLt/clients/gtest/../include/unit.hpp:208: Failure
Expected equality of these values:
  float(hCPU[i + j * size_t(lda) + k * strideA])
    Which is: 0
  float(hGPU[i + j * size_t(lda) + k * strideA])
    Which is: 0.0050582886
[  FAILED  ] _/matmul_test.matmul/pre_checkin_alpha_beta_zero_NaN_f16_rf16_rf16_rf16_rf32_r_NN_256_128_64_nnan_256_64_nnan_256_256_1, where GetParam() = { function: "matmul", name: "alpha_beta_zero_NaN", category: "pre_checkin", known_bug_platforms: "", alpha: -nan, beta: -nan, stride_a: 16384, stride_b: 8192, stride_c: 32768, stride_d: 32768, stride_e: 32768, user_allocated_workspace: 0, M: 256, N: 128, K: 64, lda: 256, ldb: 64, ldc: 256, ldd: 256, lde: 256, batch_count: 1, iters: 10, cold_iters: 2, algo: 0, solution_index: 0, a_type: f16_r, b_type: f16_r, c_type: f16_r, d_type: f16_r, compute_type: f32_r, scale_type: f32_r, initialization: "rand_int", gpu_arch: "", pad: 4096, groupedgemm: 0, threads: 0, streams: 0, devices: (5836 ms)
[ RUN      ] _/matmul_test.matmul/pre_checkin_alpha_beta_zero_NaN_f16_rf16_rf16_rf16_rf32_r_NN_256_128_64_nnan_256_64_2_256_256_1
/opt/hipBLASLt/clients/gtest/../include/unit.hpp:208: Failure
Expected equality of these values:
  float(hCPU[i + j * size_t(lda) + k * strideA])
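For context, the unit.hpp:208 failure is an element-wise comparison of the CPU reference result against the result copied back from the GPU. A simplified sketch of that kind of check (illustrative only, not the actual hipBLASLt test code; the helper and test names here are made up):

```cpp
// Simplified sketch of the host-vs-device check behind the failure above.
#include <gtest/gtest.h>
#include <cstddef>
#include <cstdint>
#include <vector>

// Compare a CPU reference matrix against the matrix copied back from the GPU,
// element by element, over all batches.
template <typename T>
void check_cpu_vs_gpu(int64_t M, int64_t N, int64_t batch, int64_t lda,
                      int64_t strideA, const T* hCPU, const T* hGPU) {
    for (int64_t k = 0; k < batch; ++k)
        for (int64_t j = 0; j < N; ++j)
            for (int64_t i = 0; i < M; ++i)
                // The failure above is exactly this kind of mismatch:
                // CPU reference 0 vs GPU result 0.0050582886.
                EXPECT_EQ(float(hCPU[i + j * size_t(lda) + k * strideA]),
                          float(hGPU[i + j * size_t(lda) + k * strideA]));
}

TEST(matmul_sketch, cpu_matches_gpu) {
    const int64_t M = 2, N = 2, lda = 2, strideA = 4;
    std::vector<float> cpu{0, 0, 0, 0}, gpu{0, 0, 0, 0};
    check_cpu_vs_gpu(M, N, /*batch=*/1, lda, strideA, cpu.data(), gpu.data());
}
```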

jichangjichang commented 10 months ago

@idreamerhx hipBLASLt currently only supports gfx90a devices. https://github.com/ROCmSoftwarePlatform/hipBLASLt/blob/develop/README.md#hardware-requirements
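To confirm which architecture the HIP runtime actually reports before benchmarking, a query along these lines can be used (a minimal sketch using the standard HIP device-properties API; not part of hipBLASLt itself):

```cpp
// Print the GPU architecture string the HIP runtime reports for each device.
// The README linked above requires gfx90a; the gfx906 shown in this log is
// outside hipBLASLt's supported hardware list.
#include <hip/hip_runtime.h>
#include <cstdio>

int main() {
    int count = 0;
    hipGetDeviceCount(&count);
    for (int id = 0; id < count; ++id) {
        hipDeviceProp_t prop;
        hipGetDeviceProperties(&prop, id);
        std::printf("Device %d: %s (%s)\n", id, prop.name, prop.gcnArchName);
    }
    return 0;
}
```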

ppanchad-amd commented 2 weeks ago

@idreamerhx Can you please test with the latest ROCm 6.1.2 to see if your issue still exists? If not, please close the ticket. Thanks!