ROCm / hipBLASLt

hipBLASLt is a library that provides general matrix-matrix operations with a flexible API and extends functionalities beyond a traditional BLAS library
https://rocm.docs.amd.com/projects/hipBLASLt/en/latest/index.html
MIT License

Add Navi32 TN_HHS/NN_HHS Kernels #1133

Closed: wenchuanchen closed this 1 week ago

wenchuanchen commented 1 week ago

https://ontrack-internal.amd.com/browse/SWDEV-480801
https://ontrack-internal.amd.com/browse/SWDEV-480799

wenchuanchen commented 1 week ago

test.log tox.log

jichangjichang commented 1 week ago

There are 7 failed items in your test result.

[ FAILED ] 7 tests, listed below:
[ FAILED ] _/matmul_test.matmul/pre_checkin_matmul_gemm_i8_dst_i8_1xxx_i8_ri8_ri8_ri8_ri32_r_relu_TT_1024_1024_1024_1_1024_1024_0_1024_1024_1_SAV, where GetParam() = { function: "matmul", name: "matmul_gemm_i8_dst_i8_1xxx", category: "pre_checkin", known_bug_platforms: "", alpha: 1, beta: 0, stride_a: 0xc895c8, stride_b: 0xc896c8, stride_c: 0xc897c8, stride_d: 0xc898c8, stride_e: 0xc899c8, user_allocated_workspace: 0, M: 0xc89ad0, N: 0xc89bd0, K: 0xc89cd0, lda: 0xc89dd0, ldb: 0xc89ed0, ldc: 0xc89fd0, ldd: 0xc8a0d0, lde: 0xc8a1d0, batch_count: 1, iters: 10, cold_iters: 2, algo: 0, solution_index: -1, requested_solution_num: 1, a_type: i8_r, b_type: i8_r, c_type: i8_r, d_type: i8_r, compute_type: i32_r, compute_input_typeA: non-supported type, compute_input_typeB: non-supported type, scale_type: i32_r, initialization: "rand_int", gpu_arch: "1[1-2]\d{2}", pad: 4096, grouped_gemm: 0, threads: 0, streams: 0, devices:
[ FAILED ] _/matmul_test.matmul/pre_checkin_matmul_gemm_i8_dst_i8_1xxx_i8_ri8_ri8_ri8_ri32_r_relu_TT_1024_1024_1024_1_1024_1024_2_1024_1024_1, where GetParam() = { function: "matmul", name: "matmul_gemm_i8_dst_i8_1xxx", category: "pre_checkin", known_bug_platforms: "", alpha: 1, beta: 2, stride_a: 0xc895c8, stride_b: 0xc896c8, stride_c: 0xc897c8, stride_d: 0xc898c8, stride_e: 0xc899c8, user_allocated_workspace: 0, M: 0xc89ad0, N: 0xc89bd0, K: 0xc89cd0, lda: 0xc89dd0, ldb: 0xc89ed0, ldc: 0xc89fd0, ldd: 0xc8a0d0, lde: 0xc8a1d0, batch_count: 1, iters: 10, cold_iters: 2, algo: 0, solution_index: -1, requested_solution_num: 1, a_type: i8_r, b_type: i8_r, c_type: i8_r, d_type: i8_r, compute_type: i32_r, compute_input_typeA: non-supported type, compute_input_typeB: non-supported type, scale_type: i32_r, initialization: "rand_int", gpu_arch: "1[1-2]\d{2}", pad: 4096, grouped_gemm: 0, threads: 0, streams: 0, devices:
[ FAILED ] _/matmul_test.matmul/pre_checkin_matmul_gemm_i8_dst_i8_1xxx_i8_ri8_ri8_ri8_ri32_r_relu_TT_1024_1024_1024_1_1024_1024_2_1024_1024_1_SAV, where GetParam() = { function: "matmul", name: "matmul_gemm_i8_dst_i8_1xxx", category: "pre_checkin", known_bug_platforms: "", alpha: 1, beta: 2, stride_a: 0xc895c8, stride_b: 0xc896c8, stride_c: 0xc897c8, stride_d: 0xc898c8, stride_e: 0xc899c8, user_allocated_workspace: 0, M: 0xc89ad0, N: 0xc89bd0, K: 0xc89cd0, lda: 0xc89dd0, ldb: 0xc89ed0, ldc: 0xc89fd0, ldd: 0xc8a0d0, lde: 0xc8a1d0, batch_count: 1, iters: 10, cold_iters: 2, algo: 0, solution_index: -1, requested_solution_num: 1, a_type: i8_r, b_type: i8_r, c_type: i8_r, d_type: i8_r, compute_type: i32_r, compute_input_typeA: non-supported type, compute_input_typeB: non-supported type, scale_type: i32_r, initialization: "rand_int", gpu_arch: "1[1-2]\d{2}", pad: 4096, grouped_gemm: 0, threads: 0, streams: 0, devices:
[ FAILED ] _/matmul_test.matmul/pre_checkin_matmul_gemm_i8_dst_i8_1xxx_i8_ri8_ri8_ri8_ri32_r_relu_TT_1031_1031_1031_1_1031_1031_0_1031_1031_1, where GetParam() = { function: "matmul", name: "matmul_gemm_i8_dst_i8_1xxx", category: "pre_checkin", known_bug_platforms: "", alpha: 1, beta: 0, stride_a: 0xc895c8, stride_b: 0xc896c8, stride_c: 0xc897c8, stride_d: 0xc898c8, stride_e: 0xc899c8, user_allocated_workspace: 0, M: 0xc89ad0, N: 0xc89bd0, K: 0xc89cd0, lda: 0xc89dd0, ldb: 0xc89ed0, ldc: 0xc89fd0, ldd: 0xc8a0d0, lde: 0xc8a1d0, batch_count: 1, iters: 10, cold_iters: 2, algo: 0, solution_index: -1, requested_solution_num: 1, a_type: i8_r, b_type: i8_r, c_type: i8_r, d_type: i8_r, compute_type: i32_r, compute_input_typeA: non-supported type, compute_input_typeB: non-supported type, scale_type: i32_r, initialization: "rand_int", gpu_arch: "1[1-2]\d{2}", pad: 4096, grouped_gemm: 0, threads: 0, streams: 0, devices:
[ FAILED ] _/matmul_test.matmul/pre_checkin_matmul_gemm_i8_dst_i8_1xxx_i8_ri8_ri8_ri8_ri32_r_relu_TT_1031_1031_1031_1_1031_1031_0_1031_1031_1_SAV, where GetParam() = { function: "matmul", name: "matmul_gemm_i8_dst_i8_1xxx", category: "pre_checkin", known_bug_platforms: "", alpha: 1, beta: 0, stride_a: 0xc895c8, stride_b: 0xc896c8, stride_c: 0xc897c8, stride_d: 0xc898c8, stride_e: 0xc899c8, user_allocated_workspace: 0, M: 0xc89ad0, N: 0xc89bd0, K: 0xc89cd0, lda: 0xc89dd0, ldb: 0xc89ed0, ldc: 0xc89fd0, ldd: 0xc8a0d0, lde: 0xc8a1d0, batch_count: 1, iters: 10, cold_iters: 2, algo: 0, solution_index: -1, requested_solution_num: 1, a_type: i8_r, b_type: i8_r, c_type: i8_r, d_type: i8_r, compute_type: i32_r, compute_input_typeA: non-supported type, compute_input_typeB: non-supported type, scale_type: i32_r, initialization: "rand_int", gpu_arch: "1[1-2]\d{2}", pad: 4096, grouped_gemm: 0, threads: 0, streams: 0, devices:
[ FAILED ] _/matmul_test.matmul/pre_checkin_matmul_gemm_i8_dst_i8_1xxx_i8_ri8_ri8_ri8_ri32_r_relu_TT_1031_1031_1031_1_1031_1031_2_1031_1031_1, where GetParam() = { function: "matmul", name: "matmul_gemm_i8_dst_i8_1xxx", category: "pre_checkin", known_bug_platforms: "", alpha: 1, beta: 2, stride_a: 0xc895c8, stride_b: 0xc896c8, stride_c: 0xc897c8, stride_d: 0xc898c8, stride_e: 0xc899c8, user_allocated_workspace: 0, M: 0xc89ad0, N: 0xc89bd0, K: 0xc89cd0, lda: 0xc89dd0, ldb: 0xc89ed0, ldc: 0xc89fd0, ldd: 0xc8a0d0, lde: 0xc8a1d0, batch_count: 1, iters: 10, cold_iters: 2, algo: 0, solution_index: -1, requested_solution_num: 1, a_type: i8_r, b_type: i8_r, c_type: i8_r, d_type: i8_r, compute_type: i32_r, compute_input_typeA: non-supported type, compute_input_typeB: non-supported type, scale_type: i32_r, initialization: "rand_int", gpu_arch: "1[1-2]\d{2}", pad: 4096, grouped_gemm: 0, threads: 0, streams: 0, devices:
[ FAILED ] _/matmul_test.matmul/pre_checkin_matmul_gemm_i8_dst_i8_1xxx_i8_ri8_ri8_ri8_ri32_r_relu_TT_1031_1031_1031_1_1031_1031_2_1031_1031_1_SAV, where GetParam() = { function: "matmul", name: "matmul_gemm_i8_dst_i8_1xxx", category: "pre_checkin", known_bug_platforms: "", alpha: 1, beta: 2, stride_a: 0xc895c8, stride_b: 0xc896c8, stride_c: 0xc897c8, stride_d: 0xc898c8, stride_e: 0xc899c8, user_allocated_workspace: 0, M: 0xc89ad0, N: 0xc89bd0, K: 0xc89cd0, lda: 0xc89dd0, ldb: 0xc89ed0, ldc: 0xc89fd0, ldd: 0xc8a0d0, lde: 0xc8a1d0, batch_count: 1, iters: 10, cold_iters: 2, algo: 0, solution_index: -1, requested_solution_num: 1, a_type: i8_r, b_type: i8_r, c_type: i8_r, d_type: i8_r, compute_type: i32_r, compute_input_typeA: non-supported type, compute_input_typeB: non-supported type, scale_type: i32_r, initialization: "rand_int", gpu_arch: "1[1-2]\d{2}", pad: 4096, grouped_gemm: 0, threads: 0, streams: 0, devices:

7 FAILED TESTS
hipBLASLt version: 1000

wenchuanchen commented 1 week ago

Those 7 failed cases in hipblaslt-test are unrelated to this commit.
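
One way to check a claim like this is to rerun only the failing tests on the base branch and see whether they fail there too. The sketch below is illustrative and not part of the PR: it pulls the distinct test names out of a GoogleTest log such as the attached test.log and joins them into a `--gtest_filter` argument. The helper names `failed_tests` and `gtest_filter` are hypothetical, and it assumes hipblaslt-test accepts the standard GoogleTest `--gtest_filter` flag.

```python
import re

# Matches a GoogleTest failure entry such as
# "[  FAILED  ] Suite.Test/name, where GetParam() = ..."
# and captures the test name up to the first comma or whitespace.
FAILED_RE = re.compile(r"\[\s*FAILED\s*\]\s+([^\s,]+)")


def failed_tests(log_text: str) -> list[str]:
    """Return the distinct failed test names, in order of first appearance."""
    seen: dict[str, None] = {}
    for match in FAILED_RE.finditer(log_text):
        name = match.group(1)
        # Full GoogleTest names contain "Suite.Test"; this skips the
        # "[ FAILED ] 7 tests, listed below:" summary line.
        if "." in name:
            seen.setdefault(name)
    return list(seen)


def gtest_filter(names: list[str]) -> str:
    """Build a --gtest_filter argument that selects exactly these tests."""
    return "--gtest_filter=" + ":".join(names)
```

Passing the resulting argument to hipblaslt-test on a build without this PR's kernels would show whether the same cases already fail there. Deduplication matters because GoogleTest prints each failure both when it occurs and again in the final summary.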