Hi @eshoguli, by next Monday, 05 August, I will get a clear answer on when this new feature will be provided. Thanks
Tested on commit:
commit c5dd7753d0475ffec0f192f3181fe67a1d761680 (tag: v24.07, origin/main, origin/HEAD, main)
Author: Jenkins <bsgcomp@arm.com>
Date: Fri Jul 26 12:07:30 2024 +0000
Compute Library v24.07
How to easily reproduce (branch: es/aarch64/neon_gemm_u8i8_support, example files):
build: scons arch=arm64-v8.2-a neon=1 opencl=0 openmp=0 cppthreads=1 os=macos data_layout_support=all build=native asserts=1 fixed_format_kernels=True validation_tests=1 examples=1 debug=0 --jobs=8 --silent
run: ./build/examples/neon_gemm_u8s8_s32_comparision
expected result: 120 for each result matrix item. Actual value is 7560. Note, please, that if we update the signed value -2 to 2 here: https://github.com/eshoguli/ComputeLibrary/blob/es/aarch64/neon_gemm_u8i8_support/examples/neon_gemm_u8s8_f32_comparision.cpp#L174, then the results will be OK.
Hi @eshoguli
The following patch adds mixed-sign support in GEMM and has already been merged to main. I made some changes to your test neon_gemm_u8s8_f32_comparision.cpp to also compute SGEMM and compare the output with GEMMLOWP. As you can see below, the output is -12 in both cases.
root@hikey:~/tmp/user/github# LD_LIBRARY_PATH=./:$LD_LIBRARY_PATH ./neon_gemm_u8s8_f32_comparision 3 3 3
src1 F32 [6, 16, 1, 1]:
1 1 1 1 1 1
1 1 1 1 1 1
1 1 1 1 1 1
1 1 1 1 1 1
1 1 1 1 1 1
1 1 1 1 1 1
1 1 1 1 1 1
1 1 1 1 1 1
1 1 1 1 1 1
1 1 1 1 1 1
1 1 1 1 1 1
1 1 1 1 1 1
1 1 1 1 1 1
1 1 1 1 1 1
1 1 1 1 1 1
1 1 1 1 1 1
src2 F32 [16, 6, 1, 1]:
-2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2
-2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2
-2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2
-2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2
-2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2
-2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2
q_src1 QASYMM8_SIGNED [6, 16, 1, 1]:
5 5 5 5 5 5
5 5 5 5 5 5
5 5 5 5 5 5
5 5 5 5 5 5
5 5 5 5 5 5
5 5 5 5 5 5
5 5 5 5 5 5
5 5 5 5 5 5
5 5 5 5 5 5
5 5 5 5 5 5
5 5 5 5 5 5
5 5 5 5 5 5
5 5 5 5 5 5
5 5 5 5 5 5
5 5 5 5 5 5
5 5 5 5 5 5
q_src2 QASYMM8_SIGNED [16, 6, 1, 1]:
-4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4
-4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4
-4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4
-4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4
-4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4
-4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4
Lowp GEMM output F32 [16, 16, 1, 1]:
-12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12
-12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12
-12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12
-12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12
-12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12
-12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12
-12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12
-12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12
-12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12
-12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12
-12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12
-12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12
-12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12
-12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12
-12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12
-12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12
SGEMM F32 [16, 16, 1, 1]:
-12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12
-12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12
-12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12
-12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12
-12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12
-12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12
-12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12
-12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12
-12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12
-12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12
-12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12
-12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12
-12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12
-12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12
-12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12
-12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12
In your test, I just added the following code at the end to print the output of SGEMM:

NEGEMM fgemm{};

Tensor dst;
dst.allocator()->init(TensorInfo(TensorShape(16, 16, 1, 1), 1, DataType::F32));
fgemm.configure(&src1, &src2, nullptr, &dst, 1, 0);
dst.allocator()->allocate();
fgemm.run();

// Print sgemm output
std::cout << "SGEMM " << dst.info() << ":" << std::endl;
dst.print(std::cout);

return 0;
}
Validated: the case with QASYMM8 + QASYMM8_SIGNED inputs and F32 output is supported (https://review.mlplatform.org/ml/ComputeLibrary), thanks! Note, please, that the fix has not yet been applied in https://github.com/ARM-software/ComputeLibrary
In accordance with the documentation, NEGEMMLowpMatrixMultiplyCore supports only limited combinations of QSYMM8 and QASYMM8_SIGNED precisions on inputs. But we need to support QSYMM8 on src1 and QASYMM8_SIGNED on src2. Why is this combination not supported? Can I use shift/zero-point in the second NEQuantizationLayer to resolve the issue? Are you going to support QSYMM8 on src1 and QASYMM8_SIGNED on src2
in the future? Thanks!

[UPD] Note, please, I modified the examples to check QSYMM8 and QASYMM8_SIGNED input support. You can easily explore the source code here: https://github.com/eshoguli/ComputeLibrary/commit/28c57d4f8de6df37d8edd031362160d76fda079e. There are no validation exceptions for QSYMM8 and QASYMM8_SIGNED inputs, but the output results are not correct.

Tensor logging for QSYMM8 and QASYMM8_SIGNED with incorrect results (examples/neon_gemm_u8s8_s32.cpp):

Tensor logging for QASYMM8_SIGNED and QASYMM8_SIGNED with correct results as reference (examples/neon_gemm_s8s8_s32.cpp):