ARM-software / ComputeLibrary

The Compute Library is a set of computer vision and machine learning functions optimised for both Arm CPUs and GPUs using SIMD technologies.
MIT License

Wrong result of NEGEMMLowpMatrixMultiplyCore with two QASYMM8_SIGNED inputs #992

Closed: zjersey closed this issue 1 year ago

zjersey commented 1 year ago

I need to multiply two int8 tensors (values in [-127, 127]) and get the result as an S32 (signed int32) tensor.

I ran the code below:

q_src1.allocator()->init(TensorInfo(TensorShape(2, 2), 1, DataType::QASYMM8_SIGNED, src1_qinfo));  
q_src2.allocator()->init(TensorInfo(TensorShape(2, 2), 1, DataType::QASYMM8_SIGNED, src2_qinfo)); 
q_res.allocator()->init(TensorInfo(TensorShape(2, 2), 1, DataType::S32));
NEGEMMLowpMatrixMultiplyCore qgemm;
qgemm.configure(&q_src1, &q_src2, nullptr, &q_res);
// I manually set the values of q_src1 and q_src2, and allocate the three tensors.

qgemm.run();

printf("q_src1.print(): \n");
q_src1.print(std::cout);
printf("q_src2.print(): \n");
q_src2.print(std::cout);
printf("q_res.print(): \n");
q_res.print(std::cout);

The result is:

q_src1.print(): 
 -2  -3 
124  97 

q_src2.print(): 
 4 -110 
64  -87 

q_res.print(): 
 75033  22979 
117419  35901

How can I get the correct signed int32 results?

morgolock commented 1 year ago

Hi @zjersey

You have to call the allocate() method on the tensors after calling configure().

Please take a look at the following example: https://github.com/ARM-software/ComputeLibrary/blob/main/examples/neon_gemm_qasymm8.cpp#L241

Hope this helps.
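
To check the output of qgemm.run() against known-good values, a plain C++ reference for the 2x2 case in the question can be sketched as below. This is only a sketch under stated assumptions: both quantization offsets are taken to be zero (the issue does not show src1_qinfo or src2_qinfo), the printed rows are treated as matrix rows in row-major order, and reference_gemm_s8 is a hypothetical helper name, not part of Compute Library.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not a Compute Library API).
// Row-major int8 x int8 matrix multiply with int32 accumulation.
// Assumption: both quantization offsets are 0, so no offset correction is applied.
template <std::size_t M, std::size_t K, std::size_t N>
std::array<int32_t, M * N> reference_gemm_s8(const std::array<int8_t, M * K> &a,
                                             const std::array<int8_t, K * N> &b)
{
    std::array<int32_t, M * N> out{};
    for (std::size_t m = 0; m < M; ++m)
    {
        for (std::size_t n = 0; n < N; ++n)
        {
            int32_t acc = 0;
            for (std::size_t k = 0; k < K; ++k)
            {
                // Widen to int32 before multiplying so the product cannot overflow int8.
                acc += static_cast<int32_t>(a[m * K + k]) * static_cast<int32_t>(b[k * N + n]);
            }
            out[m * N + n] = acc;
        }
    }
    return out;
}
```

Calling reference_gemm_s8<2, 2, 2> with a = {-2, -3, 124, 97} and b = {4, -110, 64, -87} gives {-200, 481, 6704, -22079} under these assumptions. With non-zero offsets the expected values would differ, so the comparison only applies when the offsets in the quantization info are zero.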