ARM-software / ComputeLibrary

The Compute Library is a set of computer vision and machine learning functions optimised for both Arm CPUs and GPUs using SIMD technologies.
MIT License
2.75k stars 767 forks source link

Unable to get the correct output of NEGEMM with both C and beta #1024

Closed Sandglass-vic closed 1 year ago

Sandglass-vic commented 1 year ago

Output of 'strings libarm_compute.so | grep arm_compute_version': arm_compute_version=v22.11 Build options: {'embed_kernels': '1', 'os': 'linux', 'opencl': '0', 'neon': '1', 'build_dir': 'arm64-v8.2-a-neon', 'asserts': '0', 'arch': 'arm64-v8.2-a', 'Werror': '1'} Git hash=unknown

Platform: armv8.2-a + neon + fp16

Operating System: CentOS Linux 7 (AltArch)

Problem description: I modified examples/neon_sgemm.cpp to check the correctness of alpha A B + beta C given any alpha and any beta with a naive implementation of sgemm as my reference: ``for (int j = 0; j < N; ++j) for (int i = 0; i < M; ++i) { float temp = 0.; for (int l = 0; l < K; ++l) temp += a[i + l M] b[l + j K]; c[i + j M] = alpha (c[i + j * M] + temp); }``

The configure function is sgemm.configure(&src0, &src1, &src2, &dst, alpha, beta);

However, the result turns out to be alpha (A B + C) and changing beta doesn't influence the result. Is it a bug?

morgolock commented 1 year ago

Hi @Sandglass-vic

Please take a look at our reference implementation for gemm in https://github.com/ARM-software/ComputeLibrary/blob/main/tests/validation/reference/GEMM.cpp#L80

Hope this helps

Sandglass-vic commented 1 year ago

I think it has nothing to do with the implementation of reference gemm. I set M=N=K=4, alpha=0 and beta=1 using the following code:

    // Configure function
    sgemm.configure(&src0, &src1, &src2, &dst, 0, 1);

    // Allocate all the images
    src0.allocator()->allocate();
    src1.allocator()->allocate();
    src2.allocator()->allocate();
    dst.allocator()->allocate();

    for (int i = 0; i < M; ++i)
      for (int j = 0; j < K; ++j)
        *reinterpret_cast<float*>(
            src0.buffer() +
            src0.info()->offset_element_in_bytes(Coordinates(j, i))) = 0;
    for (int i = 0; i < K; ++i)
      for (int j = 0; j < N; ++j)
        *reinterpret_cast<float*>(
            src1.buffer() +
            src1.info()->offset_element_in_bytes(Coordinates(j, i))) = 0;
    for (int i = 0; i < M; ++i)
      for (int j = 0; j < N; ++j)
        *reinterpret_cast<float*>(
            src2.buffer() +
            src2.info()->offset_element_in_bytes(Coordinates(j, i))) = 1;
    // Dummy run for CLTuner
    sgemm.run();
    printf("dst:\n");
    dst.print(std::cout);

The output was

dst:
0 0 0 0 
0 0 0 0 
0 0 0 0 
0 0 0 0

which was expected to be all ones. And I changed alpha to 1 and beta to zero,

    // Configure function
    sgemm.configure(&src0, &src1, &src2, &dst, 1, 0);

The output was

dst:
1 1 1 1
1 1 1 1
1 1 1 1
1 1 1 1
morgolock commented 1 year ago

Hi @Sandglass-vic

Could you please try setting GEMMInfo like this when you call configure?

+        GEMMInfo gemm_info(false,false,false);
+        sgemm.configure(&src0, &src1, &src2, &dst, alpha, beta, gemm_info);

Hope this helps.

Sandglass-vic commented 1 year ago

Yes, it helps. Thanks for your help! :)