fujitsu / dnnl_aarch64

Apache License 2.0

Questions about __ARM_ARCH #16

Open chauncyyoung opened 4 years ago

chauncyyoung commented 4 years ago

I have two questions:

First, when I used extended_sgemm, I found that it takes the __ARM_ARCH branch by default, but I cannot find where __ARM_ARCH is defined. Could you help me with this?

Second, I tried to use jit_avx512_common_gemm_f32, but it failed because of undefined references in libmkldnn. Should I adjust other parameters to run it?

#ifdef __ARM_ARCH
//    return ref_gemm<float>(transa, transb, M, N, K, alpha, A, lda, B, ldb, beta, C, ldc, bias);
//else // #ifdef __ARM_ARCH
    if (mayiuse(avx512_mic)) {
        printf("enter 1\n");
        return jit_avx512_common_gemm_f32(transa, transb,
                M, N, K, alpha, A, lda, B, ldb, beta, C, ldc, bias);
    } else if (mayiuse(avx)) {
        printf("enter 2\n");
        float *dummy_ao = NULL;
        float *dummy_bo = NULL;

        return gemm_driver(transa, transb, bias ? "C" : NULL, M, N, K, alpha,
                A, lda, dummy_ao, B, ldb, dummy_bo, beta, C, ldc, bias,
                force_jit_nocopy_gemm);
    } else {
        printf("enter 3\n");
        return ref_gemm<float>(transa, transb,
                M, N, K, alpha, A, lda, B, ldb, beta, C, ldc, bias);
    }
#endif // #ifdef __ARM_ARCH

Takumi-Honda commented 4 years ago

Hi chauncyyoung

Regarding the first question: __ARM_ARCH is a predefined macro for specifying architecture in compilers.
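
As a minimal sketch of what "predefined" means here: the compiler itself sets __ARM_ARCH when it targets Arm, so no project header or CMake file needs to define it. The helper names built_for_arm and arm_arch_version are hypothetical and exist only for this illustration.

```cpp
// Hypothetical helpers (not part of dnnl_aarch64) that expose, at compile
// time, whether the compiler predefined __ARM_ARCH for the current target.

// True when the translation unit is compiled for an Arm target.
constexpr bool built_for_arm() {
#ifdef __ARM_ARCH
    return true;
#else
    return false;
#endif
}

// Value of __ARM_ARCH when it is predefined, 0 otherwise.
constexpr int arm_arch_version() {
#ifdef __ARM_ARCH
    return __ARM_ARCH;
#else
    return 0;
#endif
}

// Both helpers must agree, whatever the target is.
static_assert(built_for_arm() == (arm_arch_version() != 0),
        "__ARM_ARCH is either predefined with a nonzero value or absent");
```

With GCC or Clang on a Unix-like system, `g++ -dM -E -x c++ /dev/null` prints the macros predefined for the current target, which should include __ARM_ARCH when the target is Arm.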

Regarding the second question: Are you facing a build error? In our environment, the error does not occur. Did you modify any source code other than the above?

chauncyyoung commented 4 years ago

Thank you for your reply. I just tried to bypass the ref_gemm case by commenting it out with "//", as follows:

#ifdef __ARM_ARCH
//    return ref_gemm<float>(transa, transb, M, N, K, alpha, A, lda, B, ldb, beta, C, ldc, bias);
//#else // #ifdef __ARM_ARCH
    if (mayiuse(avx512_mic)) {
        return jit_avx512_common_gemm_f32(transa, transb,
                M, N, K, alpha, A, lda, B, ldb, beta, C, ldc, bias);
    } else if (mayiuse(avx)) {
        float *dummy_ao = NULL;
        float *dummy_bo = NULL;

        return gemm_driver(transa, transb, bias ? "C" : NULL, M, N, K, alpha,
                A, lda, dummy_ao, B, ldb, dummy_bo, beta, C, ldc, bias,
                force_jit_nocopy_gemm);
    } else {
        return ref_gemm<float>(transa, transb,
                M, N, K, alpha, A, lda, B, ldb, beta, C, ldc, bias);
    }
#endif // #ifdef __ARM_ARCH

The errors then occurred at build time. I hadn't modified any other source code, so I don't know whether I should modify the CMake files.

I also found that this path calls kernel_table[isTransA][isTransB][hasBias][beta_idx(beta)] = new xbyak_gemm(isTransA, isTransB, beta, hasBias); in ./jit_avx512_common_gemm_f32.cpp, and that xbyak_gemm seems to use AVX-512 instructions such as 'vgatherqps' (I'm not sure, because I'm a newcomer...). If that's true, will xbyak translate them into Arm assembly? Thank you again for helping me!!! :)

kawakami-k commented 4 years ago

Hi chauncyyoung-san

Thank you for trying dnnl_aarch64.

I tried your procedure.

#ifndef __ARM_ARCH_
    if (mayiuse(avx512_mic)) {
        return jit_avx512_common_gemm_f32(transa, transb,
                M, N, K, alpha, A, lda, B, ldb, beta, C, ldc, bias);
    } else if (mayiuse(avx)) {
        float *dummy_ao = NULL;
        float *dummy_bo = NULL;

        return gemm_driver(transa, transb, bias ? "C" : NULL, M, N, K, alpha,
                A, lda, dummy_ao, B, ldb, dummy_bo, beta, C, ldc, bias,
                force_jit_nocopy_gemm);
    } else
#endif // __ARM_ARCH
    {
        return ref_gemm<float>(transa, transb,
                M, N, K, alpha, A, lda, B, ldb, beta, C, ldc, bias);
    }
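
The guard structure above can be sketched in isolation. In this minimal sketch, pick_kernel and the Kernel enum are hypothetical stand-ins (not names from dnnl_aarch64), the two bool parameters stand in for the mayiuse(...) checks, and the guard macro is assumed to be __ARM_ARCH. The trailing `else` sits inside the guarded region, so when that region is compiled out only the braced fallback block remains.

```cpp
// Hypothetical kernel identifiers, for illustration only.
enum class Kernel { Ref, JitAvx, JitAvx512 };

// has_avx512 / has_avx stand in for the library's mayiuse(...) checks.
constexpr Kernel pick_kernel(bool has_avx512, bool has_avx) {
    (void)has_avx512; // unused when the guarded region is compiled out
    (void)has_avx;
#ifndef __ARM_ARCH
    if (has_avx512) {
        return Kernel::JitAvx512;
    } else if (has_avx) {
        return Kernel::JitAvx;
    } else
#endif // __ARM_ARCH
    {
        // On x86 this block is the body of the final `else`.
        // On Arm the if/else chain disappears and this block always runs.
        return Kernel::Ref;
    }
}

// With no x86 feature available, every target falls back to the reference.
static_assert(pick_kernel(false, false) == Kernel::Ref,
        "fallback must be the reference kernel");
```

The design choice is that the fallback call is written only once: it serves both as the last branch on x86 and as the only branch on Arm.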

All binaries build successfully in my environment, but ./test_gemm_f32 crashes with a SEGV on the 14th test pattern (TestGEMM/13). I'll try to fix the bug.

[==========] Running 21 tests from 1 test suite.
[----------] Global test environment set-up.
[----------] 21 tests from TestGEMM_fp32/gemm_test
[ RUN      ] TestGEMM_fp32/gemm_test.TestGEMM/0
[       OK ] TestGEMM_fp32/gemm_test.TestGEMM/0 (0 ms)
[ RUN      ] TestGEMM_fp32/gemm_test.TestGEMM/1
[       OK ] TestGEMM_fp32/gemm_test.TestGEMM/1 (0 ms)
[ RUN      ] TestGEMM_fp32/gemm_test.TestGEMM/2
[       OK ] TestGEMM_fp32/gemm_test.TestGEMM/2 (0 ms)
[ RUN      ] TestGEMM_fp32/gemm_test.TestGEMM/3
[       OK ] TestGEMM_fp32/gemm_test.TestGEMM/3 (0 ms)
[ RUN      ] TestGEMM_fp32/gemm_test.TestGEMM/4
[       OK ] TestGEMM_fp32/gemm_test.TestGEMM/4 (3296 ms)
[ RUN      ] TestGEMM_fp32/gemm_test.TestGEMM/5
[       OK ] TestGEMM_fp32/gemm_test.TestGEMM/5 (9 ms)
[ RUN      ] TestGEMM_fp32/gemm_test.TestGEMM/6
[       OK ] TestGEMM_fp32/gemm_test.TestGEMM/6 (11 ms)
[ RUN      ] TestGEMM_fp32/gemm_test.TestGEMM/7
[       OK ] TestGEMM_fp32/gemm_test.TestGEMM/7 (12 ms)
[ RUN      ] TestGEMM_fp32/gemm_test.TestGEMM/8
[       OK ] TestGEMM_fp32/gemm_test.TestGEMM/8 (22 ms)
[ RUN      ] TestGEMM_fp32/gemm_test.TestGEMM/9
[       OK ] TestGEMM_fp32/gemm_test.TestGEMM/9 (9 ms)
[ RUN      ] TestGEMM_fp32/gemm_test.TestGEMM/10
[       OK ] TestGEMM_fp32/gemm_test.TestGEMM/10 (8 ms)
[ RUN      ] TestGEMM_fp32/gemm_test.TestGEMM/11
[       OK ] TestGEMM_fp32/gemm_test.TestGEMM/11 (6 ms)
[ RUN      ] TestGEMM_fp32/gemm_test.TestGEMM/12
[       OK ] TestGEMM_fp32/gemm_test.TestGEMM/12 (1 ms)
[ RUN      ] TestGEMM_fp32/gemm_test.TestGEMM/13
zsh: segmentation fault (core dumped)  ./test_gemm_f32

kawakami-k commented 4 years ago

Currently, dnnl_aarch64 assumes that it runs on a CPU supporting the Armv8-A instruction set with SVE. If you don't have such an environment, you can use QEMU to emulate Armv8-A+SVE instructions.
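
To check whether a given machine (or QEMU setup) meets this assumption, one option is a small runtime probe. This is only a sketch under stated assumptions: it relies on Linux on AArch64, where the kernel reports SVE through the HWCAP_SVE bit of getauxval(AT_HWCAP). The names kHwcapSveBit, hwcap_has_sve and cpu_has_sve are hypothetical and are not part of dnnl_aarch64.

```cpp
#if defined(__aarch64__) && defined(__linux__)
#include <sys/auxv.h>  // getauxval, AT_HWCAP
#include <asm/hwcap.h> // HWCAP_SVE
#endif

// Bit that Linux sets in AT_HWCAP on AArch64 when SVE is available.
// Repeated here so the helper below also compiles on other platforms.
constexpr unsigned long kHwcapSveBit = 1UL << 22;

// Pure helper: does a given AT_HWCAP value advertise SVE?
constexpr bool hwcap_has_sve(unsigned long hwcap) {
    return (hwcap & kHwcapSveBit) != 0;
}

// Runtime probe; reports false on anything that is not AArch64 Linux.
bool cpu_has_sve() {
#if defined(__aarch64__) && defined(__linux__)
    static_assert(kHwcapSveBit == HWCAP_SVE, "unexpected HWCAP_SVE value");
    return hwcap_has_sve(getauxval(AT_HWCAP));
#else
    return false;
#endif
}
```

Calling cpu_has_sve() at the start of a test program gives an early, readable hint that the environment lacks SVE, instead of a crash later on.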

chauncyyoung commented 4 years ago

Thank you for your reply. I think it may depend on the version of dnnl_aarch64: I used the release_base_0.19 branch. I'll try the latest version with QEMU later.

chauncyyoung commented 4 years ago

I'm also not sure whether xbyak translates x86 assembly into AArch64 assembly.

Another question concerns gemm.cpp, shown below:

#ifdef __ARM_ARCH
    return ref_gemm<float>(
            transa, transb, M, N, K, alpha, A, lda, B, ldb, beta, C, ldc, bias);
#else // #ifdef __ARM_ARCH
    if (mayiuse(avx512_mic)) {
        return jit_avx512_common_gemm_f32(transa, transb,
                M, N, K, alpha, A, lda, B, ldb, beta, C, ldc, bias);
    } else if (mayiuse(avx)) {
        float *dummy_ao = NULL;
        float *dummy_bo = NULL;

        return gemm_driver(transa, transb, bias ? "C" : NULL, M, N, K, alpha,
                A, lda, dummy_ao, B, ldb, dummy_bo, beta, C, ldc, bias,
                force_jit_nocopy_gemm);
    } else {
        return ref_gemm<float>(transa, transb,
                M, N, K, alpha, A, lda, B, ldb, beta, C, ldc, bias);
    }
#endif // #ifdef __ARM_ARCH
}

Since ref_gemm comes before jit_avx512_common_gemm_f32, does ref_gemm have higher priority? In other words, does ref_gemm perform better than jit_avx512_common_gemm_f32?

kawakami-k commented 4 years ago

chauncyyoung-san

"release_base_0.19" does not generate any JIT-ed code except in jit_uni_reorder.cpp, so ref_gemm is always used on AArch64.

Please use "release_base_0.21" to try various JIT-ed code on AArch64. This version generates some JIT-ed code directly using Xbyak_aarch64; this is implemented in src/cpu/jitsve*.cpp. It also generates some JIT-ed code indirectly using Xbyak_translator_aarch64, which translates x86 JIT-ed instructions to AArch64 instructions one by one.
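
On the priority question: an #ifdef / #else pair selects exactly one branch at compile time, so the order in which the branches appear in the source says nothing about priority or performance. The following is a minimal sketch with hypothetical names (GemmPath, reference_first, jit_first), not code from dnnl_aarch64. It writes the same selection twice with the branches in opposite order and checks that both give the same result.

```cpp
// Hypothetical labels for the two code paths, for illustration only.
enum class GemmPath { Reference, Jit };

// Reference branch written first, as in the gemm.cpp excerpt quoted above.
constexpr GemmPath reference_first() {
#ifdef __ARM_ARCH
    return GemmPath::Reference;
#else
    return GemmPath::Jit;
#endif
}

// Same selection with the branches in the opposite source order.
constexpr GemmPath jit_first() {
#ifndef __ARM_ARCH
    return GemmPath::Jit;
#else
    return GemmPath::Reference;
#endif
}

// Whatever the target, both spellings select the same path.
static_assert(reference_first() == jit_first(),
        "source order of the branches does not change the selected path");
```

Only the branch that survives preprocessing is compiled at all; the other one never reaches the binary.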

If you want to try JIT-ed gemm, replace

#ifndef __ARM_ARCH

at https://github.com/fujitsu/dnnl_aarch64/blob/release_base_0.21/src/cpu/gemm/gemm.cpp#L123 with

#ifdef __ARM_ARCH

Because "release_base_0.21" currently has some bugs in JIT-ed gemm, it is disabled by default.