Open chauncyyoung opened 4 years ago
Hi chauncyyoung
Regarding the first question: __ARM_ARCH is a predefined macro for specifying architecture in compilers.
Regarding the second question: are you facing a build error? In our environment, the error does not occur. Did you modify any source code other than the above?
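For context, __ARM_ARCH is supplied by the compiler itself (GCC and Clang define it when targeting Arm), which is why no definition turns up when searching the source tree. A minimal sketch for checking this on a given toolchain follows; the helper name arm_arch_description is made up for illustration and is not part of dnnl_aarch64.

```cpp
#include <string>

// Hypothetical helper, not part of dnnl_aarch64: reports whether the
// compiler predefined __ARM_ARCH for the current target.
std::string arm_arch_description() {
#ifdef __ARM_ARCH
    // On Arm targets the macro expands to the architecture level
    // (for example 8 for Armv8-A).
    return "__ARM_ARCH=" + std::to_string(__ARM_ARCH);
#else
    // On other targets (for example x86_64) the macro is absent.
    return "__ARM_ARCH is not defined";
#endif
}
```

Printing the returned string from main shows which branch the preprocessor kept for the current build.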
Thank you for your reply. I just tried to skip the ref_gemm case by adding "//" before it, as follows:
// return ref_gemm
return gemm_driver(transa, transb, bias ? "C" : NULL, M, N, K, alpha,
A, lda, dummy_ao, B, ldb, dummy_bo, beta, C, ldc, bias,
force_jit_nocopy_gemm);
} else {
return ref_gemm<float>(transa, transb,
M, N, K, alpha, A, lda, B, ldb, beta, C, ldc, bias);
}
The errors then occurred during the build. No other source code was modified, so I don't know whether I should modify the cmake files.
I also found that this case uses kernel_table[isTransA][isTransB][hasBias][beta_idx(beta)] = new xbyak_gemm(isTransA, isTransB, beta, hasBias); in ./jit_avx512_common_gemm_f32.cpp, and that xbyak_gemm uses AVX512 instructions such as 'vgatherqps' (I'm not sure, because I'm a newcomer...). If that's true, would they be translated by xbyak to Arm assembly? Thank you again for helping me!!! :)
Hi chauncyyoung-san
Thank you for trying dnnl_aarch64.
I tried your procedure.
#ifdef __ARM_ARCH
-> #ifdef __ARM_ARCH_
cmake
and make
#ifndef __ARM_ARCH_
if (mayiuse(avx512_mic)) {
return jit_avx512_common_gemm_f32(transa, transb,
M, N, K, alpha, A, lda, B, ldb, beta, C, ldc, bias);
} else if (mayiuse(avx)) {
float *dummy_ao = NULL;
float *dummy_bo = NULL;
return gemm_driver(transa, transb, bias ? "C" : NULL, M, N, K, alpha,
A, lda, dummy_ao, B, ldb, dummy_bo, beta, C, ldc, bias,
force_jit_nocopy_gemm);
} else
#endif // __ARM_ARCH
{
return ref_gemm<float>(transa, transb,
M, N, K, alpha, A, lda, B, ldb, beta, C, ldc, bias);
}
All binaries build successfully in my environment, but ./test_gemm_f32 crashes with a SEGV in the 14th test pattern.
I'll try to fix the bug.
[==========] Running 21 tests from 1 test suite.
[----------] Global test environment set-up.
[----------] 21 tests from TestGEMM_fp32/gemm_test
[ RUN ] TestGEMM_fp32/gemm_test.TestGEMM/0
[ OK ] TestGEMM_fp32/gemm_test.TestGEMM/0 (0 ms)
[ RUN ] TestGEMM_fp32/gemm_test.TestGEMM/1
[ OK ] TestGEMM_fp32/gemm_test.TestGEMM/1 (0 ms)
[ RUN ] TestGEMM_fp32/gemm_test.TestGEMM/2
[ OK ] TestGEMM_fp32/gemm_test.TestGEMM/2 (0 ms)
[ RUN ] TestGEMM_fp32/gemm_test.TestGEMM/3
[ OK ] TestGEMM_fp32/gemm_test.TestGEMM/3 (0 ms)
[ RUN ] TestGEMM_fp32/gemm_test.TestGEMM/4
[ OK ] TestGEMM_fp32/gemm_test.TestGEMM/4 (3296 ms)
[ RUN ] TestGEMM_fp32/gemm_test.TestGEMM/5
[ OK ] TestGEMM_fp32/gemm_test.TestGEMM/5 (9 ms)
[ RUN ] TestGEMM_fp32/gemm_test.TestGEMM/6
[ OK ] TestGEMM_fp32/gemm_test.TestGEMM/6 (11 ms)
[ RUN ] TestGEMM_fp32/gemm_test.TestGEMM/7
[ OK ] TestGEMM_fp32/gemm_test.TestGEMM/7 (12 ms)
[ RUN ] TestGEMM_fp32/gemm_test.TestGEMM/8
[ OK ] TestGEMM_fp32/gemm_test.TestGEMM/8 (22 ms)
[ RUN ] TestGEMM_fp32/gemm_test.TestGEMM/9
[ OK ] TestGEMM_fp32/gemm_test.TestGEMM/9 (9 ms)
[ RUN ] TestGEMM_fp32/gemm_test.TestGEMM/10
[ OK ] TestGEMM_fp32/gemm_test.TestGEMM/10 (8 ms)
[ RUN ] TestGEMM_fp32/gemm_test.TestGEMM/11
[ OK ] TestGEMM_fp32/gemm_test.TestGEMM/11 (6 ms)
[ RUN ] TestGEMM_fp32/gemm_test.TestGEMM/12
[ OK ] TestGEMM_fp32/gemm_test.TestGEMM/12 (1 ms)
[ RUN ] TestGEMM_fp32/gemm_test.TestGEMM/13
zsh: segmentation fault (core dumped) ./test_gemm_f32
Currently, dnnl_aarch64 is assumed to run on a CPU that supports the Armv8-A+SVE instruction set. If you don't have such an environment, you can use QEMU to emulate Armv8-A+SVE instructions.
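Before setting up QEMU, it may help to confirm whether the current machine and toolchain report SVE at all. The following is a minimal sketch, assuming Linux on AArch64 with a libc that exposes HWCAP_SVE through <sys/auxv.h>; both function names are hypothetical, and on any other platform the runtime check simply returns false.

```cpp
#if defined(__aarch64__) && defined(__linux__)
#include <sys/auxv.h> // getauxval, AT_HWCAP, and HWCAP_SVE where available
#endif

// Hypothetical helper: true if the running Linux/AArch64 CPU reports SVE.
// Returns false on other platforms, or if the libc headers do not
// define HWCAP_SVE (a conservative choice for this sketch).
bool cpu_reports_sve() {
#if defined(__aarch64__) && defined(__linux__) && defined(HWCAP_SVE)
    return (getauxval(AT_HWCAP) & HWCAP_SVE) != 0;
#else
    return false;
#endif
}

// True if the compiler was asked to target SVE (for example with
// -march=armv8-a+sve), in which case it predefines __ARM_FEATURE_SVE.
constexpr bool compiled_for_sve() {
#ifdef __ARM_FEATURE_SVE
    return true;
#else
    return false;
#endif
}
```

If cpu_reports_sve() returns false on the target machine, SVE code is not expected to run there natively, which matches the suggestion to use an emulator.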
Thank you for your reply. I think it may be affected by the version of dnnl_aarch64: I used the release_base_0.19 branch. I'll try the latest version with QEMU later.
I'm also not sure whether xbyak translates x86 assembly to AArch64 assembly.
Another question came up in gemm.cpp, shown below:
return ref_gemm<float>(
transa, transb, M, N, K, alpha, A, lda, B, ldb, beta, C, ldc, bias);
if (mayiuse(avx512_mic)) {
return jit_avx512_common_gemm_f32(transa, transb,
M, N, K, alpha, A, lda, B, ldb, beta, C, ldc, bias);
} else if (mayiuse(avx)) {
float *dummy_ao = NULL;
float *dummy_bo = NULL;
return gemm_driver(transa, transb, bias ? "C" : NULL, M, N, K, alpha,
A, lda, dummy_ao, B, ldb, dummy_bo, beta, C, ldc, bias,
force_jit_nocopy_gemm);
} else {
return ref_gemm<float>(transa, transb,
M, N, K, alpha, A, lda, B, ldb, beta, C, ldc, bias);
}
}
Since ref_gemm comes before jit_avx512_common_gemm_f32, does ref_gemm have a higher priority? In other words, does ref_gemm perform better than jit_avx512_common_gemm_f32?
chauncyyoung-san
"release_base_0.19" does not output any JIT-ed code except jit_uni_reorder.cpp so that ref_gemm is always used for AArch64.
Please use "release_base_0.21" to try various JIT-ed code on AArch64. This version generates some JIT-ed code directly by using Xbyak_aarch64; that code is implemented in src/cpu/jitsve*.cpp. This version also outputs some JIT-ed code indirectly by using Xbyak_translator_aarch64, which translates x86 JIT-ed instructions to AArch64 instructions one by one.
If you want to try JIT-ed gemm, replace
#ifndef __ARM_ARCH
at https://github.com/fujitsu/dnnl_aarch64/blob/release_base_0.21/src/cpu/gemm/gemm.cpp#L123 with
#ifdef __ARM_ARCH
Because the JIT-ed gemm in "release_base_0.21" currently has some bugs, it is disabled by default.
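To illustrate how such a guard behaves, here is a minimal sketch of the selection logic. The CpuFeatures struct is a hypothetical stand-in for the checks that mayiuse() performs, and select_gemm_path is a made-up name that only returns the name of the path that would be taken. The point is that the preprocessor decides at compile time which branches exist at all, so the order in which the branches appear in the file does not indicate which one is faster.

```cpp
#include <string>

// Hypothetical stand-in for the CPU feature queries done by mayiuse().
struct CpuFeatures {
    bool avx512_mic = false;
    bool avx = false;
};

// Sketch of the guarded dispatch: returns the name of the selected path.
std::string select_gemm_path(const CpuFeatures &cpu) {
#ifndef __ARM_ARCH
    // Non-Arm build: the JIT paths are compiled in and chosen at run time.
    if (cpu.avx512_mic) return "jit_avx512_common_gemm_f32";
    if (cpu.avx) return "gemm_driver";
#else
    // Arm build: the branches above are removed by the preprocessor.
    (void)cpu;
#endif
    // Fallback that is always available.
    return "ref_gemm";
}
```

Changing #ifndef to #ifdef inverts which build keeps the JIT branches, which is the effect of the edit described above.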
I have two questions:
First, when I used extended_sgemm, I found that it takes the __ARM_ARCH branch by default, but I cannot find where the macro is defined. Could you help me solve this problem?
Second, I tried to use jit_avx512_common_gemm_f32, but it failed because undefined references occurred in libmkldnn. Should I adjust other parameters to run it?
echo MKLROOT=$MKLROOT
#ifdef __ARM_ARCH
// return ref_gemm(transa, transb, M, N, K, alpha, A, lda, B, ldb, beta, C, ldc, bias);
//#else // #ifdef __ARM_ARCH
if (mayiuse(avx512_mic)) {
printf("enter 1\n");
return jit_avx512_common_gemm_f32(transa, transb,
M, N, K, alpha, A, lda, B, ldb, beta, C, ldc, bias);
} else if (mayiuse(avx)) {
printf("enter 2\n");
float *dummy_ao = NULL;
float *dummy_bo = NULL;
#endif // #ifdef __ARM_ARCH