[Bug] CMSIS-NN BYOC fails with Zephyr 3.2

mehrdadh commented 1 year ago

What is the error?

/opt/arm/ethosu/cmsis/CMSIS-NN/Source/SoftmaxFunctions/arm_softmax_s8.c: In function 'arm_exp_on_negative_values_mve_32x4':
/opt/arm/ethosu/cmsis/CMSIS-NN/Source/SoftmaxFunctions/arm_softmax_s8.c:74:1: internal compiler error: in trunc_int_for_mode, at explow.cc:59
   74 | }
      | ^
0x169d629 internal_error(char const*, ...)
        ???:0
0x667b60 fancy_abort(char const*, int, char const*)
        ???:0
0x895993 trunc_int_for_mode(long, machine_mode)
        ???:0
0x8959b8 trunc_int_for_mode(poly_int<1u, long>, machine_mode)
        ???:0
0x88a1c8 gen_int_mode(poly_int<1u, long>, machine_mode)
        ???:0

...

at_mult_nt_t_s8.c
In file included from /opt/zephyrproject/modules/hal/cmsis/CMSIS/Core/Include/cmsis_compiler.h:54,
                 from /opt/arm/ethosu/cmsis/CMSIS-NN/Include/arm_nn_math_types.h:90,
                 from /opt/arm/ethosu/cmsis/CMSIS-NN/Include/arm_nnsupportfunctions.h:33,
                 from /opt/arm/ethosu/cmsis/CMSIS-NN/Source/NNSupportFunctions/arm_nn_mat_mult_nt_t_s8.c:31:
/opt/arm/ethosu/cmsis/CMSIS-NN/Source/NNSupportFunctions/arm_nn_mat_mult_nt_t_s8.c: In function 'arm_nn_mat_mult_nt_t_s8':
/opt/zephyrproject/modules/hal/cmsis/CMSIS/Core/Include/cmsis_gcc.h:41:50: error: 'asm' operand has impossible constraints
   41 |   #define __ASM                                  __asm
      |                                                  ^~~~~
/opt/arm/ethosu/cmsis/CMSIS-NN/Source/NNSupportFunctions/arm_nn_mat_mult_nt_t_s8.c:102:13: note: in expansion of macro '__ASM'
  102 |             __ASM volatile("   vldrb.8         q0, [%[col]], #16     \n"
      |             ^~~~~
ninja: build stopped: subcommand failed.

How to reproduce? Use ci_cortexm docker image:

cd apps/microtvm/zephyr_cmsisnn
./run_demo.sh

Environment Zephyr 3.2 Zephyr-SDK 0.15.2 CMSIS SHA: 51263182d16c92649a48144ba56c0945f9fce60e CMSIS NN SHA: v4.0.0

mehrdadh commented 1 year ago

from @Mousius, this would solve the issue. I tested it in ci_cortexm and it worked:

checkout CMSIS-NN v4.0.0 tag and then cherry pick 245089501eef18e8b638865c5afd6cdf2d03386f
Add ${CMSIS_PATH}/CMSIS-NN/Source/SoftmaxFunctions/arm_nn_softmax_common_s8.c to your CMakeList.txt file
add CONFIG_COMPILER_OPT="-DARM_MATH_AUTOVECTORIZE" to your prj.conf

this should build and run the demo correctly.

mehrdadh commented 1 year ago

cc @gromero

gromero commented 1 year ago

@mehrdadh Nice you found the solution :-) Basically with -DARM_MATH_AUTOVECTORIZE you're avoid the problematic code in 1) arm_nn_mat_mult_nt_t_s8.c and 2) arm_softmax_s8.c.

For the issue 1) I think that's fine if gcc is really autovectorizing the C code now being built (instead of the inline asm that caused the "impossible constraint" error), but for 2) issue, it's an ICE -- internal compiler error , so I think it points to a serious bug in the compiler and must be investigated.

I'm curious if there is any reason for selecting this particular tag and cherry-picking this particular commit?

Mousius commented 1 year ago

@gromero the version is the latest version on the repo, and the commit is a workaround for the ICE: https://github.com/ARM-software/CMSIS-NN/commit/245089501eef18e8b638865c5afd6cdf2d03386f 😸

gromero commented 1 year ago

@Mousius haha ok, that explains all :-) well, not all exactly, I confess I'm still intrigued by the "impossible constraint" error. Of course the define -DARM_MATH_AUTOVECTORIZE avoids the inline asm in question, but I could not make sense why the constraint is impossible here. I'm wondering if it's due to a register allocation issue in this particular function where the inline asm is...

Mousius commented 1 year ago

@Mousius haha ok, that explains all :-) well, not all exactly, I confess I'm still intrigued by the "impossible constraint" error. Of course the define -DARM_MATH_AUTOVECTORIZE avoids the inline asm in question, but I could not make sense why the constraint is impossible here. I'm wondering if it's due to a register allocation issue in this particular function where the inline asm is...

Update on this, it's the use of -mfloat-abi=hard, this leads me to think that the compiler is using up more of the registers which the asm block would be using rather than passing in soft float mode 🤔

gromero commented 1 year ago

@Mousius haha ok, that explains all :-) well, not all exactly, I confess I'm still intrigued by the "impossible constraint" error. Of course the define -DARM_MATH_AUTOVECTORIZE avoids the inline asm in question, but I could not make sense why the constraint is impossible here. I'm wondering if it's due to a register allocation issue in this particular function where the inline asm is...

Update on this, it's the use of -mfloat-abi=hard, this leads me to think that the compiler is using up more of the registers which the asm block would be using rather than passing in soft float mode thinking

@Mousius yeah, interesting. Also the Te constraint will force using just the even registers from r0 to r14 accordingly to this doc. But I have neither looked at the generated asm code for that function nor checked exactly why that constraint is in place for those machine instructions. I also realized that if r14 is removed from the clobber list, at least one more Te constraint is allowed (does not throw the impossible constraint error), which also points to a lack of general purpose registers in the inline asm scope. I'm not sure if gcc autovectorizing is kicking in and doing better than this inline asm.

gromero commented 1 year ago

btw, the ICE is fixed in gcc / HEAD by now (checked commit 605d1297b91). So I guess it's just a matter of a new cut for gcc and Zephyr SDK picking it up.

gromero commented 1 year ago

ICE fix should be available on Zephyr SDK 0.16. See https://github.com/zephyrproject-rtos/sdk-ng/issues/625

apache / tvm

[Bug] CMSIS-NN BYOC fails with Zephyr 3.2 #13856