Closed mehrdadh closed 2 months ago
from @Mousius, this would solve the issue. I tested it in ci_cortexm and it worked:
v4.0.0
tag and then cherry pick 245089501eef18e8b638865c5afd6cdf2d03386f
${CMSIS_PATH}/CMSIS-NN/Source/SoftmaxFunctions/arm_nn_softmax_common_s8.c
to your CMakeList.txt fileCONFIG_COMPILER_OPT="-DARM_MATH_AUTOVECTORIZE"
to your prj.confthis should build and run the demo correctly.
cc @gromero
@mehrdadh Nice you found the solution :-) Basically with -DARM_MATH_AUTOVECTORIZE
you're avoid the problematic code in
1) arm_nn_mat_mult_nt_t_s8.c
and 2) arm_softmax_s8.c
.
For the issue 1) I think that's fine if gcc is really autovectorizing the C code now being built (instead of the inline asm that caused the "impossible constraint" error), but for 2) issue, it's an ICE -- internal compiler error , so I think it points to a serious bug in the compiler and must be investigated.
I'm curious if there is any reason for selecting this particular tag and cherry-picking this particular commit?
@gromero the version is the latest version on the repo, and the commit is a workaround for the ICE: https://github.com/ARM-software/CMSIS-NN/commit/245089501eef18e8b638865c5afd6cdf2d03386f 😸
@Mousius haha ok, that explains all :-) well, not all exactly, I confess I'm still intrigued by the "impossible constraint" error. Of course the define -DARM_MATH_AUTOVECTORIZE
avoids the inline asm in question, but I could not make sense why the constraint is impossible here. I'm wondering if it's due to a register allocation issue in this particular function where the inline asm is...
@Mousius haha ok, that explains all :-) well, not all exactly, I confess I'm still intrigued by the "impossible constraint" error. Of course the define
-DARM_MATH_AUTOVECTORIZE
avoids the inline asm in question, but I could not make sense why the constraint is impossible here. I'm wondering if it's due to a register allocation issue in this particular function where the inline asm is...
Update on this, it's the use of -mfloat-abi=hard
, this leads me to think that the compiler is using up more of the registers which the asm
block would be using rather than passing in soft float mode 🤔
@Mousius haha ok, that explains all :-) well, not all exactly, I confess I'm still intrigued by the "impossible constraint" error. Of course the define
-DARM_MATH_AUTOVECTORIZE
avoids the inline asm in question, but I could not make sense why the constraint is impossible here. I'm wondering if it's due to a register allocation issue in this particular function where the inline asm is...Update on this, it's the use of
-mfloat-abi=hard
, this leads me to think that the compiler is using up more of the registers which theasm
block would be using rather than passing in soft float mode thinking
@Mousius yeah, interesting. Also the Te
constraint will force using just the even registers from r0
to r14
accordingly to this doc. But I have neither looked at the generated asm code for that function nor checked exactly why that constraint is in place for those machine instructions. I also realized that if r14
is removed from the clobber list, at least one more Te
constraint is allowed (does not throw the impossible constraint error), which also points to a lack of general purpose registers in the inline asm scope. I'm not sure if gcc autovectorizing is kicking in and doing better than this inline asm.
btw, the ICE is fixed in gcc / HEAD by now (checked commit 605d1297b91). So I guess it's just a matter of a new cut for gcc and Zephyr SDK picking it up.
ICE fix should be available on Zephyr SDK 0.16. See https://github.com/zephyrproject-rtos/sdk-ng/issues/625
What is the error?
How to reproduce? Use ci_cortexm docker image:
Environment Zephyr 3.2 Zephyr-SDK 0.15.2 CMSIS SHA: 51263182d16c92649a48144ba56c0945f9fce60e CMSIS NN SHA: v4.0.0