Open cujomalainey opened 2 years ago
Note -O1 works fine
Was able to bisect down the optimizations 2 adds, this also repros the bug. build_flags = -O1 -fcode-hoisting -finline-functions -finline-small-functions
I'm guessing this is likely a bug in GCC now that I look at those flags, its probably not repurposing a register properly for an inlined function.
What is interesting is that by default the system is compiling with -Os
which is -O2
minus a few options (not ones above) and includes -finline-functions
but "tunes for size" so im guessing whatever is getting inlined and breaking isn't tuned for size.
reference gcc manual
thats unfortunate but....not sure what it could be. what if you de-inline any functions, does that help?
could you confirm if the issue reproducible with Arduino IDE by changing the -Os to -O2 option here (there are 3 occurrences)
@ladyada it appears to be the transfer function, forcing that to not be inlined fixes it
diff --git a/Adafruit_SPIDevice.cpp b/Adafruit_SPIDevice.cpp
index 44a8f55..db3e993 100644
--- a/Adafruit_SPIDevice.cpp
+++ b/Adafruit_SPIDevice.cpp
@@ -119,7 +119,7 @@ bool Adafruit_SPIDevice::begin(void) {
* @param buffer The buffer to send and receive at the same time
* @param len The number of bytes to transfer
*/
-void Adafruit_SPIDevice::transfer(uint8_t *buffer, size_t len) {
+void __attribute__ ((noinline)) Adafruit_SPIDevice::transfer(uint8_t *buffer, size_t len) {
if (_spi) {
// hardware SPI is easy
Dug a bit deeper through the ASM, looks like its specifically the calls to delayMicroseconds
(defined in cores/arduino/delay.h from the BSP) in transfer that are causing the issue when inlined in the transfer function.
/**
* \brief Pauses the program for the amount of time (in microseconds) specified as parameter.
*
* \param dwUs the number of microseconds to pause (uint32_t)
*/
#if defined(__SAMD51__)
extern void delayMicroseconds( unsigned int );
#else
static __inline__ void delayMicroseconds( unsigned int ) __attribute__((always_inline, unused)) ;
static __inline__ void delayMicroseconds( unsigned int usec )
{
if ( usec == 0 )
{
return ;
}
/*
* The following loop:
*
* for (; ul; ul--) {
* __asm__ volatile("");
* }
*
* produce the following assembly code:
*
* loop:
* subs r3, #1 // 1 Core cycle
* bne.n loop // 1 Core cycle + 1 if branch is taken
*/
// VARIANT_MCK / 1000000 == cycles needed to delay 1uS
// 3 == cycles used in a loop
uint32_t n = usec * (VARIANT_MCK / 1000000) / 3;
__asm__ __volatile__(
"1: \n"
" sub %0, #1 \n" // substract 1 from %0 (n)
" bne 1b \n" // if result is not 0 jump to 1
: "+r" (n) // '%0' is n variable with RW constraints
: // no input
: // no clobber
);
// https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html
// https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html#Volatile
}
#endif
This is the from the .s dump @ line 121 (test build config changed the line to this from 462 in original report)
@ 92 "/Users/curtismalainey/.platformio/packages/framework-arduino-samd-adafruit/cores/arduino/delay.h" 1
1:
sub r10, #1
bne 1b
@ 0 "" 2
Note there is another call right before this one and its fine, but if the parser is going from reverse then this is not-helpful.
@ 92 "/Users/curtismalainey/.platformio/packages/framework-arduino-samd-adafruit/cores/arduino/delay.h" 1
1:
sub r2, #1
bne 1b
@ 0 "" 2
Further debugging, looking into the ARMv6 ISA posted here. I think I figured out the error
Section A6.7.65 (pg 164) Only 3 bits available to express the register Rdn. So assuming a linear addressing, you cannot specify anything above r7 for the sub instruction.
So back to blaming gcc unfortunately.
erk - im ok with forcing a non-inline attribute, can we by chance #ifdef that for the gcc version?
Sure, I can test that out, I'll also see if I can bisect the affected range since platformio makes adjusting toolchains easy
ok please note we do not have CI or testing for platformio so basically just make sure it passes the actions CI for arduino :)
Latest gcc on platformio is also broken
arm-none-eabi-g++ (xPack GNU Arm Embedded GCC x86_64) 10.3.1 20210824 (release)
I'll try and build gcc from source later this week on GCC 11 and 12 and see if its fixed there, otherwise i'll report upstream
Unable to repro on arm-none-eabi-g++ (GNU Toolchain for the Arm Architecture 11.2-2022.02 (arm-11.14)) 11.2.1 20220111
which i fetched straight from the arm website
So it looks like the gcc toolchain needs to be updated for the board, i don't even see that in homebrew formulas yet so i don't expect it to be for a bit.
Thankfully my project is not blocked by this (my project was slow because I was doing way too much float math on an itsybitsy_m0, which i have removed).
Note I checked the asm dump in my test and the second call was using r6
instead of r10
is the toolchain v defined in platformio - e.g. do you see the same issue in arduino ide?
is the toolchain v defined in platformio - e.g. do you see the same issue in arduino ide?
Will test when I have time, but I believe it inherits the same toolchains from the Arduino bsp as it tries to reuse as much as possible
Arduino board: itsy bitsy m0
Arduino IDE version (found in Arduino -> About Arduino menu): platformio 6.0.2
List the steps to reproduce the problem below (if possible attach a sketch or copy the sketch code in too):
Use the following build config and try and use the BusIO library
main.cpp
build output (dirty build to reduce length)