adafruit / Adafruit_BusIO

Arduino library for I2C & SPI abstractions
MIT License
290 stars 265 forks source link

-O2 breaks this lib on atmel M0 #102

Open cujomalainey opened 2 years ago

cujomalainey commented 2 years ago

Use the following build config and try and use the BusIO library

[env:itsy]
platform = atmelsam
board = adafruit_itsybitsy_m0
framework = arduino
build_flags = -O2
build_unflags = -Os
lib_deps = 6214

main.cpp

#include "Wire.h"
#include <Adafruit_SPIDevice.h>

void setup() {}
void loop() {}

build output (dirty build to reduce length)

Processing itsy (platform: atmelsam; board: adafruit_itsybitsy_m0; framework: arduino)
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Verbose mode can be enabled via `-v, --verbose` option
CONFIGURATION: https://docs.platformio.org/page/boards/atmelsam/adafruit_itsybitsy_m0.html
PLATFORM: Atmel SAM (7.1.0+sha.ec05bdd) > Adafruit ItsyBitsy M0
HARDWARE: SAMD21G18A 48MHz, 32KB RAM, 256KB Flash
DEBUG: Current (atmel-ice) External (atmel-ice, blackmagic, jlink)
PACKAGES:
 - framework-arduino-samd-adafruit @ 1.7.5
 - framework-cmsis @ 2.50400.181126 (5.4.0)
 - framework-cmsis-atmel @ 1.2.2
 - toolchain-gccarmnoneeabi @ 1.90301.200702 (9.3.1)
LDF: Library Dependency Finder -> https://bit.ly/configure-pio-ldf
LDF Modes: Finder ~ chain, Compatibility ~ soft
Found 12 compatible libraries
Scanning dependencies...
Dependency Graph
|-- Adafruit BusIO @ 1.12.0
|   |-- SPI @ 1.0
|   |   |-- Adafruit Zero DMA Library @ 1.1.0
|   |   |   |-- Adafruit TinyUSB Library @ 1.4.3
|   |   |-- Adafruit TinyUSB Library @ 1.4.3
|   |-- Wire @ 1.0
|   |   |-- Adafruit TinyUSB Library @ 1.4.3
|-- Wire @ 1.0
|   |-- Adafruit TinyUSB Library @ 1.4.3
Building in release mode
Compiling .pio/build/itsy/src/main.cpp.o
Archiving .pio/build/itsy/lib716/libAdafruit_ZeroDMA.a
Indexing .pio/build/itsy/lib716/libAdafruit_ZeroDMA.a
Archiving .pio/build/itsy/lib0f0/libSPI.a
Indexing .pio/build/itsy/lib0f0/libSPI.a
Compiling .pio/build/itsy/lib6bd/Wire/Wire.cpp.o
Compiling .pio/build/itsy/libc16/Adafruit BusIO/Adafruit_BusIO_Register.cpp.o
Compiling .pio/build/itsy/libc16/Adafruit BusIO/Adafruit_I2CDevice.cpp.o
Compiling .pio/build/itsy/libc16/Adafruit BusIO/Adafruit_SPIDevice.cpp.o
/var/folders/wy/yddn63455wd2vk7m62shqd780000gn/T//ccEbUdN3.s: Assembler messages:
/var/folders/wy/yddn63455wd2vk7m62shqd780000gn/T//ccEbUdN3.s:462: Error: lo register required -- `sub r10,#1'
Compiling .pio/build/itsy/FrameworkArduino/WInterrupts.c.o
Compiling .pio/build/itsy/FrameworkArduino/WMath.cpp.o
Compiling .pio/build/itsy/FrameworkArduino/WString.cpp.o
Compiling .pio/build/itsy/FrameworkArduino/abi.cpp.o
Compiling .pio/build/itsy/FrameworkArduino/avr/dtostrf.c.o
*** [.pio/build/itsy/libc16/Adafruit BusIO/Adafruit_SPIDevice.cpp.o] Error 1
cujomalainey commented 2 years ago

Note -O1 works fine

cujomalainey commented 2 years ago

Was able to bisect down the optimizations 2 adds, this also repros the bug. build_flags = -O1 -fcode-hoisting -finline-functions -finline-small-functions

I'm guessing this is likely a bug in GCC now that I look at those flags, its probably not repurposing a register properly for an inlined function.

What is interesting is that by default the system is compiling with -Os which is -O2 minus a few options (not ones above) and includes -finline-functions but "tunes for size" so im guessing whatever is getting inlined and breaking isn't tuned for size.

reference gcc manual

ladyada commented 2 years ago

thats unfortunate but....not sure what it could be. what if you de-inline any functions, does that help?

hathach commented 2 years ago

could you confirm if the issue reproducible with Arduino IDE by changing the -Os to -O2 option here (there are 3 occurrences)

https://github.com/adafruit/ArduinoCore-samd/blob/bd2a9cdbe7433ec88701591c49b1224c8686e940/platform.txt#L36-L42

cujomalainey commented 2 years ago

@ladyada it appears to be the transfer function, forcing that to not be inlined fixes it

diff --git a/Adafruit_SPIDevice.cpp b/Adafruit_SPIDevice.cpp
index 44a8f55..db3e993 100644
--- a/Adafruit_SPIDevice.cpp
+++ b/Adafruit_SPIDevice.cpp
@@ -119,7 +119,7 @@ bool Adafruit_SPIDevice::begin(void) {
  *    @param  buffer The buffer to send and receive at the same time
  *    @param  len    The number of bytes to transfer
  */
-void Adafruit_SPIDevice::transfer(uint8_t *buffer, size_t len) {
+void __attribute__ ((noinline)) Adafruit_SPIDevice::transfer(uint8_t *buffer, size_t len) {
   if (_spi) {
     // hardware SPI is easy
cujomalainey commented 2 years ago

Dug a bit deeper through the ASM, looks like its specifically the calls to delayMicroseconds (defined in cores/arduino/delay.h from the BSP) in transfer that are causing the issue when inlined in the transfer function.

/**
 * \brief Pauses the program for the amount of time (in microseconds) specified as parameter.
 *
 * \param dwUs the number of microseconds to pause (uint32_t)
 */
#if defined(__SAMD51__)
extern void delayMicroseconds( unsigned int );
#else
static __inline__ void delayMicroseconds( unsigned int ) __attribute__((always_inline, unused)) ;
static __inline__ void delayMicroseconds( unsigned int usec )
{
  if ( usec == 0 )
  {
    return ;
  }
  /*
   *  The following loop:
   *
   *    for (; ul; ul--) {
   *      __asm__ volatile("");
   *    }
   *
   *  produce the following assembly code:
   *
   *    loop:
   *      subs r3, #1        // 1 Core cycle
   *      bne.n loop         // 1 Core cycle + 1 if branch is taken
   */

  // VARIANT_MCK / 1000000 == cycles needed to delay 1uS
  //                     3 == cycles used in a loop
  uint32_t n = usec * (VARIANT_MCK / 1000000) / 3;

  __asm__ __volatile__(
    "1:              \n"
    "   sub %0, #1   \n" // substract 1 from %0 (n)
    "   bne 1b       \n" // if result is not 0 jump to 1
    : "+r" (n)           // '%0' is n variable with RW constraints
    :                    // no input
    :                    // no clobber
  );
  // https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html
  // https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html#Volatile
}
#endif

This is the from the .s dump @ line 121 (test build config changed the line to this from 462 in original report)

@ 92 "/Users/curtismalainey/.platformio/packages/framework-arduino-samd-adafruit/cores/arduino/delay.h" 1
        1:
   sub r10, #1
   bne 1b

@ 0 "" 2

Note there is another call right before this one and its fine, but if the parser is going from reverse then this is not-helpful.

@ 92 "/Users/curtismalainey/.platformio/packages/framework-arduino-samd-adafruit/cores/arduino/delay.h" 1
        1:
   sub r2, #1
   bne 1b

@ 0 "" 2

Further debugging, looking into the ARMv6 ISA posted here. I think I figured out the error

Section A6.7.65 (pg 164) Only 3 bits available to express the register Rdn. So assuming a linear addressing, you cannot specify anything above r7 for the sub instruction.

So back to blaming gcc unfortunately.

ladyada commented 2 years ago

erk - im ok with forcing a non-inline attribute, can we by chance #ifdef that for the gcc version?

cujomalainey commented 2 years ago

Sure, I can test that out, I'll also see if I can bisect the affected range since platformio makes adjusting toolchains easy

ladyada commented 2 years ago

ok please note we do not have CI or testing for platformio so basically just make sure it passes the actions CI for arduino :)

cujomalainey commented 2 years ago

Latest gcc on platformio is also broken

arm-none-eabi-g++ (xPack GNU Arm Embedded GCC x86_64) 10.3.1 20210824 (release)

I'll try and build gcc from source later this week on GCC 11 and 12 and see if its fixed there, otherwise i'll report upstream

cujomalainey commented 2 years ago

Unable to repro on arm-none-eabi-g++ (GNU Toolchain for the Arm Architecture 11.2-2022.02 (arm-11.14)) 11.2.1 20220111 which i fetched straight from the arm website

So it looks like the gcc toolchain needs to be updated for the board, i don't even see that in homebrew formulas yet so i don't expect it to be for a bit.

Thankfully my project is not blocked by this (my project was slow because I was doing way too much float math on an itsybitsy_m0, which i have removed).

cujomalainey commented 2 years ago

Note I checked the asm dump in my test and the second call was using r6 instead of r10

ladyada commented 2 years ago

is the toolchain v defined in platformio - e.g. do you see the same issue in arduino ide?

cujomalainey commented 2 years ago

is the toolchain v defined in platformio - e.g. do you see the same issue in arduino ide?

Will test when I have time, but I believe it inherits the same toolchains from the Arduino bsp as it tries to reuse as much as possible