esp8266 / Arduino

ESP8266 core for Arduino
GNU Lesser General Public License v2.1
16.06k stars 13.33k forks source link

Add option to force I2S routines into IRAM #9115

Open kleinesfilmroellchen opened 7 months ago

kleinesfilmroellchen commented 7 months ago

Basic Infos

Platform

Settings in IDE

Problem Description

In situations where a very high CPU load is present, and the I2S engine is constantly running to play audio, it is important that I2S can run as fast as possible to not slow down other parts of the system. If memory load is additionally significantly restrained, xtensa-gcc can sometimes decide to move code in and out of IRAM for no apparent reason. For me, this happened when several unrelated functions increased in size significantly - suddenly the I2S routines weren't in IRAM anymore, or something else was going on with un-inlining, I am not quite certain. (Some bloaty information that may provide further clues is here.) This slowed down the high-load procedure significantly, and

The fix for this problem was simply to force all virtual functions of I2SClass into IRAM, like so:

  // from Stream
  virtual int IRAM_ATTR available();
  virtual int IRAM_ATTR read(); // Blocking, will wait for incoming data
  virtual int IRAM_ATTR peek(); // Blocking, will wait for incoming data   
  virtual void IRAM_ATTR flush();

  // from Print (see notes on write() methods below)
  virtual size_t IRAM_ATTR write(uint8_t);
  virtual size_t IRAM_ATTR write(const uint8_t *buffer, size_t size);
  virtual int IRAM_ATTR availableForWrite();

Despite my initial expectations, this made xtensa-gcc happy again and the code returned to previous performance.

To avoid having to edit base libraries for this to work, I'd prefer if there was the option to enable these attributes via a define, for those who need it. This would mean no functional changes to everyone who doesn't need it, providing any user full control over IRAM code.

Note: I would like to not receive recommendations for "better" hardware specific to my project. Not only is there no direct replacement for the D1 mini given my peripherals, but I am operating far away enough from the ESP8266's performance and memory limit that it's manageable without weird tricks (not that it's comfortable either, but if you want comfortable programming you become a web developer). I am asking for an option that allows me to optimize my code, and I consider carefully moving functions in and out of IRAM to be a proper optimization technique on architectures like the ESP8266. The inner demoscene kid in me is also having some fun here.

MCVE

This exact commit; note that it requires specific hardware which is laid out in the repository. It doesn't really get more minimal than this, since the issue only presents itself when the CPU load is extremely high.

Debug Messages

(None. I can't even enable the serial console here.)

mcspr commented 7 months ago

If memory load is additionally significantly restrained, xtensa-gcc can sometimes decide to move code in and out of IRAM for no apparent reason. For me, this happened when several unrelated functions increased in size significantly - suddenly the I2S routines weren't in IRAM anymore

Can you point to the specific instance of that happening? Bloaty output does not really explain what got moved where, or what functions are expected to be inline that are not.

I2S implementation is already in .cpp and not in header. afaik, we don't have lto enabled to change it into sometimes-inline or am I missing something? Neither implementations have IRAM_ATTR near functions, so everything i2s_... / I2SClass::... stays on flash by default.

kleinesfilmroellchen commented 7 months ago

Can you point to the specific instance of that happening? Bloaty output does not really explain what got moved where, or what functions are expected to be inline that are not.

I may not be properly discerning the difference in the binaries here, it's fairly opaque and I'm having a hard time telling where which functions live even when looking at load addresses and segment ranges. I can provide both binaries if required (but have to scrub WiFi passwords from them first) if you want to do your own analysis.

I2S implementation is already in .cpp and not in header. afaik, we don't have lto enabled to change it into sometimes-inline or am I missing something? Neither implementations have IRAM_ATTR near functions, so everything i2s_... / I2SClass::... stays on flash by default.

compile_commands.json says -fno-inline-functions for most files, but not all core files. Excuse me for not looking through all 400 commands :)

mcspr commented 7 months ago

compile_commands.json says -fno-inline-functions for most files, but not all core files. Excuse me for not looking through all 400 commands :)

...why look there instead of build definitions? 🙃

For Arduino IDE / arduino-cli https://github.com/esp8266/Arduino/blob/685f2c97ff4b3c76b437a43be86d1dfdf6cb33e3/platform.txt#L78-L96

For PlatformIO https://github.com/esp8266/Arduino/blob/685f2c97ff4b3c76b437a43be86d1dfdf6cb33e3/tools/platformio-build.py#L61-L174

-fno-inline-functions only for .c files and does not apply to i2s code

I may not be properly discerning the difference in the binaries here, it's fairly opaque and I'm having a hard time telling where which functions live even when looking at load addresses and segment ranges. I can provide both binaries if required (but have to scrub WiFi passwords from them first) if you want to do your own analysis.

Yet again, I would not know where to look as I still not quite sure what is the actual issue. Suppose IRAM for I2S can be changed as suggested above, but class change only applies to the class and not the i2s_... implementation funcs it calls.

iirc manually pre-caching hot path of the app might be the solution https://github.com/esp8266/Arduino/pull/6628 https://github.com/esp8266/Arduino/issues/6559 (ref. https://github.com/esp8266/Arduino/blob/master/cores/esp8266/core_esp8266_spi_utils.cpp /PRECACHE/, /precache/, etc.)

Or, rewriting .ld script to forcibly move certain funcs to always be in IRAM. Mind the MMU option, though, as your app already severely limits IRAM.