Makuna / NeoPixelBus

An Arduino NeoPixel support library supporting a large variety of individually addressable LEDs. Please refer to the Wiki for more details. Please use the GitHub Discussions to ask questions as the GitHub Issues feature is used for bug tracking.
GNU Lesser General Public License v3.0
1.18k stars 260 forks source link

ESP32: Template class static callback method not located in IRAM even though marked to be so #321

Open Makuna opened 4 years ago

Makuna commented 4 years ago

Describe the bug static void IRAM_ATTR _translate() This method is marked for IRAM, but is placed into the flash section.

Additional Context from gitter conversation started by Andreas Merkle:

"Problem seems to be: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70435 ...
At the moment I use a RMT ISR level 2, which means only higher level ISRs can interrupt. With this option I can click around on webserver and everything is fine ... I am using parallel e.g. the ESPAsyncWebServer and I have a lot of artifacts. I moved the necessary functions via the linker to IRAM and set ESP_INTR_FLAG_LEVEL2 | ESP_INTR_FLAG_IRAM in the esp_intr_alloc(). After this, it looks like all artifacts are gone."

Mike Dunston: "IRAM_ATTR is not 100% required unless you require the function to run from an ISR context WHEN flash is DISABLED if it is not in IRAM it will be deferred until flash is enabled ... using SPIFFS for serving files via ESPAsync? If so, that is going to be a bigger problem than simply moving various bits into IRAM"

Makuna commented 4 years ago

xtensa-esp32-elf-objdump -x app-template.elf -d

BlueAndi commented 4 years ago

I opened an issue here too: https://github.com/espressif/esp-idf/issues/4542

Note, you can use xtensa-esp32-elf-nm -C -n firmware.elf too, to get address and size with demangling.

BlueAndi commented 4 years ago

The main problem of flickering and artifacts seems to be flash access during the running RMT ISR. To move the _translate() method to IRAM will improve this. Increasing the interrupt level improves this too, but with the drawback that other things might be delayed.

To get rid of this (with or without the IRAM_ATTR problem) do the following: Create a separate task with prio 4, which calls the show() routine. Wait with canShow() until the physical update via RMT is finished. Then call delay() with your task period.

Because the task has prio 4 and the ESPAsyncWebServer task has a lower prio, there will be no flash access (SPIFFS) during update via RMT. And voila ... no flickering ... no artifacts anymore. The webserver is still very responsive.

BlueAndi commented 4 years ago

fyi: https://github.com/espressif/esp-idf/issues/4545

ben-mkiv commented 4 years ago

is data sent within one transaction? if not this might be related: https://github.com/espressif/esp-idf/issues/3824

however, disabling the tx interrupt didnt solve the issue for me (also be aware that theres a deadlock in esp-idf rmt.c when disabling the interrupt, as the TX mutex is never released)

Makuna commented 4 years ago

fyi: espressif/esp-idf#4545

I have tried to apply this work around, but I am getting a compiler error.

I2S causes a section type conflict with void i2sDmaISR(void*)

It seems there is compiler bug and it requires a work around within the build system.

drzony commented 4 years ago

As mentioned in #292 I'm posting some findings regarding the IRAM issues:

We are having some discussions with Sam Guyer in this FastLED issue

One thing that we found out is that current GCC has a bug where template class methods are not put into RAM. GCC bug, ESP_IDF issue

However Sam Guyer has created a fork of FastLED where ISR handlers are stored in IRAM and it did not help.

@Makuna Maybe some cooperation with Sam Guyer would bring something new to the table. I'm a bit hesitant about mixing the code of FastLED and NeoPixelBus due to incompatible licenses (FastLED -> NeoPixelBus is OK, but other way around is impossible), so I'm leaving this up to you.

Makuna commented 4 years ago

This bug is strictly tracking the IRAM attribute not being applied. Not really the effects of such; as there is another bug for that.

samguyer commented 4 years ago

@Makuna Did you resolve the section type conflict problem? I was getting that error, and it took me a while to understand and fix it.

@drzony I will push a version to my clone of the FastLED repo that has all relevant functions in IRAM.

bbulkow commented 4 years ago

Hey, reporting in with my current experimental results.

  1. I have created the simplest possible test program, which uses the ESP-IDF supplied test program "led_strip", but changes the visual effect to a smooth hue sweep to check for artifacts. Annoyingly, this test program has a pattern which frequently clears the entire strip, so it's hard to see if there are artifacts. This very simple program uses the default RMT driver, and uses translate. Espressif code:

    https://github.com/espressif/esp-idf/tree/c77c4cc/examples/peripherals/rmt/led_strip
  2. Doing this in ESP-IDF instead of Arudino, and currently tested only with 'master'.

  3. With no web service, of course it looks great.

  4. With a web service enabled and a browser window open, I get a fair number of artifacts. I'm not using ESPAsync, I'm using the ESP default web server, but I am serving two fairly large objects out of flash. I've also seen artifacts when NOT serving out of flash. I've made the symptoms a bit worse by making my two large objects "no-cache" in HTTP, and I see them being requested with regularity ( this is a test, obviously! )

  5. I've applied the changes of setting FLAG_IRAM and FLAG_LEVEL2, with no appreciable change.

  6. I've dumped the locations of 'ws2812_rmt_adapter', which is Espressif's demo code's translate function, and it says:

    40084a1c l     F .iram0.text    000000a2 ws2812_rmt_adapter

    as opposed to the next function in the file, which is:

    400f8680 l     F .flash.text    00000044 ws2812_set_pixel

    I haven't looked at elf format binaries in a while but that seems fairly conclusive that the translate code is properly in IRAM.

This kinda removes the IRAM theory as a major suspect ( I believe you that the templates are mucking with the IRAM-ability, but this backs up Sam's claim that getting the translate function into IRAM helps ).

  1. My board is an espressif made PICO dev kit 4, and I have a level shifter. My LEDs are Alitova WS8211 12v "bulb" style instead of strip style.

I am a bit at a loss. Some directions to go next are looking at the output with a scope and see if I can see what's going wrong, which might be informative; another is reporting on the Espressif chain, since this is now a much simpler test to run; another is removing the TRANSLATE function and loading up the buffer directly; another is fishing through the different ESP-IDF versions and see if behavior changes.

Here's an interesting idea. What causes the visual artifact is a pause in the middle of a send, then starting up again, but not at the 0ith position. When the send picks up, the bytes and probably bits (would have to think about that) are misaligned regarding their intended space, let alone the fact that you're starting from zero again.

If one could detect that a pause happened and the "string update" had gone bad, and not send the rest of the bits (misaligned and starting over from zero), you'd lose a little bit of the update which isn't getting pushed, but as long as you either had another update coming soon, or re-did this update from 0, you'd be OK.

Of course this would be a "work around" or mitigation for the fact that we're not getting the timing we need, but hey its an RTOS and these things happen --- so I'll look into that as well.

Any other thoughts?

Ie, if the _write errored out on an underflow, this would be recoverable.

samguyer commented 4 years ago

One thing that I did recently in the FastLED driver is to add a bunch of timing instrumentation to make sure that the interrupt handler is getting called at the right intervals. We use a double-buffering scheme where the RMT write buffer is divided into two segments -- one segment is being sent while the other is being refilled. Other kinds of interrupts (WiFi or whatever else is going on) can come in, but we still have plenty of time to get the buffer refilled.

I think my current branch is about as efficient and robust as I could make it without requiring a ton of extra RAM (to hold all the pretranslated bits).

bbulkow commented 4 years ago

I've added instrumentation to the example RMT / LED code from Espressif, and I'm finding that occasionally there's 250 to 350us between translate calls. This is with the rather poor ESP HTTP server, and serving large content from flash, arguably a rather worst-case, although it is using a proper IRAM function.

I've ported your most recent code over into my FastLED-idf branch (again). There's an annoying problem with having to bzero the rmt_config_t, which Espressif created recently, now your code runs.

Guess what? "too slow" messages. Sigh. I'm willing to believe your code is efficient, but I'll be running through it myself.... see what can be done about that semaphore....

I propose that with all the efficiency in the world, one will occasionally still miss a timing, because, yknow, RTOS. Especially if one wants to serve large content objects from Flash. One just never knows.

And sorry, Makuna, for hijacking the thread. It now has very little to do with a bug regarding IRAM designations getting lost.

samguyer commented 4 years ago

@bbulkow Which version of my code are you looking at? I experimented with a version in which the interrupt just informs the regular code that a "fill" is needed. That code is too slow -- the latency of the semaphore is very high. If you really want to go that route, I recommend switching to the TaskNotify system, rather than using the semaphore.

The big problem with this strategy is that it won't work under SPIFFS or any flash access. The problem is that during flash operations, ESP-IDF disables all code on both cores EXCEPT interrupt handlers running in IRAM. Everything else is shut down.

So, in my latest push, I moved the "fill" routine back into the interrupt handler and got rid of the semaphore (except for the one that indicates the data is all sent). I believe that this version is not vulnerable to flash access issues.

But I also like the idea of detecting the problem and simply canceling the rest of the show operation. What several of the other drivers in FastLED do is detect the interruption and retry the whole thing.

samguyer commented 4 years ago

@bbulkow @Makuna There is currently no fix for the attributes+templates problem in gcc. I ended up moving all of the interrupt code out of the template.

BlueAndi commented 4 years ago

@samguyer I did the same for the NeoPixelBus, moving the interrupt service routines out of the template class. See referenced commit above.

@Makuna If you see any improvement, please let me know. How I did it is the fastest way I think, without handling all channels by one single routine and to keep your design approach.

Evidence:

 .iram1         0x400810dc      0x263 .pio\build\esp32doit-devkit-v1\lib03c\libNeoPixelBus.a(NeoEsp32RmtMethod.cpp.o)
                0x40081148                _ZN22NeoEsp32RmtSpeedWs28119translateEPKvP12rmt_item32_tjjPjS4_
                0x4008116c                _ZN23NeoEsp32RmtSpeedWs2812x9translateEPKvP12rmt_item32_tjjPjS4_
                0x40081190                _ZN22NeoEsp32RmtSpeedSk68129translateEPKvP12rmt_item32_tjjPjS4_
                0x400811b4                _ZN22NeoEsp32RmtSpeedTm18149translateEPKvP12rmt_item32_tjjPjS4_
                0x400811d8                _ZN23NeoEsp32RmtSpeed800Kbps9translateEPKvP12rmt_item32_tjjPjS4_
                0x400811fc                _ZN23NeoEsp32RmtSpeed400Kbps9translateEPKvP12rmt_item32_tjjPjS4_
                0x40081220                _ZN22NeoEsp32RmtSpeedApa1069translateEPKvP12rmt_item32_tjjPjS4_
                0x40081244                _ZN30NeoEsp32RmtInvertedSpeedWs28119translateEPKvP12rmt_item32_tjjPjS4_
                0x40081268                _ZN31NeoEsp32RmtInvertedSpeedWs2812x9translateEPKvP12rmt_item32_tjjPjS4_
                0x4008128c                _ZN30NeoEsp32RmtInvertedSpeedSk68129translateEPKvP12rmt_item32_tjjPjS4_
                0x400812b0                _ZN30NeoEsp32RmtInvertedSpeedTm18149translateEPKvP12rmt_item32_tjjPjS4_
                0x400812d4                _ZN31NeoEsp32RmtInvertedSpeed800Kbps9translateEPKvP12rmt_item32_tjjPjS4_
                0x400812f8                _ZN31NeoEsp32RmtInvertedSpeed400Kbps9translateEPKvP12rmt_item32_tjjPjS4_
                0x4008131c                _ZN30NeoEsp32RmtInvertedSpeedApa1069translateEPKvP12rmt_item32_tjjPjS4_
Makuna commented 4 years ago

@BlueAndi With your changes, if you leave the new common translate in the cpp; but move the individual wrapper translate into the header file; does it compile out the individual wrappers and also leave the common translate in IRAM?

BlueAndi commented 4 years ago

@Makuna It compiles, but I get linker warnings regarding dangerous relocation: l32r: literal placed after use. Thats why I moved the wrappers to the .cpp file.

According to the esp32 spi flash source (see links, especially the 2nd), I must additional add DRAM_ATTR to the used "static const" data. Not done yet, but I will add them too. https://github.com/espressif/esp-idf/tree/master/components/spi_flash#concurrency-constraints-for-flash-on-spi1 https://github.com/espressif/esp-idf/tree/master/components/spi_flash#iram-safe-interrupt-handlers

Makuna commented 4 years ago

@BlueAndi Did you try just having the translate in the NeoEsp32RmtSpeedBase?

BlueAndi commented 4 years ago

@Makuna I thought about, but access to the RmtBit0, RmtBit1 and RmtDurationReset is necessary and I tried to avoid having pure virtual getter methods.

BlueAndi commented 4 years ago

@bbulkow BTW the ws2812_rmt_adapter in the example contains constant data, which might be served out of flash. The DRAM_ATTR is missing there, according to https://docs.espressif.com/projects/esp-idf/en/latest/esp32/api-reference/storage/spi_flash.html#iram-safe-interrupt-handlers

Additional the ESP_INTR_FLAG_IRAM is not set as well in the example, but I guess you set this.

Makuna commented 4 years ago

Just capturing some stuff that I have found while integrating the work around above.

I applied the changes but put the _translate implementation into the base speed class in the header so it will inline that method optimize out a method call to it in the ISR.

With the S2 (current GitHub) this works great! https://github.com/Makuna/NeoPixelBus/pull/362

With the current released Esp32 build, this doesn't work at all. It causes a Section type miss match on the base _translate due to the IRAM_ATTR. So this pull isn't active yet until the next ESP32 Arduino release.

https://github.com/Makuna/NeoPixelBus/pull/363 This pull will causes the compiler create code adding a call immediately in the ISR; but it will compile ESP32 Arduino active release.

Makuna commented 4 years ago

https://github.com/Makuna/NeoPixelBus/pull/366

Makuna commented 4 years ago

Leaving open as the original issue is still active but a temporary work around has been released. This may need some changes when the next Esp32 Arduino release is made. https://github.com/Makuna/NeoPixelBus/releases/tag/2.6.0