esp8266 / Arduino

ESP8266 core for Arduino
GNU Lesser General Public License v2.1

Debug SPI arbitration between FS and MCU? #1576

Closed nouser2013 closed 8 years ago

nouser2013 commented 8 years ago

Greetings advanced devs,

I was having trouble with a sketch that serves HTTP requests while at the same time reading from SPIFFS (~200 bytes every 10ms from one single open file) for a Ticker based background task.

I assume that there are three parties accessing SPI flash:

This scenario crashes the ESP randomly, as described here. Either the IP subsystem freezes (not sending any wifi frames anymore), while background tickers continue to run (LED blinks, buttons work, serial output fine) or the ESP crashes with a stack trace completely.

I tracked this down to SPIFFS. When I stop reading from the file completely, the webserver continues to run indefinitely without any problems. Increasing the read interval only prolongs the lifetime but does not prevent the crashes, perhaps because at some point, by random chance, there is a clash.

Is there any way to debug this ("SPIFFS" does not seem to be in the Debug-Level select list)?

WereCatf commented 8 years ago

Are you reading from SPIFFS inside a Ticker function? If yes, then you shouldn't be doing that. Ticker functions are supposed to be kept really short and you shouldn't perform blocking actions in them. Instead, just set a flag there and perform the actual blocking action inside loop().

nouser2013 commented 8 years ago

Ah that's a valid point, thanks. I tried moving the code to loop(), but it really doesn't make a difference. ESP stops sending IP packets after 2-5 minutes with SPIFFS access, so I'm still leaning in the direction of arbitration.

WereCatf commented 8 years ago

Well, I have no idea what's wrong. I serve files from SPIFFS all the time without an issue and I also allow for uploading of files to SPIFFS via the web-server.

igrr commented 8 years ago

If you are using SPIFFS with me-no-dev's async webserver, then crashes are kind of expected. SPIFFS is not thread-safe, and it expects to be called from the Arduino task only. This isn't the case with the async stack. If you are using it with the sync webserver, then please share the sketch so we can reproduce.


nouser2013 commented 8 years ago

Uhm, I may not completely understand. The webserver never reads SPIFFS, it just AsyncClient::write()s some char * buffers (globally declared!) which are filled either with sprintf() or with strcpy_P() for flash / PROGMEM access. I'm fairly certain I don't have buffer overruns. But at no point in any callback of the webserver do I access SPIFFS.

I'm not using AsyncWebserver, but AsyncTCP with a tiny GET parser with small c string functions, just to eliminate causes.

I had the one and only SPIFFS read access (File declared globally) in a Ticker callback, but moved this to loop() as suggested by @WereCatf. The code still behaves the same: first the IP stack freezes, then, 5 minutes later, the ESP crashes completely with ~30 lines of stack trace.

And when IP stack is frozen, SPIFFS access still works, I checked via interactive shell on UART.

If I do not use the SPIFFS read in loop(), the system runs for 30 minutes under TCP request "bombardment" with no error while still delivering the sprintf'ed dynamic content. At times, 2-3 connections wait for web content in a connection queue, and this works without problems.

I can try to use SyncTCP again, performance should be the same (if LWIP.a has a too small tcp_snd_buf compiled in, I'll have the 200ms Windows ACK delay). Will update as soon as I have something.

nouser2013 commented 8 years ago

Alright, after a lot of testing, here goes. The project is for WS2812 LEDs, therefore I use Adafruit's function to write the LEDs. The data sent to the LEDs is read from a SPIFFS file, which also stores a delay until new LED data is to be sent ("animation"). Sending LED data for a single frame of an animation (60 LEDs) only works flicker-free if I disable interrupts before the output function and re-enable them after it.

I removed everything asynchronous from the sketch, only ESP8266WebServer with its handleClient() is in loop(). I also updated the sketch so that the delay Ticker sets a flag which is evaluated in loop(), which then loads the new LED data and displays those values. Under those conditions the sketch runs stable for hours, but: as soon as the webserver is accessed and delivers data, the LED display gets stuck, obviously for as long as the webserver needs.

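volatile bool loop_loadNewFrame;  // set by the Ticker callback, consumed in loop()
void setup() { ... }
void loop() {
  webserver.handleClient();
  if (loop_loadNewFrame) {
    loop_loadNewFrame = false;
    ws2812_displayCurrentFrame();
    ws2812_loadNextFrameFromSPIFFS();
    // re-arm the one-shot timer with the delay stored alongside the next frame
    animationDisplayTicker.once(ws2812_currentFrameDelay, animationDisplayTimer);
  }
}
// Ticker callback: only sets the flag, no SPIFFS access here
void animationDisplayTimer() {
  loop_loadNewFrame = true;
}
void ws2812_displayCurrentFrame() {
  // interrupts stay disabled for the whole LED write
  noInterrupts(); ws2812_write(); interrupts();
}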

If I put the ws2812 code from loop() inside the Ticker callback and access SPIFFS from there, the sketch will eventually lock up the WiFi IP core, followed shortly afterwards by a WDT reset.

On the other hand, I need the webserver to run independently from the animation. Perhaps I'm doing something wrong? Loading the whole animation into RAM does not seem feasible...

me-no-dev commented 8 years ago

@igrr where is that optimistic_yield in SPIFFS that you think is making SPIFFS usage not thread safe?

nouser2013 commented 8 years ago

Hmm, I may have been wrong after all. Even when reading from loop(), the sketch (pseudo code structure above) was leaking memory every second. I then removed ws2812_displayCurrentFrame() completely while keeping the other stuff. Leaking gone. Even @me-no-dev's async classes work perfectly now.

I then put ws2812_displayCurrentFrame() back in, but without the noInterrupts() / interrupts() calls. Of course, the LEDs flicker now, but there is no leaking, and it runs stable indefinitely with 37k heap and one HTTP request every 0.5s.

I've seen the source of the interrupts() / noInterrupts() macros, but why do they have such a large influence on the whole device? When writing 60 WS2812 LEDs, interrupts are disabled for 60 * 3 * 8 * 1.25us + 50us = 1850us ~= 1.85ms. Is this bad? What else could I do to send the string to the LEDs flicker-free? I used to do the same thing on NodeMCU, and it worked without difficulties.
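The interrupt-off window can be checked with a short calculation. The sketch below is only an illustration using the figures quoted in this post (3 bytes per LED, 1.25us per bit, 50us reset gap); the helper name `ws2812FrameTimeNs` and the constant names are made up for this example and are not part of any library. Evaluating the expression term by term gives 1850us, about 1.85ms, for 60 LEDs.

```cpp
#include <cassert>
#include <cstdint>

// Figures taken from the post above; names are hypothetical.
constexpr std::uint64_t kBitsPerLed = 3 * 8;  // 3 colour bytes, 8 bits each
constexpr std::uint64_t kBitTimeNs  = 1250;   // 1.25 us per bit
constexpr std::uint64_t kResetNs    = 50000;  // 50 us reset/latch gap

// Time needed to clock out one frame, in nanoseconds.
// Integer nanoseconds are used to avoid floating-point rounding.
constexpr std::uint64_t ws2812FrameTimeNs(std::uint32_t ledCount) {
    return ledCount * kBitsPerLed * kBitTimeNs + kResetNs;
}

// 60 LEDs: 60 * 24 * 1.25 us + 50 us = 1850 us
static_assert(ws2812FrameTimeNs(60) == 1850000, "60 LEDs take 1.85 ms");
```

The same helper shows how the window grows with strip length, which is one way to judge how long interrupts would stay disabled for a given number of LEDs.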

me-no-dev commented 8 years ago

Use the I2S implementation or maybe even the serial one :) I2S has DMA, which can hold the data and let you do your thing without making the sketch wait. The serial implementation also has a buffer (128 bytes), and if 30 LEDs can fit into that then you'll be fine with it also.

nouser2013 commented 8 years ago

Small update here. I'm using @cnlohr's I2S implementation (stripped of his unnecessary stuff). It seems to work stable only if I make sure that only one single SPIFFS function is "active" at a time. If I'm reading a file and that gets interrupted by e.g. a SPIFFS dirlist, the ESP crashes. The drawback obviously is no serial input anymore :( but I can live with that.
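One way to enforce "only one SPIFFS function active at a time" is a small busy flag wrapped in a scope guard. The sketch below is a hedged illustration, not the code used in this project: `FsBusyGuard` and `runExclusive` are hypothetical names and not part of the SPIFFS or ESP8266 core API. It also assumes cooperative re-entry only (for example a callback that runs while another operation is in progress); the plain `bool` is not atomic and would not protect against a true interrupt between the check and the set.

```cpp
#include <cassert>

// Hypothetical guard: records whether a filesystem operation is already in
// progress so a second one (e.g. a directory listing) can be postponed.
class FsBusyGuard {
public:
    FsBusyGuard() : acquired_(!busy_) {
        if (acquired_) busy_ = true;
    }
    ~FsBusyGuard() {
        // Only the guard that took the flag may release it.
        if (acquired_) busy_ = false;
    }
    FsBusyGuard(const FsBusyGuard&) = delete;
    FsBusyGuard& operator=(const FsBusyGuard&) = delete;

    // True if this guard owns the filesystem for its lifetime.
    bool acquired() const { return acquired_; }

private:
    static bool busy_;
    bool acquired_;
};

bool FsBusyGuard::busy_ = false;

// Runs `op` only when no other guarded operation is active.
// Returns false if the call was skipped and should be retried later.
template <typename Op>
bool runExclusive(Op op) {
    FsBusyGuard guard;
    if (!guard.acquired()) return false;
    op();
    return true;
}
```

In a sketch, the file read and the directory listing would each be passed to `runExclusive`, and a caller that gets `false` back would set a flag and try again on a later pass through loop().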

cnlohr commented 8 years ago

I am curious what you think may cause that? I am unaware of any case where buffer underflows, etc. can cause a reboot!