esp8266 / Arduino

ESP8266 core for Arduino
GNU Lesser General Public License v2.1
15.96k stars 13.34k forks source link

git: Webserver leaks on LmacRxBlk and crashes (exception 9 & 24) #1458

Closed andig closed 8 years ago

andig commented 8 years ago

After adding HTTPClient to my sketch I've started to see various restarts.

Update not related to HTTPClient. Also getting those crashes with webserver.

Crash logs:

http://pastebin.com/31v49QZC http://pastebin.com/PnjZaGzC http://pastebin.com/QwhjzZPL http://pastebin.com/vdZBxTGs

The first might be most interesting as it shows something else, too. Free heap before adding httpclient was 31k, after adding it went down to 25k.

All new variables are local to a single function (except for global httpclient). This is the (new) code that causes the problem:

void OneWirePlugin::upload() {
  for (int8_t i=0; i<devs; i++) {
    char uuid_c[40];
    getUuid(uuid_c, i);

    // uuid configured?
    if (strlen(uuid_c) > 0) {
      float val = getValue(i);

      char val_c[16];
      if (isnan(val))
        strcpy(val_c, "null");
      else
        dtostrf(val, -4, 2, val_c);

      String uri = g_middleware + "/data/" + String(uuid_c) + ".json?value=" + String(val_c);
      http.begin(uri);
      int httpCode = http.POST("");
      DEBUG_ONEWIRE("[1wire] upload %d %s\n", httpCode, uri.c_str());
      http.end();
    }
  }
}
Links2004 commented 8 years ago

you never read the data coming from server. this can make problems, since the TCP buffer get full.

latest git has this: https://github.com/esp8266/Arduino/commit/fb55e911181ffc032dd3ebb1eb746e026a9e7bb4

and may more imported rc1 runs on SDK 1.5.0 which is very buggy. latest git has SDK 1.5.1, if possible update to latest git.

andig commented 8 years ago

latest git has SDK 1.5.1, if possible update to latest git.

I'm already on latest git, including fb55e91. I can furthermore confirm crashes even without HTTPClient and with sufficient heap available.

andig commented 8 years ago

I've done some further testing. Crashes are happening once there is some load on the system.

Test scenario: 2 sequences calling the json api in parallel, heap is being monitored to serial. This is running fine for ages.

Then: use browser (Windows = delayed ACK) to load index.html. Log starts to see LmacRxBlk:1 messages. Once those messages appear loading the index.html stalls and sonner or later the ESP crashes. For log see http://pastebin.com/vDmNrvpm#174.

LmacRxBlk:1 also seems related to https://github.com/esp8266/Arduino/issues/1450 but I couldn't find where its generated in the code?

andig commented 8 years ago

The problem apparently is with LmacRxBlk in that it leaks memory even when the web server doesn't have any handlers associated.

Once LmacRxBlk has been received once, even if load is taken off the esp, it will eventually loose and restart wifi connection and/or crash (with current git).

To put load on the system use this html file. in my case running on Windows PC- might be related to the delayed ACK issue?

<!DOCTYPE html>
<html>
<body>
<script src="https://code.jquery.com/jquery-2.1.4.min.js"></script>
<script>
$(document).ready(function() {
    window.setInterval(function() {
        $.getJSON("http://vzero-edd834.local", {
            timeout: 100
        });
    }, 100);
});
</script>
</body>
</html>

Update I've tried reproducing the memory leak on 2.0.0 stable and wasn't able to do so. It seems that something in current git since 2.0 might be causing the crash behavior.

andig commented 8 years ago

This is the shortest log I could find to repro the crash with the sketch/ calling code above:

[wifi] waiting for connection..............................
[wifi] IP address: 192.168.0.30
OTA server at: vzero-edd834.local:8266
41320
41312
41192
40872
40848
40712
LmacRxBlk:1
LmacRxBlk:1
LmacRxBlk:1
LmacRxBlk:1
LmacRxBlk:1
LmacRxBlk:1
LmacRxBlk:1
pm open,type:2 0
LmacRxBlk:1
LmacRxBlk:1
LmacRxBlk:1
LmacRxBlk:1
36592
Fatal exception 9(LoadStoreAlignmentCause):
epc1=0x401055b5, epc2=0x00000000, epc3=0x00000000, excvaddr=0xa65d893a, depc=0x00000000
andig commented 8 years ago

And here is a log of core+wifi up to the exception: http://pastebin.com/rRBzsRJr

Links2004 commented 8 years ago

LmacRxBlk is a overflow of the input buffer, the ESP is simple overloaded. see: #1450

you can test the Async webserver: https://github.com/me-no-dev/ESPAsyncWebServer

andig commented 8 years ago

Thanks @Links2004 I'll check the async server. However- even if the input buffer overflows- should the WS handle this gracefully? The mem leak/ corruption shouldn't be happening in any case?

igrr commented 8 years ago

The overflow happens at MAC level, so this is even below the TCP/IP stack. It's pretty hard to handle this condition gracefully at app layer. There is a solution for this, but it relies on dynamic memory allocation... So you can still flood the ESP easily.

On Mon, Jan 18, 2016, 21:17 andig notifications@github.com wrote:

Thanks @Links2004 https://github.com/Links2004 I'll check the async server. However- even if the input buffer overflows- should the WS handle this gracefully? The mem leak/ corruption shouldn't be happening in any case?

— Reply to this email directly or view it on GitHub https://github.com/esp8266/Arduino/issues/1458#issuecomment-172611030.

andig commented 8 years ago

The overflow happens at MAC level, so this is even below the TCP/IP stack. It's pretty hard to handle this condition gracefully at app layer.

Pheew- thats bad news indeed.

andig commented 8 years ago

I have some troubles converting my handlers to the async server (sorry- C is not my primary language). The handlers have the signature of ArRequestHandlerFunction:

void handleRoot(AsyncWebServerRequest *request) {}

yet

g_server.on("/index.html", handleRoot);

gives compile error:

sketch\webserver.cpp: In function 'void webserver_start()':
webserver.cpp:308: error: no matching function for call to 'AsyncWebServer::on(const char [12], <unresolved overloaded function type>)'
   g_server.on("/index.html", handleRoot);
                    ^
sketch\webserver.cpp:308:40: note: candidates are:
In file included from sketch\webserver.cpp:8:0:
C:\andi\arduino\hardware\eps8266com\esp8266\libraries\ESPAsyncWebServer\src/ESPAsyncWebServer.h:258:10: note: void AsyncWebServer::on(const char*, ArRequestHandlerFunction)
     void on(const char* uri, ArRequestHandlerFunction onRequest);
      ^
C:\andi\arduino\hardware\eps8266com\esp8266\libraries\ESPAsyncWebServer\src/ESPAsyncWebServer.h:258:10: note:   no known conversion for argument 2 from '<unresolved overloaded function type>' to 'ArRequestHandlerFunction {aka std::function<void(AsyncWebServerRequest*)>}'
C:\andi\arduino\hardware\eps8266com\esp8266\libraries\ESPAsyncWebServer\src/ESPAsyncWebServer.h:259:10: note: void AsyncWebServer::on(const char*, WebRequestMethod, ArRequestHandlerFunction)
     void on(const char* uri, WebRequestMethod method, ArRequestHandlerFunction onRequest);
      ^

As workaround I'm simply wrapping it, but would like to know if I can do without the wrapper:

g_server.on("/", [](AsyncWebServerRequest *request) {
    handleRoot(request);
});

Sorry if this is more a C than an ESP question ;)

igrr commented 8 years ago

no known conversion for argument 2 from '<unresolved overloaded function type>' to 'ArRequestHandlerFunction {aka std::function<void(AsyncWebServerRequest*)>}'

Do you have another function in the sketch with the same name? Compiler thinks that this is an overloaded function.

drmpf commented 8 years ago

I think I had a similar problem, so I used

void pfodESP8266WebConfig::handleRoot(pfodESP8266WebConfig* ptr) {
  ptr->rootHandler();
}

void pfodESP8266WebConfig::handleConfig(pfodESP8266WebConfig* ptr) {
  ptr->configHandler();
}
void pfodESP8266WebConfig::handleNotFound(pfodESP8266WebConfig* ptr) {
  ptr->rootHandler();
}

void pfodESP8266WebConfig::init(uint8_t _basicStartAddress) {
  basicStartAddress = _basicStartAddress;
  webserver.begin();
  webserverPtr = &webserver;
  webserverPtr->on ( "/", std::bind(&pfodESP8266WebConfig::handleRoot, this));
  webserverPtr->on ( "/config", std::bind(&pfodESP8266WebConfig::handleConfig, this));
  webserverPtr->onNotFound( std::bind(&pfodESP8266WebConfig::handleNotFound, this));
}
andig commented 8 years ago

Do you have another function in the sketch with the same name? Compiler thinks that this is an overloaded function.

No- it's the only handleRoot function. Wrapping it is fine, so never mind (might be added to ESPAsync samples @me-no-dev).

Async server is reproducible panicing for me as soon as I start using SPIFFS when reading a (larger) file though:

[webserver] uri: / args: 0
[core] heap: 27712
[webserver] string constructed
[webserver] file opened

Panic C:\andi\arduino\hardware\eps8266com\esp8266\cores\esp8266\core_esp8266_main.cpp:85 __yield

ctx: sys 
sp: 3ffffbc0 end: 3fffffb0 offset: 01b0

>>>stack>>>
3ffffd70:  402484dc 0000000b 3ffffdd8 402136dd  

This is the code:

void handleRoot(AsyncWebServerRequest *request)
{
  DEBUG_SERVER("[webserver] uri: %s args: %d\n", request->url().c_str(), request->params());
  DEBUG_HEAP;

  String indexHTML = F("<!DOCTYPE html><html><head><title>File not found</title></head><body><h1>File not found.</h1></body></html>");
  DEBUG_SERVER("[webserver] string constructed\n");

  File indexFile = SPIFFS.open(F("/index.html"), "r");
  DEBUG_SERVER("[webserver] file opened\n");

  if (indexFile) {
    indexHTML = indexFile.readString();
    DEBUG_SERVER("[webserver] file read\n");
    indexFile.close();
    DEBUG_SERVER("[webserver] file closed\n");
    DEBUG_HEAP;
    DEBUG_SERVER("[webserver] content length: %d\n", indexHTML.length());
  }

  request->send(200, "text/html", indexHTML);
}

void webserver_start()
{
  g_server.on("/", [](AsyncWebServerRequest *request) {
    handleRoot(request);
  });

  // start server
  g_server.begin();
}
igrr commented 8 years ago

Could you please move further AsyncWebServer discussion to https://github.com/me-no-dev/ESPAsyncWebServer/issues?

The reason it is panicking is that it calls File system APIs which expect to be used on the main thread.

me-no-dev commented 8 years ago

and you can request->send("/index.html"); and let the server deal with the file and all. check the example supplied with the server library for SPIFFS usage and responses :)

me-no-dev commented 8 years ago

@igrr doesn't that __yield panic mean that something using yield was called from non-switching context? since Async callbacks are such context I imagine. If so maybe the indexFile.readString(); is what triggers it?

igrr commented 8 years ago

Yes, every flash operation within SPIFFS HAL calls optimistic_yield, because flash access is a potentially long operation. I think we can relax this for reads, so you can at least read files from SPIFFS from tcp stack callbacks.

Edit: try removing this line: https://github.com/esp8266/Arduino/blob/master/cores/esp8266/spiffs_hal.cpp#L46

me-no-dev commented 8 years ago

The example I provided uses SPIFFS and is a copy of the FSBrowser example we have for the regular WebServer. So I'm am positive that SPIFFS used as intended is working great (I have tested with files up to 2MB upload/download). There is also the static handler thing that I have ported over. Maybe since I do async reads of small portions, they do not trigger the optimistic_yield(), thus I have never seen that error before

andig commented 8 years ago

and you can request->send("/index.html"); and let the server deal with the file and all.

@me-no-dev sending file via SPIFFS handler is wfm.

I need to use AsyncBasicResponse response though as I need to do some replacements inside the file before serving. Panic definitely happens on file read. I'm a little confused as to why this isn't an issue with the AsyncFileResponse as well since it also needs to read the file?

I'll move to ESPAsyncWebServer as requested.

@igrr as the root cause was crashes/leaks from MAC layer on overload- should this issue be closed?

igrr commented 8 years ago

Not really, i hope that one day we will find a solution or at least a workaround for these issues.

me-no-dev commented 8 years ago

@igrr how do you feel about replacing yield here and there with optimisic_yield or better have yield be optimistic. The reason asking is because we have written over the time many methods to yield (like Stream::readString() above), which was fine up till now because all was blocking and called from the loop context, but those methods can no longer be used in async contexts without making them not to yield unnecessary.

andig commented 8 years ago

tl:dr

I've migrated to the async webserver. After fixing some issues with code using yield (e.g. Stream::readString) I'm almost overwhelmed with the performance I'm getting. No more issues with parallel requests.

me-no-dev commented 8 years ago

@andig maybe close this now :)