esp8266 / Arduino

ESP8266 core for Arduino
GNU Lesser General Public License v2.1

ESP8266 crashing upon HTTPS-Update when using SSL #2758

Closed derphilipp closed 5 years ago

derphilipp commented 7 years ago

Upon executing

    auto up = ESP8266HTTPUpdate();
    auto ret = up.update(_host, 443, _uri, _firmware_version, _fingerprint);
    up.rebootOnUpdate(true);

The ESP8266 crashes. This only happens when:

  1. I am using an HTTPS connection.
  2. An update package actually gets delivered (all other states, like "already up to date", work fine).
  3. The package can be as small as a minimal "Hello World" sketch or even an empty file; the ESP8266 still crashes when trying to update.

The update server is running on AWS Elastic Beanstalk, and the certificate is also from AWS.

Is there a known problem when dealing with that situation?

Here is my crash log, including some debugging information I inserted into the update protocol:

[httpUpdate] Header read fin.
[httpUpdate] Server header:
[httpUpdate]  - code: 200
[httpUpdate]  - len: 303056
[httpUpdate]  - MD5: efc2d9dffd11f7932ad16cdb6485b703
[httpUpdate] ESP8266 info:
[httpUpdate]  - free Space: 741376
[httpUpdate]  - current Sketch Size: 306112
[httpUpdate]  - current version: pIng
[httpUpdate] runUpdate flash...
[httpUpdate] Update.begin ....
sleep disable
[begin] roundedSize:       0x0004A000 (303104)
[begin] updateEndAddress:  0x00100000 (1048576)
[begin] currentSketchSize: 0x0004B000 (307200)
[begin] _startAddress:     0x000B6000 (745472)
[begin] _currentAddress:   0x000B6000 (745472)
[begin] _size:             0x00049FD0 (303056)
[httpUpdate] Update.write Stream ...
data to read 4096
written: 4096
data to read 4096
written: 8192
data to read 4096
written: 12288
data to read 4096
written: 16384
data to read 4096
written: 20480
data to read 4096
written: 24576
data to read 4096
written: 28672
data to read 4096
written: 32768
data to read 4096
written: 36864
ssl->need_bytes=16432 > 6859
failed to grow plain buffer
check_poison_block is called for free block 0x3fff2fc8
Fatal exception 3(LoadStoreErrorCause):
epc1=0x402074d5, epc2=0x00000000, epc3=0x00000000, excvaddr=0x40035008, depc=0x00000000

Exception (3):
epc1=0x402074d5 epc2=0x00000000 epc3=0x00000000 excvaddr=0x40035008 depc=0x00000000

ctx: cont 
sp: 3fff11c0 end: 3fff1650 offset: 01a0
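The [begin] lines in the log can be cross-checked with plain sector arithmetic. The sketch below assumes 4 KiB (0x1000) flash sectors and that the new image is placed directly below updateEndAddress, which is what the logged values imply; `planUpdate` and `UpdateLayout` are hypothetical names, not the core's Updater API. With the logged sizes it reproduces roundedSize, currentSketchSize and _startAddress exactly, which suggests the flash layout was fine and the failure came later, during the TLS read.

```cpp
#include <cassert>
#include <cstdint>

// Assumed flash sector size used when rounding sizes (4 KiB).
constexpr uint32_t kFlashSectorSize = 0x1000;

// Round a byte count up to the next whole flash sector.
constexpr uint32_t roundUpToSector(uint32_t size) {
    return (size + kFlashSectorSize - 1) & ~(kFlashSectorSize - 1);
}

// Hypothetical summary of where a new image would be written.
struct UpdateLayout {
    uint32_t roundedSize;        // new image size rounded to sectors
    uint32_t currentSketchSize;  // running sketch size rounded to sectors
    uint32_t startAddress;       // first flash address of the new image
    bool fits;                   // false if the new image would overlap the running sketch
};

// Place the new image directly below updateEndAddress (assumption taken from the log).
UpdateLayout planUpdate(uint32_t imageSize, uint32_t sketchSize, uint32_t updateEndAddress) {
    UpdateLayout l{};
    l.roundedSize = roundUpToSector(imageSize);
    l.currentSketchSize = roundUpToSector(sketchSize);
    l.startAddress = (updateEndAddress > l.roundedSize) ? updateEndAddress - l.roundedSize : 0;
    l.fits = l.startAddress >= l.currentSketchSize;
    return l;
}
```

For example, `planUpdate(303056, 306112, 0x00100000)` gives a start address of 0xB6000 (745472), well above the 0x4B000 occupied by the running sketch, and 0x100000 - 0x4B000 is exactly the 741376 bytes reported as free space.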
derphilipp commented 7 years ago

Update: It seems that 0x5000 bytes are getting transferred. This changes when I switch the reverse proxy from "HTTPS" to "SSL" in the AWS Beanstalk settings. Also, on the same server with the same application, it works when I just use HTTP instead.

davisonja commented 7 years ago

I've run into similar behaviour with a local Apache and a Let's Encrypt cert. There does seem to be a fine line with SSL and HTTPUpdate when it comes to memory usage/corruption; I found the same thing with poisoning on. Like you, I've had no issue when changing only https to http (so everything else remains the same). What does that epc1 address correspond to in terms of function?

ghost commented 7 years ago

For what it's worth, I'm also having this issue with 2.3.0-rc2 when trying to pull a small bin over HTTPS. Server side is similar to above, Let's Encrypt cert with httpd/apache2.

ssl->need_bytes=16432 > 6859
failed to grow plain buffer
Fatal exception 3(LoadStoreErrorCause):
epc1=0x4010011d, epc2=0x00000000, epc3=0x00000000, excvaddr=0x400396b8, depc=0x00000000
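On the epc1 question: on the Xtensa core, epc1 holds the address of the instruction that raised the exception, and excvaddr the address it tried to load or store. Turning epc1 into a function name is normally done with the toolchain's addr2line against the .elf from the same build (for example `xtensa-lx106-elf-addr2line -f -e sketch.elf 0x402074d5`, where sketch.elf is a placeholder path), or with the EspExceptionDecoder plugin for the Arduino IDE. The parser below is a hypothetical helper that pulls those register values out of a pasted log line in either format seen in this thread (comma- or space-separated).

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <sstream>
#include <stdexcept>
#include <string>

// Extract "name=0x..." pairs (e.g. epc1, excvaddr) from an exception log line.
// Commas and spaces both work as separators; values without a 0x prefix are skipped.
std::map<std::string, uint32_t> parseExceptionRegisters(const std::string& line) {
    std::map<std::string, uint32_t> regs;
    std::string normalized = line;
    for (char& c : normalized) {
        if (c == ',') c = ' ';  // treat commas like whitespace
    }
    std::istringstream in(normalized);
    std::string token;
    while (in >> token) {
        const auto eq = token.find('=');
        if (eq == std::string::npos) continue;          // words such as "Fatal"
        const std::string name = token.substr(0, eq);
        const std::string value = token.substr(eq + 1);
        if (value.rfind("0x", 0) != 0) continue;        // not a hex register value
        try {
            regs[name] = static_cast<uint32_t>(std::stoul(value, nullptr, 16));
        } catch (const std::logic_error&) {
            // invalid_argument / out_of_range: ignore this token
        }
    }
    return regs;
}
```

The resulting `epc1` value is what you would hand to addr2line; `excvaddr` tells you which address the faulting load or store touched.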

duchere commented 7 years ago

Does anyone have an update on this issue? I'm also experiencing it. It's a pity, because HTTPS OTA is what lets me be sure the binary comes from my server and not from one pretending to be it.

Khizer-Jan commented 7 years ago

This error is probably caused by corrupt flash memory. To completely erase the flash, use the following code: https://github.com/kentaylor/EraseEsp8266Flash/blob/master/EraseFlash.ino

davisonja commented 7 years ago

Signed updates can also be used to verify the update origin and may be a viable alternative in the interim. General opinion is the crash is caused by running out of RAM as SSL is quite memory intensive.

aderusha commented 6 years ago

I'm seeing this exact behavior pulling updates from GitHub. The issue persists with lwip v2 Lower Memory. Has anyone figured out a workaround on this?

vshymanskyy commented 6 years ago

Facing the same issue.

gojimmypi commented 6 years ago

I'm seeing this same issue, however I think @davisonja is correct regarding being out of memory.

I'm using latest master branch build fetched today:

SDK:2.2.1(cfd48f3)/Core:win-2.5.0-dev/lwIP:2.0.3(STABLE-2_0_3_RELEASE/glue:arduino-2.4.1-13-g163bb82)/BearSSL:94e9704

I can fetch a JSON file via TLS/SSL once. However during that first fetch, I am building a linked list of objects (aka chewing up heap memory). The second time around when I fetch it again to update those objects, I get this error:

ssl->need_bytes=16448 > 6859
failed to grow plain buffer

Apparently I need 16,448 bytes of heap, but I have only 6,859. So I've inserted a bunch of Serial.print(ESP.getFreeHeap()) statements, or more helpfully:

#define HEAP_DEBUG // when defined, display diagnostic heap info

// Sentinel: passing this exact pointer prints the current free heap instead of a message
static const char* const DEFAULT_DEBUG_MESSAGE = "";

//********************************************************
#ifdef HEAP_DEBUG
static const char *  HEAP_DEBUG_MSG = "Heap = ";
#define HEAP_DEBUG_PRINT(string)           (Serial.print  ( (string == DEFAULT_DEBUG_MESSAGE) ? (HEAP_DEBUG_MSG + (String)ESP.getFreeHeap()) : string ) )
#define HEAP_DEBUG_PRINTLN(string)         (Serial.println( (string == DEFAULT_DEBUG_MESSAGE) ? (HEAP_DEBUG_MSG + (String)ESP.getFreeHeap()) : string ) )
#define HEAP_DEBUG_PRINTF(fmt, value)      (Serial.printf ( fmt, value ) )
#endif

#ifndef HEAP_DEBUG
static const char *  HEAP_DEBUG_MSG = "";
#define HEAP_DEBUG_PRINT(string)           ((void)0)
#define HEAP_DEBUG_PRINTF(fmt, value)      ((void)0)   // same arity as the debug version
#define HEAP_DEBUG_PRINTLN(string)         ((void)0)
#endif

that can be used like this:

    Serial.begin(115200);
    HEAP_DEBUG_MSG = "mySetup; heap = ";
    HEAP_DEBUG_PRINTLN("info for (begin)");
    HEAP_DEBUG_PRINTLN(DEFAULT_DEBUG_MESSAGE);

resulting in UART output like this:

info for (begin)
mySetup; heap = 37280
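The same output can come from an ordinary function instead of macros, which avoids comparing against a sentinel pointer and lets the formatting be checked off-device. In this sketch the free-heap value is a parameter (on the ESP8266 it would come from `ESP.getFreeHeap()`); `heapDebugLine` and `heapShortfall` are hypothetical helpers, and the shortfall figure takes the `16448 > 6859` log line at face value, as the comment above does.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Build the "<prefix><free heap>" line; freeHeap is passed in so this runs anywhere.
std::string heapDebugLine(const std::string& prefix, uint32_t freeHeap) {
    return prefix + std::to_string(freeHeap);
}

// Bytes still missing for an allocation of `needed` bytes (0 if it fits).
uint32_t heapShortfall(uint32_t needed, uint32_t available) {
    return needed > available ? needed - available : 0;
}
```

On the device, `Serial.println(heapDebugLine("mySetup; heap = ", ESP.getFreeHeap()).c_str());` would print the same line as the macro version.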

My plan is to write most of the object data to SPIFFS instead of using RAM. (and make sure I'm not wasting heap with duplicate object creations, etc)

I'm wondering if the amount of heap needed is dependent on the overall size of the TLS payload, or if the stream can be decrypted in a fixed amount of memory?

The default seems to be axTLS; I thought I remembered seeing that BearSSL support was coming soon. I'm wondering how to use that instead, and whether it uses less memory.

*edit: see also https://github.com/esp8266/Arduino/issues/1375#issuecomment-169519587 regarding buffer and key size
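On whether the buffer grows with the payload: TLS is decrypted record by record, and each record carries at most 2^14 = 16384 bytes of plaintext (RFC 5246, section 6.2.1), so the largest buffer needed is bounded by one record, not by the file size. The numbers in this thread fit that picture. Assuming `need_bytes` is the body of one maximum-size record of a TLS 1.1/1.2 AES-CBC cipher suite (explicit 16-byte IV, minimal padding), 16432 matches an HMAC-SHA1 MAC (20 bytes) and 16448 an HMAC-SHA256 MAC (32 bytes). The helper names are hypothetical; this is arithmetic, not axTLS code.

```cpp
#include <cassert>
#include <cstdint>

// TLS caps the plaintext in a single record at 2^14 bytes.
constexpr uint32_t kMaxTlsPlaintext = 1u << 14;  // 16384

// Body length (explicit IV + ciphertext) of a TLS 1.1/1.2 CBC record carrying
// `plaintext` bytes with a MAC of `macLen` bytes and the minimum padding.
constexpr uint32_t cbcRecordBodyLength(uint32_t plaintext, uint32_t macLen,
                                       uint32_t blockLen = 16, uint32_t ivLen = 16) {
    // plaintext + MAC + 1 padding_length byte, rounded up to a whole cipher block
    return ivLen + ((plaintext + macLen + 1 + blockLen - 1) / blockLen) * blockLen;
}

// Largest record body a receiver must hold when `payload` bytes arrive in
// maximum-size records: bounded by the record limit, not by the payload size.
constexpr uint32_t largestRecordBody(uint32_t payload, uint32_t macLen) {
    return cbcRecordBodyLength(payload < kMaxTlsPlaintext ? payload : kMaxTlsPlaintext, macLen);
}
```

So a server that fills its records needs about 16 KB of receive buffer on the client whether the firmware is 300 KB or 3 MB; only a smaller negotiated record size (for example the RFC 6066 max_fragment_length extension, if both ends support it) lowers that.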

devyte commented 6 years ago

BearSSL is already merged, albeit experimental. The API is ~99% the same as axTLS, i.e. use bearssl::WiFiClientSecure instead of WiFiClientSecure. See the examples for usage.

gojimmypi commented 6 years ago

aha! yes, but it is BearSSL::WiFiClientSecure (capital "B" and "SSL"; thanks @devyte for the reminder on looking at examples!)

Unfortunately for me, a straight-up replacement of the client declarations with the BearSSL type does not work. The sample works, and gives an interesting tutorial on proper validation of certs. Perhaps the easiest method is client.setFingerprint(fp), but I've not yet tried that in my code.

In any case, there is still a memory leak problem in axTLS, as noted in this thread; it was not just my code using up a bunch of heap space, although that was certainly part of my problem.

I'm currently working on a separate project to demonstrate the bug, but I have not yet been able to reproduce it in that tiny, controlled environment.

I have two solutions though:

The first is a brute-force replacement of the client when I am done fetching a file. It is certainly not graceful, but it is effective:

client.stopAll();            // WiFiClient::stopAll() is static: stops every open client
yield();
WiFiClientSecure newClient;  // fresh client with a clean state
client = newClient;          // replace the old client with the fresh one

The second I found in issue https://github.com/esp8266/Arduino/issues/4733, specifically @5chufti's sample code here: https://github.com/esp8266/Arduino/issues/4733#issuecomment-389829053, where @d-a-v creates a TCP cleanup function:

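struct tcp_pcb;                                  // opaque lwIP TCP control block
extern struct tcp_pcb* tcp_tw_pcbs;              // lwIP's list of connections in TIME_WAIT
extern "C" void tcp_abort (struct tcp_pcb* pcb);

// Abort every connection still in TIME_WAIT so its memory is released immediately
void tcpCleanup ()
{
  while (tcp_tw_pcbs != NULL)
  {
    tcp_abort(tcp_tw_pcbs);   // removes the head pcb from tcp_tw_pcbs, so the loop advances
  }
}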

and then use it like this:

client.stop();
yield();
tcpCleanup();
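The `while` loop in `tcpCleanup()` always passes the current head of `tcp_tw_pcbs`, so it only terminates because `tcp_abort()` unlinks a TIME_WAIT pcb from that list before freeing it. The toy model below (hypothetical `ToyPcb`, `toyAbort` and `toyCleanup`, not lwIP's real types) shows that shape in isolation.

```cpp
#include <cassert>
#include <cstddef>

// Toy stand-in for a TIME_WAIT control block (not lwIP's struct tcp_pcb).
struct ToyPcb {
    ToyPcb* next;
};

ToyPcb* toy_tw_pcbs = nullptr;  // head of the toy TIME_WAIT list

// Like aborting a TIME_WAIT pcb: unlink it from the list, then free it.
void toyAbort(ToyPcb* pcb) {
    if (pcb == nullptr) return;
    ToyPcb** link = &toy_tw_pcbs;
    while (*link != nullptr && *link != pcb) link = &(*link)->next;
    if (*link == pcb) *link = pcb->next;
    delete pcb;
}

// Same shape as tcpCleanup(): keep aborting the head until the list is empty.
std::size_t toyCleanup() {
    std::size_t aborted = 0;
    while (toy_tw_pcbs != nullptr) {
        toyAbort(toy_tw_pcbs);  // head changes on every pass
        ++aborted;
    }
    return aborted;
}
```

If the abort call did not unlink the head, the same pointer would be passed forever; the loop's correctness rests entirely on that side effect.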

Both methods work for me. Both seem like a band-aid to an underlying problem. It would appear the default destructor properly cleans up (and closes the same tcp connections?) - but the client.stopAll() does not.

I'm not sure if garbage cleanup is needed with delay / yield. (?)
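The observation that the destructor cleans up where stopAll() did not suggests scoping the client to each fetch instead of keeping one long-lived object and replacing it by assignment. The toy model below uses a hypothetical `ToyTlsClient` (not the core's WiFiClientSecure) that counts live instances, to show that a function-local client is gone as soon as the fetch returns.

```cpp
#include <cassert>

// Hypothetical stand-in for a TLS client: counts instances still holding resources.
class ToyTlsClient {
public:
    static int live;
    ToyTlsClient() { ++live; }
    ~ToyTlsClient() { --live; }  // destructor releases the connection and buffers
    ToyTlsClient(const ToyTlsClient&) = delete;
    ToyTlsClient& operator=(const ToyTlsClient&) = delete;
    bool fetch() { return true; }  // placeholder for one request
};
int ToyTlsClient::live = 0;

// One client per fetch: it is destroyed when this function returns,
// so nothing carries over between fetches.
bool fetchOnce() {
    ToyTlsClient client;
    return client.fetch();
}
```

With this pattern there is no separate stop-and-replace step to forget; whether it frees everything on the real client depends on that class's destructor.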

devyte commented 6 years ago

Bugs traced to axtls won't be fixed. Because of the long list of issues, and that it is no longer maintained, it is planned for deprecation and eventually will be retired. I strongly advise you to spend effort migrating to bearssl instead of demonstrating a bug in axtls.

The tcpcleanup shown by @d-a-v shouldn't be necessary when using latest git. Same goes for that mem leak. Are you using latest git, or latest release 2.4.1?

gojimmypi commented 6 years ago

@devyte thank you for that information! (you've saved me a ton of time)

Yes, I am using this git version:

SDK:2.2.1(cfd48f3)/Core:win-2.5.0-dev/lwIP:2.0.3(STABLE-2_0_3_RELEASE/glue:arduino-2.4.1-13-g163bb82)/BearSSL:94e9704

but the tcpCleanup() workaround is definitely needed (and works well) with axTLS in my code. I will work on migrating my code to BearSSL instead. Thanks again :)

earlephilhower commented 5 years ago

Closing as axtls related w/OOM issues.