espressif / arduino-esp32

Arduino core for the ESP32
GNU Lesser General Public License v2.1

SSL - Internal error (eg, unexpected failure in lower-level module) #9995

Closed hitecSmartHome closed 3 weeks ago

hitecSmartHome commented 1 month ago

Board

ESP32-Wrover

Device Description

-

Hardware Configuration

-

Version

v2.0.17

IDE Name

PlatformIO

Operating System

Windows10

Flash frequency

80

PSRAM enabled

yes

Upload speed

115200

Description

I sometimes get an SSL internal error when sending WebSocket frames. It mostly works and the packets are written fine, regardless of size, because I'm chunking the data.

Sketch

void HsHWebsocket::send(const char* message, int messageLength) {
    if (!connected || messageLength <= 0) return;

    uint8_t headerSize;

    // Calculate header size (RFC 6455: 7-bit, 16-bit or 64-bit payload length)
    if (messageLength < 126) {
        headerSize = 2;
    } else if (messageLength <= 0xFFFF) {
        headerSize = 4;
    } else {
        headerSize = 10;
    }

    // Add mask key size (client-to-server frames are always masked)
    headerSize += 4;

    // Generate mask key; the upper bound of random() is exclusive
    for (uint8_t i = 0; i < 4; ++i) {
        maskKey[i] = random(0x00, 0x100);
    }

    // Create header
    fixedSendBuffer[0] = FIN | OPCODE_TEXT;
    if (messageLength < 126) {
        fixedSendBuffer[1] = 0x80 | messageLength;
    } else if (messageLength <= 0xFFFF) {
        fixedSendBuffer[1] = 0x80 | 126;
        fixedSendBuffer[2] = (messageLength >> 8) & 0xFF;
        fixedSendBuffer[3] = messageLength & 0xFF;
    } else {
        // Widen first: shifting a 32-bit int by 32 bits or more is undefined
        const uint64_t length64 = static_cast<uint64_t>(messageLength);
        fixedSendBuffer[1] = 0x80 | 127;
        fixedSendBuffer[2] = (length64 >> 56) & 0xFF;
        fixedSendBuffer[3] = (length64 >> 48) & 0xFF;
        fixedSendBuffer[4] = (length64 >> 40) & 0xFF;
        fixedSendBuffer[5] = (length64 >> 32) & 0xFF;
        fixedSendBuffer[6] = (length64 >> 24) & 0xFF;
        fixedSendBuffer[7] = (length64 >> 16) & 0xFF;
        fixedSendBuffer[8] = (length64 >> 8) & 0xFF;
        fixedSendBuffer[9] = length64 & 0xFF;
    }
    memcpy(fixedSendBuffer + headerSize - 4, maskKey, 4);

    // Send header
    if (client.write(fixedSendBuffer, headerSize) != headerSize) {
        log_e("Failed to send header\n");
        return;
    }

    // Send payload in chunks if necessary
    size_t chunkSize = MAX_FRAME_SIZE - headerSize;  // Adjust chunk size based on buffer size and header size
    size_t offset = 0;

    while (offset < messageLength) {
        size_t bytesToSend = min(chunkSize, messageLength - offset);

        // Mask the payload in chunks and send
        for (size_t i = 0; i < bytesToSend; ++i) {
            fixedSendBuffer[headerSize + i] = message[offset + i] ^ maskKey[(offset + i) % 4];
        }

        if (client.write(fixedSendBuffer + headerSize, bytesToSend) != bytesToSend) {
            log_e("Failed to send payload chunk\n");
            return;
        }

        offset += bytesToSend;
    }
}
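
For reference, the header built in send() follows the RFC 6455 payload-length rules: lengths up to 125 fit in the second byte, lengths up to 65535 use a 16-bit extended field, and anything larger uses a 64-bit field. A minimal off-target sketch of just that step, which can be unit tested on a PC, might look like the following. The function name `buildClientTextHeader` and the 14-byte output buffer are assumptions for illustration and are not part of the reporter's class.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Writes a client-to-server text-frame header (FIN set, MASK set) into `out`
// and returns its size in bytes. `out` must have room for 14 bytes.
// Hypothetical helper for illustration only.
size_t buildClientTextHeader(uint64_t payloadLen, const uint8_t mask[4], uint8_t* out) {
    assert(out != nullptr && mask != nullptr);
    size_t n = 0;
    out[n++] = 0x80 | 0x01;  // FIN | text opcode
    if (payloadLen <= 125) {
        out[n++] = 0x80 | static_cast<uint8_t>(payloadLen);
    } else if (payloadLen <= 0xFFFF) {  // 65535 still fits the 16-bit form
        out[n++] = 0x80 | 126;
        out[n++] = static_cast<uint8_t>(payloadLen >> 8);
        out[n++] = static_cast<uint8_t>(payloadLen & 0xFF);
    } else {
        out[n++] = 0x80 | 127;
        // Big-endian 64-bit length, most significant byte first
        for (int shift = 56; shift >= 0; shift -= 8) {
            out[n++] = static_cast<uint8_t>((payloadLen >> shift) & 0xFF);
        }
    }
    for (int i = 0; i < 4; ++i) {
        out[n++] = mask[i];  // masking key follows the length field
    }
    return n;
}
```

Taking the length as `uint64_t` keeps the 64-bit branch well defined; shifting a 32-bit `int` by 32 bits or more is undefined behavior in C++.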

Debug Message

[ 24635][V][ssl_client.cpp:369] send_ssl_data(): Writing HTTP request with 8 bytes...

[ 24647][I][HsHWebsocketSend.cpp:122] sendPong(): Sending Pong

[ 24652][V][ssl_client.cpp:369] send_ssl_data(): Writing HTTP request with 6 bytes...

[ 24655][V][ssl_client.cpp:369] send_ssl_data(): Writing HTTP request with 675 bytes...

[ 24669][V][ssl_client.cpp:381] send_ssl_data(): Handling error -27648

[ 24676][E][ssl_client.cpp:37] _handle_error(): [send_ssl_data():382]: (-27648) SSL - Internal error (eg, unexpected failure in lower-level module)

[ 24689][E][ssl_client.cpp:37] _handle_error(): [data_to_read():361]: (-76) UNKNOWN ERROR CODE (004C)

[ 24690][V][ssl_client.cpp:321] stop_ssl_socket(): Cleaning SSL connection.

[ 24705][V][ssl_client.cpp:321] stop_ssl_socket(): Cleaning SSL connection.

[ 24709][E][HsHWebsocketSend.cpp:65] send(): Failed to send payload chunk

[ 24725][E][HsHWebsocketSend.cpp:48] send(): Failed to send header

Other Steps to Reproduce

Using WiFiClientSecure

WiFiClientSecure secureClient;
WiFiClient normalClient;
WiFiClient& client = secureClient;

void HsHWebsocket::setClient() {
    if (port == 443) {
        secureClient.setInsecure();
        client = (WiFiClient&)secureClient;
        log_i("Using secure client");
        return;  // without this, the normal-client branch below always runs
    }
    log_i("Using normal client");
    client = normalClient;
}
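
One thing to be aware of in this snippet: `client` is a C++ reference, and a reference cannot be rebound after initialization. `client = normalClient;` therefore assigns into the object that `client` already refers to (`secureClient`) instead of switching clients. A common alternative is to hold a pointer and re-point it. The sketch uses hypothetical stand-in types (`PlainClient`, `SecureClient`) rather than the real WiFiClient classes so that it compiles anywhere; the names `activeClient` and `selectClient` are likewise assumptions.

```cpp
#include <cassert>
#include <string>

// Stand-ins for WiFiClient / WiFiClientSecure; only the class shape matters here.
struct PlainClient {
    virtual ~PlainClient() = default;
    virtual std::string kind() const { return "plain"; }
};
struct SecureClient : PlainClient {
    std::string kind() const override { return "secure"; }
};

PlainClient  normalClient;
SecureClient secureClient;

// A pointer can be re-pointed at runtime; a reference cannot.
PlainClient* activeClient = &normalClient;

void selectClient(bool secure) {
    activeClient = secure ? static_cast<PlainClient*>(&secureClient) : &normalClient;
    assert(activeClient != nullptr);
}
```

Calls then go through `activeClient->...`, and virtual dispatch picks the secure or plain implementation.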

I have checked existing issues, online documentation and the Troubleshooting Guide

hitecSmartHome commented 1 month ago

Sometimes I get this debug log instead:

[165950][V][ssl_client.cpp:369] send_ssl_data(): Writing HTTP request with 6 bytes...

[165959][V][ssl_client.cpp:369] send_ssl_data(): Writing HTTP request with 444 bytes...

[165971][V][ssl_client.cpp:369] send_ssl_data(): Writing HTTP request with 8 bytes...

[165980][V][ssl_client.cpp:369] send_ssl_data(): Writing HTTP request with 444 bytes...

[165992][V][ssl_client.cpp:369] send_ssl_data(): Writing HTTP request with 8 bytes...

[165996][E][ssl_client.cpp:37] _handle_error(): [data_to_read():361]: (-30592) SSL - A fatal alert message was received from our peer

[166012][V][ssl_client.cpp:321] stop_ssl_socket(): Cleaning SSL connection.

[166022][V][ssl_client.cpp:381] send_ssl_data(): Handling error -78

[166028][E][ssl_client.cpp:37] _handle_error(): [send_ssl_data():382]: (-78) UNKNOWN ERROR CODE (004E)

[166037][V][ssl_client.cpp:321] stop_ssl_socket(): Cleaning SSL connection.

[166044][E][HsHWebsocketSend.cpp:48] send(): Failed to send header
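
The numeric codes in these logs are negated mbedTLS error constants, so printing them in hex makes them easier to look up: -27648 is -0x6C00, -30592 is -0x7780, -76 is -0x004C and -78 is -0x004E. A small lookup sketch covering only those four codes follows; the helper names `mbedtlsErrorName` and `formatMbedtlsCode` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstring>

// Names for the four mbedTLS codes that appear in the logs of this issue.
// Anything else is reported as unknown; this is not a full table.
const char* mbedtlsErrorName(int code) {
    switch (-code) {
        case 0x6C00: return "MBEDTLS_ERR_SSL_INTERNAL_ERROR";
        case 0x7780: return "MBEDTLS_ERR_SSL_FATAL_ALERT_MESSAGE";
        case 0x004C: return "MBEDTLS_ERR_NET_RECV_FAILED";
        case 0x004E: return "MBEDTLS_ERR_NET_SEND_FAILED";
        default:     return "unknown";
    }
}

// Formats a code the way the mbedTLS headers write it, e.g. "-0x6C00".
void formatMbedtlsCode(int code, char* buf, size_t len) {
    assert(buf != nullptr && len > 0);
    std::snprintf(buf, len, "-0x%04X", static_cast<unsigned>(-code));
}
```

Read this way, the two "UNKNOWN ERROR CODE" lines are socket-level receive and send failures, which would be consistent with the socket already being closed when they were logged. That reading is an inference, not something the logs state.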
hitecSmartHome commented 1 month ago

There is even a crash sometimes

assert failed: 0x401dba51

Backtrace: 0x4008381e:0x3ffcda10 0x401d6501:0x3ffcda30 0x4008fc4a:0x3ffcda50 0x401dba51:0x3ffcda90 0x401db3a8:0x3ffcdab0 0x400846a7:0x3ffcdad0 0x4008fc7d:0x3ffcdaf0 0x400d7bd9:0x3ffcdb10 0x40268d12:0x3ffcdb30 0x400d7c51:0x3ffcdb50 0x4012161f:0x3ffcdb70 0x40121629:0x3ffcdbf0 0x400d4a3e:0x3ffcdc10 0x4018d235:0x3ffcdc30 0x40189e5e:0x3ffcdc80 0x40189e85:0x3ffcdca0

ELF file SHA256: e485fe8d165cb361

Rebooting...

DECODED

Decoding stack results
0x4008381e: panic_abort at C:\Users\Pc\.platformio\packages\framework-espidf@3.40407.0\components\esp_system\panic.c line 408
0x401d6501: esp_register_shutdown_handler at C:\Users\Pc\.platformio\packages\framework-espidf@3.40407.0\components\esp_system\esp_system.c line 60
0x4008fc4a: __assert_func at C:\Users\Pc\.platformio\packages\framework-espidf@3.40407.0\components\newlib\assert.c line 47
0x401dba51: tlsf_malloc at C:\Users\Pc\.platformio\packages\framework-espidf@3.40407.0\components\heap/heap_tlsf_block_functions.h line 126
0x401db3a8: multi_heap_malloc_impl at C:\Users\Pc\.platformio\packages\framework-espidf@3.40407.0\components\heap\multi_heap.c line 191
0x400846a7: heap_caps_free at C:\Users\Pc\.platformio\packages\framework-espidf@3.40407.0\components\heap\heap_caps.c line 382
0x4008fc7d: free at C:\Users\Pc\.platformio\packages\framework-espidf@3.40407.0\components\newlib\heap.c line 39
0x400d7bd9: ArduinoJson::V704PB22::detail::DefaultAllocator::deallocate(void*) at lib/ArduinoJson-7.x/src/ArduinoJson/Memory/Allocator.hpp line 31
0x40268d12: ArduinoJson::V704PB22::detail::StringPool::dereference(char const*, ArduinoJson::V704PB22::Allocator*) at lib/ArduinoJson-7.x/src/ArduinoJson/Memory/StringPool.hpp line 88
0x400d7c51: ArduinoJson::V704PB22::detail::ResourceManager::~ResourceManager() at lib/ArduinoJson-7.x/src/ArduinoJson/Memory/ResourceManager.hpp line 26
0x4012161f: Thermostat:: ::operator()(void) const at lib/ArduinoJson-7.x/src/ArduinoJson/Object/MemberProxy.hpp line 32
0x40121629: Thermostat:: ::operator()(void) const at src/Components/Thermostat/Thermostat.cpp line 156
0x400d4a3e: std::function ::operator()() const at c:\users\pc\.platformio\packages\toolchain-xtensa-esp32@8.4.0+2021r2-patch5\xtensa-esp32-elf\include\c++\8.4.0\bits/std_function.h line 260
0x4018d235: Sys::handleIntervals() at c:\users\pc\.platformio\packages\toolchain-xtensa-esp32@8.4.0+2021r2-patch5\xtensa-esp32-elf\include\c++\8.4.0\bits/stl_iterator.h line 783
0x40189e5e: std::_Function_handler   >::_M_invoke(const std::_Any_data &) at c:\users\pc\.platformio\packages\toolchain-xtensa-esp32@8.4.0+2021r2-patch5\xtensa-esp32-elf\include\c++\8.4.0/new line 169
0x40189e85: std::_Function_handler  ::  >::_M_invoke(const std::_Any_data &, TmParams *&&) at src/System/System.cpp line 314

TLS implementation has serious faults it seems.

hitecSmartHome commented 1 month ago

Random crashes when I call client.stop()

==================== CURRENT THREAD STACK =====================
#0  0x40083821 in panic_abort (details=0x3ffc5900 "assert failed: 0x401d3ac1") at C:\Users\Pc\.platformio\packages\framework-espidf@3.40407.0\components\esp_system\panic.c:408
#1  0x401cc6c4 in esp_system_abort (details=0x3ffc5900 "assert failed: 0x401d3ac1") at C:\Users\Pc\.platformio\packages\framework-espidf@3.40407.0\components\esp_system\esp_system.c:137
#2  0x4008fc4d in __assert_func (file=<optimized out>, line=0, func=0x0, expr=0x0) at C:\Users\Pc\.platformio\packages\framework-espidf@3.40407.0\components\newlib\assert.c:47
#3  0x401d3ac4 in tlsf_free (tlsf=0x3f80383c, ptr=<optimized out>) at C:\Users\Pc\.platformio\packages\framework-espidf@3.40407.0\components\heap\heap_tlsf.c:965
#4  0x401d341b in multi_heap_free_impl (p=0x3f82cbbc, heap=0x3f803828) at C:\Users\Pc\.platformio\packages\framework-espidf@3.40407.0\components\heap\multi_heap.c:212
#5  multi_heap_free_impl (heap=0x3f803828, p=0x3f82cbbc) at C:\Users\Pc\.platformio\packages\framework-espidf@3.40407.0\components\heap\multi_heap.c:200
#6  0x400846aa in heap_caps_free (ptr=<optimized out>) at C:\Users\Pc\.platformio\packages\framework-espidf@3.40407.0\components\heap\heap_caps.c:382
#7  0x40085d58 in esp_mbedtls_mem_free (ptr=0x3f82cbbc) at C:\Users\Pc\.platformio\packages\framework-espidf@3.40407.0\components\mbedtls\port\esp_mem.c:46
#8  0x40200f85 in mbedtls_free (ptr=0x3f82cbbc) at C:\Users\Pc\.platformio\packages\framework-espidf@3.40407.0\components\mbedtls\mbedtls\library\platform.c:54
#9  0x4020aa1b in mbedtls_ssl_free (ssl=0x3f80485c) at C:\Users\Pc\.platformio\packages\framework-espidf@3.40407.0\components\mbedtls\mbedtls\library\ssl_tls.c:6749
#10 0x401b409e in stop_ssl_socket (ssl_client=0x3f804858, rootCABuff=<optimized out>, cli_cert=0x0, cli_key=0x0) at C:/Users/Pc/.platformio/packages/framework-arduinoespressif32/libraries/WiFiClientSecure/src/ssl_client.cpp:336
#11 0x401b3603 in WiFiClientSecure::stop (this=0x3ffb38a0 <serverRouter+12>) at C:/Users/Pc/.platformio/packages/framework-arduinoespressif32/libraries/WiFiClientSecure/src/WiFiClientSecure.cpp:98
#12 0x401b36ec in WiFiClientSecure::available (this=0x3ffb38a0 <serverRouter+12>) at C:/Users/Pc/.platformio/packages/framework-arduinoespressif32/libraries/WiFiClientSecure/src/WiFiClientSecure.cpp:248
#13 0x401b3657 in WiFiClientSecure::read (this=0x3ffb38a0 <serverRouter+12>, buf=0x3ffc5a80 "", size=0) at C:/Users/Pc/.platformio/packages/framework-arduinoespressif32/libraries/WiFiClientSecure/src/WiFiClientSecure.cpp:213
#14 0x40263a1d in WiFiClientSecure::connected (this=0x3ffb38a0 <serverRouter+12>) at C:/Users/Pc/.platformio/packages/framework-arduinoespressif32/libraries/WiFiClientSecure/src/WiFiClientSecure.cpp:257
#15 0x4014928f in HsHWebsocket::handleWebSocket (this=0x3ffb3894 <serverRouter>) at src/HsHWebsockets/HandleWebsocket.cpp:132
#16 0x4014b37f in <lambda(void*)>::operator() (__closure=0x0, param=0x3ffb3894 <serverRouter>) at src/HsHWebsockets/HsHWebsocketConnection.cpp:37
#17 <lambda(void*)>::_FUN(void *) () at src/HsHWebsockets/HsHWebsocketConnection.cpp:39
me-no-dev commented 1 month ago

Looks like a memory issue. Both exceptions happen when dealing with memory. Also, the code you provided is only an illustration, not minimal code that we can actually compile and reproduce.

Have you tried 3.x? 2.x is EOL (even if we release 2.0.18 with the very last ESP-IDF 4.4.8)

hitecSmartHome commented 1 month ago

Unfortunately I'm stuck at 2.0.17 because I use Arduino as a component of IDF with PlatformIO. I need the full power of menuconfig so I can adjust the external RAM options to my needs.

Indeed, it is a memory issue, and it happens when I call client.stop(). My app is constantly sending messages and crashes randomly, probably because it is still in a sending state when client.stop() is called and then tries to use the client. I'm using a semaphore to protect the WiFiClientSecure pointer, but it did not help.

hitecSmartHome commented 1 month ago

So far it seems that the culprit was not the client.stop() call but the client.connected() call. I removed it from my project and there has been no crash so far. Will test further.

hitecSmartHome commented 1 month ago

My test CPUs have been up and running for 16 hours and 49 minutes without the client.connected() call. It looks like it does not like concurrent calls from other threads.

me-no-dev commented 1 month ago

client.connected() goes through read() with size 0 to update the client's _connected variable. read() goes through available(), which can in particular cases call stop(), which sets _connected to false; that value is then returned by connected(). So far it makes sense. What is your application doing with connected()? You mention multiple threads?

me-no-dev commented 1 month ago

maybe connected() should read() only if _connected is true. Can you try that?

uint8_t WiFiClientSecure::connected() {
  if (_connected) {
    uint8_t dummy = 0;
    read(&dummy, 0);
  }
  return _connected;
}
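
The call chain described above can be modeled off-target to see when stop() fires. This is a simplified model with hypothetical names (`FakeTlsClient`, `nextDataToRead`, `tlsCalls`), assuming that available() returns early once `_connected` is false and that stop() is written as a no-op when repeated; it is not the library's code.

```cpp
#include <cassert>

// Off-target model of: connected() -> read(0) -> available() -> stop().
class FakeTlsClient {
public:
    int nextDataToRead = 0;  // what the TLS layer would report; < 0 means error
    int tlsCalls  = 0;       // how often the TLS layer was touched
    int stopCalls = 0;       // how often resources were released

    int available() {
        if (!_connected) return 0;
        ++tlsCalls;
        int res = nextDataToRead;
        if (res < 0) {       // error from the TLS layer: tear the session down
            stop();
            return res;
        }
        return res;
    }
    int read(unsigned char* buf, unsigned size) {
        assert(buf != nullptr || size == 0);
        int avail = available();
        if (avail <= 0) return -1;
        return static_cast<int>(size);  // payload handling omitted in this model
    }
    void stop() {
        if (!_connected) return;  // a repeated stop() does nothing
        _connected = false;
        ++stopCalls;
    }
    bool connected() {
        if (_connected) {         // proposed guard: skip the read once disconnected
            unsigned char dummy = 0;
            read(&dummy, 0);
        }
        return _connected;
    }
private:
    bool _connected = true;
};
```

In this model, a failing read releases the session exactly once, and later connected() calls no longer touch the TLS layer. The model is single-threaded, so it says nothing about two tasks entering the chain at the same time.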
hitecSmartHome commented 1 month ago

My application is a WebSocket client implementation written on top of the WiFiClientSecure and WiFiClient classes, because the ArduinoWebsockets library and the native IDF ws client library (on IDF 4.4.7) both suck: they either cause crashes or are slow. This WebSocket client tries to stay connected to the server at all times. To do that, I checked connectivity on every iteration with client.connected(), which I thought was meant for exactly this purpose. I wanted to initiate a reconnection after x seconds if the client was not connected. Instead I turned to another approach, where the application ping-pongs with the server and, if we can't send a message (client.write() fails), we initiate the reconnection.

Really interesting design choice for client.connected(). I really just wanted to know if we are still connected or not.

maybe connected() should read() only if _connected is true. Can you try that?

I can try that yes. I have other processors to test it along with the rest. I can upload my ws client implementation so you can take a look if you want to see the use case but basically this is the function which runs in a loop.

void HsHWebsocket::handle() {
    if (!isConnected()) {
        disconnect();
        connect();
        return;
    }

    if (!client.available()) { return; }

    uint8_t opcode;
    uint64_t payloadLength;
    bool fin, masked;

    if (!readFrameHeader(opcode, payloadLength, fin, masked)) {
        return;
    }

    if (masked && !fragmentedMessage) {
        if (!readMaskingKey(maskKey)) {
            return;
        }
    }

    if (!readPayloadData(payloadLength, masked, maskKey)) {
        return;
    }

    if (fin) {
        processFrame(opcode);
    } else {
        fragmentedMessage = true;
    }
}

In the isConnected() call there was the client.connected() call like this

bool HsHWebsocket::isConnected() {
    return connected && client.connected();
}

Now it looks like this instead:

bool HsHWebsocket::isConnected() {
    return connected;
}

I have an internal bool variable for tracking the connected state.
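
The approach described in this comment, an internal flag that is cleared when a write fails or the server stops answering pings, can be sketched as a small timer-driven helper. The class name `LivenessTracker`, its methods and the interval values are assumptions for illustration; timestamps are expected to come from something like `millis()`.

```cpp
#include <cassert>
#include <cstdint>

// Tracks link health from ping/pong traffic instead of polling connected().
class LivenessTracker {
public:
    LivenessTracker(uint32_t pingIntervalMs, uint32_t pongTimeoutMs)
        : pingIntervalMs_(pingIntervalMs), pongTimeoutMs_(pongTimeoutMs) {
        assert(pongTimeoutMs > pingIntervalMs);
    }
    void onConnected(uint32_t nowMs) { lastPingMs_ = lastPongMs_ = nowMs; alive_ = true; }
    void onPong(uint32_t nowMs)      { lastPongMs_ = nowMs; }
    void onWriteFailed()             { alive_ = false; }

    // True when it is time to send the next ping; records the send time.
    bool shouldPing(uint32_t nowMs) {
        if (!alive_ || nowMs - lastPingMs_ < pingIntervalMs_) return false;
        lastPingMs_ = nowMs;
        return true;
    }
    // False once a write failed or no pong arrived within the timeout.
    bool isAlive(uint32_t nowMs) {
        if (alive_ && nowMs - lastPongMs_ >= pongTimeoutMs_) alive_ = false;
        return alive_;
    }
private:
    uint32_t pingIntervalMs_;
    uint32_t pongTimeoutMs_;
    uint32_t lastPingMs_ = 0;
    uint32_t lastPongMs_ = 0;
    bool     alive_ = false;
};
```

The unsigned subtractions keep the comparisons valid when a 32-bit millisecond counter wraps around. A reconnect would be triggered whenever `isAlive()` returns false.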

hitecSmartHome commented 1 month ago

maybe connected() should read() only if _connected is true. Can you try that?

uint8_t WiFiClientSecure::connected() {
  if (_connected) {
    uint8_t dummy = 0;
    read(&dummy, 0);
  }
  return _connected;
}

Should I replace this function entirely?

uint8_t WiFiClient::connected()
{
    if (_connected) {
        uint8_t dummy;
        int res = recv(fd(), &dummy, 0, MSG_DONTWAIT);
        // avoid unused var warning by gcc
        (void)res;
        // recv only sets errno if res is <= 0
        if (res <= 0){
          switch (errno) {
              case EWOULDBLOCK:
              case ENOENT: //caused by vfs
                  _connected = true;
                  break;
              case ENOTCONN:
              case EPIPE:
              case ECONNRESET:
              case ECONNREFUSED:
              case ECONNABORTED:
                  _connected = false;
                  log_d("Disconnected: RES: %d, ERR: %d", res, errno);
                  break;
              default:
                  log_i("Unexpected: RES: %d, ERR: %d", res, errno);
                  _connected = true;
                  break;
          }
        } else {
          _connected = true;
        }
    }
    return _connected;
}
me-no-dev commented 1 month ago

But you said you use ClientSecure. Why post the function of Client? You need to change the WiFiClientSecure::connected()

hitecSmartHome commented 1 month ago

Oh, sorry. The implementation is written so it can use both. I happened to click on WiFiClient and not WiFiClientSecure. There is a listen() method which takes either a URL in string format or an IP address. The library attempts to determine whether it is a secure or a non-secure connection and uses either client or clientSecure depending on the http method.

WiFiClientSecure secureClient;
WiFiClient normalClient;
WiFiClient& client = secureClient;

void HsHWebsocket::setClient() {
    if (isSecure()) {
        secureClient.setInsecure(); // Development mode
        client = (WiFiClient&)secureClient;
        return;
    }
    client = normalClient;
}
me-no-dev commented 1 month ago

The crash was in the secure client, so I looked into that and saw that it always reads, even if it is already known to be disconnected. WiFiClient::connected() does the same thing.

hitecSmartHome commented 1 month ago

Really strange that it can call stop() when I only want to check if it is connected or not. :D Will check

me-no-dev commented 1 month ago

just change it to

uint8_t WiFiClientSecure::connected() {
  if (_connected) {
    uint8_t dummy = 0;
    read(&dummy, 0);
  }
  return _connected;
}
hitecSmartHome commented 1 month ago

Changed, testing.

me-no-dev commented 1 month ago

Really strange that it can call stop() when I only want to check if it is connected or not.

It's because it does not directly check whether it is connected. Instead, it tries to read 0 bytes from the client to indirectly determine the state of the connection (WiFiClient does the same), and in some cases it should close the connection, so stop() is called. The issue is that it was called multiple times.

hitecSmartHome commented 1 month ago

Crashed.

PC: 0x4020c044
EXCVADDR: 0x00000000

Decoding stack results
0x4020c041: mbedtls_ssl_read_record at C:\Users\Pc\.platformio\packages\framework-espidf@3.40407.0\components\mbedtls\mbedtls\library\ssl_msg.c line 3427
0x4020c302: mbedtls_ssl_read at C:\Users\Pc\.platformio\packages\framework-espidf@3.40407.0\components\mbedtls\mbedtls\library\ssl_msg.c line 5236
0x401b6a7e: data_to_read(sslclient_context*) at C:/Users/Pc/.platformio/packages/framework-arduinoespressif32/libraries/WiFiClientSecure/src/ssl_client.cpp line 356
0x401b6075: WiFiClientSecure::available() at C:/Users/Pc/.platformio/packages/framework-arduinoespressif32/libraries/WiFiClientSecure/src/WiFiClientSecure.cpp line 246
0x401b5ff0: WiFiClientSecure::read(unsigned char*, unsigned int) at C:/Users/Pc/.platformio/packages/framework-arduinoespressif32/libraries/WiFiClientSecure/src/WiFiClientSecure.cpp line 213
0x40266629: WiFiClientSecure::connected() at C:/Users/Pc/.platformio/packages/framework-arduinoespressif32/libraries/WiFiClientSecure/src/WiFiClientSecure.cpp line 257
0x4026478f: HsHWebsocket::isConnected() at src/HsHWebsocket/Utils.cpp line 103
0x40154764: HsHWebsocket::sendText(char const*, int) at src/HsHWebsocket/Send.cpp line 113
0x4013f955: HsHServerRouter::reply(ArduinoJson::V704PB22::JsonDocument, bool) at src/HsHServerRouter/Handlers.cpp line 15
0x4013f9e3: HsHServerRouter::redirect(char const*, char const*, char const*) at src/HsHServerRouter/Handlers.cpp line 27
0x401747dd: HsHServer::sendEvent(char const*, ArduinoJson::V704PB22::JsonDocument&) at src/Server/Event.cpp line 66
0x40121ef6: Thermostat:: ::operator()(void) const at src/Components/Thermostat/Thermostat.cpp line 156
0x40121f05: std::_Function_handler   >::_M_invoke(const std::_Any_data &) at c:\users\pc\.platformio\packages\toolchain-xtensa-esp32@8.4.0+2021r2-patch5\xtensa-esp32-elf\include\c++\8.4.0\bits/std_function.h line 151
0x400d4732: std::function ::operator()() const at c:\users\pc\.platformio\packages\toolchain-xtensa-esp32@8.4.0+2021r2-patch5\xtensa-esp32-elf\include\c++\8.4.0\bits/std_function.h line 260
0x40192235: Sys::handleIntervals() at src/System/SystemInterval.cpp line 55
0x4018bbda: Sys::loop() at src/System/System.cpp line 549
0x4018bc01: systemTask(void*) at src/System/System.cpp line 558
me-no-dev commented 1 month ago

are you sure you replaced the proper file? connected should have shown line 258

me-no-dev commented 1 month ago

C:/Users/Pc/.platformio/packages/framework-arduinoespressif32/libraries/WiFiClientSecure/src/WiFiClientSecure.cpp line 257

me-no-dev commented 1 month ago

maybe clean the build, since you use PIO

hitecSmartHome commented 1 month ago

are you sure you replaced the proper file? connected should have shown line 258

Yeah,

(screenshot)

hitecSmartHome commented 1 month ago

Cleaned it, trying again.

hitecSmartHome commented 1 month ago

I can trigger this crash when the server is rapidly sending messages to the ESP32. I have a range slider on the server's frontend; I can move it rapidly and it eventually crashes. It can also crash without me moving the slider, but moving it rapidly makes it crash much sooner. It does not crash at all, no matter how wildly I move the slider, if this call is commented out.

hitecSmartHome commented 1 month ago

Cleaned, uploaded again ( via OTA ), crashed again.

Decoding stack results
0x4008381e: panic_abort at C:\Users\Pc\.platformio\packages\framework-espidf@3.40407.0\components\esp_system\panic.c line 408
0x401d1125: esp_system_abort at C:\Users\Pc\.platformio\packages\framework-espidf@3.40407.0\components\esp_system\esp_system.c line 137
0x4008fc4a: __assert_func at C:\Users\Pc\.platformio\packages\framework-espidf@3.40407.0\components\newlib\assert.c line 47
0x401d6675: tlsf_free at C:\Users\Pc\.platformio\packages\framework-espidf@3.40407.0\components\heap/heap_tlsf_block_functions.h line 90
0x401d5fcc: multi_heap_free_impl at C:\Users\Pc\.platformio\packages\framework-espidf@3.40407.0\components\heap\multi_heap.c line 212
0x400846a7: heap_caps_free at C:\Users\Pc\.platformio\packages\framework-espidf@3.40407.0\components\heap\heap_caps.c line 382
0x40085d55: esp_mbedtls_mem_free at C:\Users\Pc\.platformio\packages\framework-espidf@3.40407.0\components\mbedtls\port\esp_mem.c line 46
0x40203b3a: mbedtls_free at C:\Users\Pc\.platformio\packages\framework-espidf@3.40407.0\components\mbedtls\mbedtls\library\platform.c line 54
0x4020dab4: mbedtls_ssl_free at C:\Users\Pc\.platformio\packages\framework-espidf@3.40407.0\components\mbedtls\mbedtls\library\ssl_tls.c line 6761
0x401b6a37: stop_ssl_socket(sslclient_context*, char const*, char const*, char const*) at C:/Users/Pc/.platformio/packages/framework-arduinoespressif32/libraries/WiFiClientSecure/src/ssl_client.cpp line 336
0x401b5f9c: WiFiClientSecure::stop() at C:/Users/Pc/.platformio/packages/framework-arduinoespressif32/libraries/WiFiClientSecure/src/WiFiClientSecure.cpp line 98
0x401b6085: WiFiClientSecure::available() at C:/Users/Pc/.platformio/packages/framework-arduinoespressif32/libraries/WiFiClientSecure/src/WiFiClientSecure.cpp line 248
0x401b5ff0: WiFiClientSecure::read(unsigned char*, unsigned int) at C:/Users/Pc/.platformio/packages/framework-arduinoespressif32/libraries/WiFiClientSecure/src/WiFiClientSecure.cpp line 213
0x40266629: WiFiClientSecure::connected() at C:/Users/Pc/.platformio/packages/framework-arduinoespressif32/libraries/WiFiClientSecure/src/WiFiClientSecure.cpp line 257
0x4026478f: HsHWebsocket::isConnected() at src/HsHWebsocket/Utils.cpp line 103
0x401411e1: HsHServerRouter::connected() at src/HsHServerRouter/HsHServerRouter.cpp line 125
0x4013f969: HsHServerRouter::redirect(char const*, char const*, char const*) at src/HsHServerRouter/Handlers.cpp line 21
0x401747dd: HsHServer::sendEvent(char const*, ArduinoJson::V704PB22::JsonDocument&) at src/Server/Event.cpp line 66
0x400f02f1: Heater::sendSync() at src/Components/Heater/Utils.cpp line 53
0x400ec219: std::_Function_handler   >::_M_invoke(const std::_Any_data &, int &&, int &&) at src/Components/Heater/Heater.cpp line 11
0x4019848a: std::function ::operator()(int, int) const at c:\users\pc\.platformio\packages\toolchain-xtensa-esp32@8.4.0+2021r2-patch5\xtensa-esp32-elf\include\c++\8.4.0\bits/std_function.h line 260
0x4019859d: Time::emitSecChange(int, int) at src/Time/TimeEvents.cpp line 69
0x401985bd: Time::monitorSecChange() at src/Time/TimeEvents.cpp line 331
0x40198721: Time::handleEvents() at src/Time/TimeEvents.cpp line 369
0x40197522: Time::loop() at src/Time/Time.cpp line 211
0x401a0968: loop() at src/main.cpp line 95
0x401c1289: loopTask(void*) at C:/Users/Pc/.platformio/packages/framework-arduinoespressif32/cores/esp32/main.cpp line 50

Always these lines

0x401b5f9c: WiFiClientSecure::stop() at C:/Users/Pc/.platformio/packages/framework-arduinoespressif32/libraries/WiFiClientSecure/src/WiFiClientSecure.cpp line 98
0x401b6085: WiFiClientSecure::available() at C:/Users/Pc/.platformio/packages/framework-arduinoespressif32/libraries/WiFiClientSecure/src/WiFiClientSecure.cpp line 248
0x401b5ff0: WiFiClientSecure::read(unsigned char*, unsigned int) at C:/Users/Pc/.platformio/packages/framework-arduinoespressif32/libraries/WiFiClientSecure/src/WiFiClientSecure.cpp line 213
0x40266629: WiFiClientSecure::connected() at C:/Users/Pc/.platformio/packages/framework-arduinoespressif32/libraries/WiFiClientSecure/src/WiFiClientSecure.cpp line 257
0x4026478f: HsHWebsocket::isConnected() at src/HsHWebsocket/Utils.cpp line 103
me-no-dev commented 1 month ago

What is also interesting is that even if you comment out the call, after isConnected() you call client.available(), which should have ended the same way, but you say it does not crash...

me-no-dev commented 1 month ago

Also, please post the message of the exception (the first line).

hitecSmartHome commented 1 month ago

Thank you very much for the help so far.

Well, client.available() does not stop the client in any case. It is probably the fact that the connected() method calls stop() on some condition and my application tries to access it because it does not know that the underlying code freed all resources.

Also, please post the message of the exception (the first line).

What do you mean by that? This is the whole decoded stack result. Do you need the undecoded exception?

assert failed: 0x401d6675

Backtrace: 0x4008381e:0x3ffcadc0 0x401d1125:0x3ffcade0 0x4008fc4a:0x3ffcae00 0x401d6675:0x3ffcae40 0x401d5fcc:0x3ffcae60 0x400846a7:0x3ffcae80 0x40085d55:0x3ffcaea0 0x40203b3a:0x3ffcaec0 0x4020dab4:0x3ffcaee0 0x401b6a37:0x3ffcaf00 0x401b5f9c:0x3ffcaf20 0x401b6085:0x3ffcaf40 0x401b5ff0:0x3ffcaf60 0x40266629:0x3ffcaf80 0x4026478f:0x3ffcafb0 0x401411e1:0x3ffcafd0 0x4013f969:0x3ffcaff0 0x401747dd:0x3ffcb0a0 0x400f02f1:0x3ffcb0d0 0x400ec219:0x3ffcb140 0x4019848a:0x3ffcb160 0x4019859d:0x3ffcb190 0x401985bd:0x3ffcb1d0 0x40198721:0x3ffcb1f0 0x40197522:0x3ffcb210 0x401a0968:0x3ffcb230 0x401c1289:0x3ffcb250

ELF file SHA256: 7d89e57a874e79d7

Rebooting...
hitecSmartHome commented 1 month ago

This is the complete error message when I open the serial with decoding

assert failed: 0x401d6675
  #0  0x401d6675 in tlsf_free at C:\Users\Pc\.platformio\packages\framework-espidf@3.40407.0\components\heap/heap_tlsf.c:965 (discriminator 1)

Backtrace: 0x4008381e:0x3ffcd740 0x401d1125:0x3ffcd760 0x4008fc4a:0x3ffcd780 0x401d6675:0x3ffcd7c0 0x401d5fcc:0x3ffcd7e0 0x400846a7:0x3ffcd800 0x40085d55:0x3ffcd820 0x40203b3a:0x3ffcd840 0x4020da9c:0x3ffcd860 0x401b6a37:0x3ffcd880 0x401b5f9c:0x3ffcd8a0 0x401b6085:0x3ffcd8c0 0x401b5ff0:0x3ffcd8e0 0x40266629:0x3ffcd900 0x4026478f:0x3ffcd930 0x401411e1:0x3ffcd950 0x4017478b:0x3ffcd970 0x40121ef6:0x3ffcd9a0 0x40121f05:0x3ffcda20 0x400d4732:0x3ffcda40 0x40192235:0x3ffcda60 0x4018bbda:0x3ffcdab0 0x4018bc01:0x3ffcdad0
  #0  0x401d1125 in esp_system_abort at C:\Users\Pc\.platformio\packages\framework-espidf@3.40407.0\components\esp_system/esp_system.c:137
  #1  0x4008fc4a in __assert_func at C:\Users\Pc\.platformio\packages\framework-espidf@3.40407.0\components\newlib/assert.c:47
  #2  0x401d6675 in tlsf_free at C:\Users\Pc\.platformio\packages\framework-espidf@3.40407.0\components\heap/heap_tlsf.c:965 (discriminator 1)
  #3  0x401d5fcc in multi_heap_free_impl at C:\Users\Pc\.platformio\packages\framework-espidf@3.40407.0\components\heap/multi_heap.c:212
      (inlined by) multi_heap_free_impl at C:\Users\Pc\.platformio\packages\framework-espidf@3.40407.0\components\heap/multi_heap.c:200
  #4  0x400846a7 in heap_caps_free at C:\Users\Pc\.platformio\packages\framework-espidf@3.40407.0\components\heap/heap_caps.c:382
  #5  0x40085d55 in esp_mbedtls_mem_free at C:\Users\Pc\.platformio\packages\framework-espidf@3.40407.0\components\mbedtls\port/esp_mem.c:46
  #6  0x40203b3a in mbedtls_free at C:\Users\Pc\.platformio\packages\framework-espidf@3.40407.0\components\mbedtls\mbedtls\library/platform.c:54        
  #7  0x4020da9c in mbedtls_ssl_free at C:\Users\Pc\.platformio\packages\framework-espidf@3.40407.0\components\mbedtls\mbedtls\library/ssl_tls.c:6749   
  #8  0x401b6a37 in stop_ssl_socket(sslclient_context*, char const*, char const*, char const*) at C:/Users/Pc/.platformio/packages/framework-arduinoespressif32/libraries/WiFiClientSecure/src/ssl_client.cpp:336
  #9  0x401b5f9c in WiFiClientSecure::stop() at C:/Users/Pc/.platformio/packages/framework-arduinoespressif32/libraries/WiFiClientSecure/src/WiFiClientSecure.cpp:98
  #10 0x401b6085 in WiFiClientSecure::available() at C:/Users/Pc/.platformio/packages/framework-arduinoespressif32/libraries/WiFiClientSecure/src/WiFiClientSecure.cpp:248
  #11 0x401b5ff0 in WiFiClientSecure::read(unsigned char*, unsigned int) at C:/Users/Pc/.platformio/packages/framework-arduinoespressif32/libraries/WiFiClientSecure/src/WiFiClientSecure.cpp:213
  #12 0x40266629 in WiFiClientSecure::connected() at C:/Users/Pc/.platformio/packages/framework-arduinoespressif32/libraries/WiFiClientSecure/src/WiFiClientSecure.cpp:257
  #13 0x4026478f in HsHWebsocket::isConnected() at src/HsHWebsocket/Utils.cpp:103
  #14 0x401411e1 in HsHServerRouter::connected() at src/HsHServerRouter/HsHServerRouter.cpp:125
  #15 0x4017478b in HsHServer::sendEvent(char const*, ArduinoJson::V704PB22::JsonDocument&) at src/Server/Event.cpp:58 (discriminator 1)
  #16 0x40121ef6 in _ZZN10Thermostat16sendPeriodicDataEvENKUlvE_clEv$isra$378 at src/Components/Thermostat/Thermostat.cpp:156
  #17 0x40121f05 in std::_Function_handler<void (), Thermostat::sendPeriodicData()::{lambda()#1}>::_M_invoke(std::_Any_data const&) at c:\users\pc\.platformio\packages\toolchain-xtensa-esp32@8.4.0+2021r2-patch5\xtensa-esp32-elf\include\c++\8.4.0\bits/std_function.h:297
  #18 0x400d4732 in std::function<void ()>::operator()() const at c:\users\pc\.platformio\packages\toolchain-xtensa-esp32@8.4.0+2021r2-patch5\xtensa-esp32-elf\include\c++\8.4.0\bits/std_function.h:687
  #19 0x40192235 in Sys::handleIntervals() at src/System/SystemInterval.cpp:55
  #20 0x4018bbda in Sys::loop() at src/System/System.cpp:549
  #21 0x4018bc01 in systemTask(void*) at src/System/System.cpp:558 (discriminator 1)

ELF file SHA256: 7d89e57a874e79d7

Rebooting...
me-no-dev commented 1 month ago

Well, client.available() does not stop the client in any case. It is probably that the connected() method calls stop() on some condition, and my application then tries to access the client because it does not know that the underlying code has freed all resources.

connected() calls read(), which calls available(), which calls stop(), so available() is the one actually calling stop(). That is why I do not understand why the next call to available() does not trigger it.

hitecSmartHome commented 1 month ago

Oh, yeah. You're right.

int WiFiClientSecure::available()
{
    int peeked = (_peek >= 0);
    if (!_connected) {
        return peeked;
    }
    int res = data_to_read(sslclient);
    if (res < 0) {
        stop();
        return peeked?peeked:res;
    }
    return res+peeked;
}

Really interesting...
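
The call chain described above can be reproduced with a small host-side mock. This is only a sketch: MockSecureClient, connectedFlag, pendingError, and stopCalls are hypothetical names, not part of WiFiClientSecure, and the logic is simplified from the available() listing above. It suggests why only the first call tears the client down: once stop() has cleared the connected flag, later available() calls return early and never reach stop() again.

```cpp
// Hypothetical stand-in for a TLS client; NOT the real WiFiClientSecure.
struct MockSecureClient {
    bool connectedFlag = true;  // plays the role of _connected
    int  pendingError  = 0;     // negative value simulates a TLS read error (assumption)
    int  stopCalls     = 0;     // counts how often resources were released

    void stop() {
        connectedFlag = false;  // the real stop() also frees the TLS context
        ++stopCalls;
    }

    int available() {
        if (!connectedFlag) return 0;  // early return: stop() is not reached again
        int res = pendingError;        // stands in for data_to_read(sslclient)
        if (res < 0) {
            stop();                    // a failed read tears the client down
            return res;
        }
        return res;
    }

    // Simplified: only forwards to available(), like the chain in the backtrace.
    int read() { return available() > 0 ? 1 : -1; }

    // A "status query" that can release resources as a side effect.
    bool connected() {
        read();
        return connectedFlag;
    }
};
```

In this sketch the first connected() call after an error performs the teardown, so any other task still holding a reference to the client at that moment would be working with freed state.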

me-no-dev commented 1 month ago

void HsHWebsocket::handle() {
    if (!isConnected()) { // <<< FAILS
        disconnect();
        connect();
        return;
    }

    if (!client.available()) { return; } // <<< PASSES

me-no-dev commented 1 month ago

Do you call handle() from multiple threads?

hitecSmartHome commented 1 month ago

No. It's in its own thread and called in a loop.

void HsHWebsocket::makeTask() {
    xTaskCreate([](void* param) {
        static_cast<HsHWebsocket*>(param)->connect();
        while (true){
            if( !hshSystem.isFirmwareInProgress() ){
                static_cast<HsHWebsocket*>(param)->handle();
            }
            vTaskDelay(1);
        }
        vTaskDelete(nullptr);
    },"HsHWsTask", WS_TASK_STACK, this, WS_TASK_PRIORITY, &taskHandle);
}

I hope makeTask() is not called twice. Will check and make sure.

me-no-dev commented 1 month ago

If you deal with the client in a single task (assuming it was not started twice), then you do not have a multitasking problem. If you access the client from another task, though, it could be a problem.

hitecSmartHome commented 1 month ago

Well, the implementation has an exposed sendText method which can be called from any other task. It is guarded, however, with semaphores.

void HsHWebsocket::sendText(const char* message, int messageLength) {
    if (!isConnected() || !message || messageLength <= 0) return;
    if (xSemaphoreTake(sendMutex, portMAX_DELAY) == pdTRUE) {
        send(message, messageLength);
        xSemaphoreGive(sendMutex);
    }
}

void HsHWebsocket::sendText(const char* message) {
    if (!isConnected() || !message) return;
    sendText(message, strlen(message));
}

These methods call an internal private send method, which accesses the client to write to it.

void HsHWebsocket::send(const char* message, int messageLength) {
    if (!isConnected() || !message || messageLength <= 0) return;

    uint8_t headerSize = messageLength < 126? 2 : messageLength < 0xFFFF? 4 : 10;
    // Add mask key size if client
    headerSize += 4;
    // Generate mask key
    for (uint8_t i = 0; i < 4; ++i) {
        maskKey[i] = random(0x00, 0xFF);
    }

    // Create header
    createHeader(headerSize, messageLength);

    // Write the header
    if (client.write(sendBuffer, headerSize) != headerSize) {
        #if WS_DEBUG
            printf("[WS] - Failed to send header\n");
        #endif
        softDisconnect();
        return;
    }

    // Send payload in chunks if necessary
    size_t chunkSize = MAX_FRAME_SIZE - headerSize;
    size_t offset = 0;

    while (offset < messageLength) {
        size_t bytesToSend = min(chunkSize, messageLength - offset);

        // Mask the payload in chunks and send
        for (size_t i = 0; i < bytesToSend; ++i) {
            sendBuffer[headerSize + i] = message[offset + i] ^ maskKey[(offset + i) % 4];
        }

        if (client.write(sendBuffer + headerSize, bytesToSend) != bytesToSend) {
            log_e("Failed to send payload chunk\n");
            return;
        }

        offset += bytesToSend;
    }

    client.flush();
}
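
For reference, the header layout that send() relies on can be sketched in isolation. This is a host-side illustration based on RFC 6455, not the HsHWebsocket implementation: buildClientHeader and maskPayload are hypothetical helper names. One difference from the code above is the boundary: the 16-bit length field covers payloads up to and including 0xFFFF, and the RFC asks for the shortest encoding, so the sketch uses `<=` where the code above uses `<`.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: writes a masked (client-to-server) frame header with
// FIN set into `out` and returns the header size. `out` needs up to 14 bytes.
std::size_t buildClientHeader(std::uint8_t* out, std::uint8_t opcode,
                              std::uint64_t payloadLen,
                              const std::uint8_t mask[4]) {
    std::size_t n = 0;
    out[n++] = static_cast<std::uint8_t>(0x80 | (opcode & 0x0F));  // FIN + opcode
    if (payloadLen <= 125) {
        // Length fits in the 7-bit field; 0x80 is the MASK bit.
        out[n++] = static_cast<std::uint8_t>(0x80 | payloadLen);
    } else if (payloadLen <= 0xFFFF) {
        // 126 announces a 16-bit big-endian length.
        out[n++] = static_cast<std::uint8_t>(0x80 | 126);
        out[n++] = static_cast<std::uint8_t>((payloadLen >> 8) & 0xFF);
        out[n++] = static_cast<std::uint8_t>(payloadLen & 0xFF);
    } else {
        // 127 announces a 64-bit big-endian length.
        out[n++] = static_cast<std::uint8_t>(0x80 | 127);
        for (int shift = 56; shift >= 0; shift -= 8) {
            out[n++] = static_cast<std::uint8_t>((payloadLen >> shift) & 0xFF);
        }
    }
    for (int i = 0; i < 4; ++i) out[n++] = mask[i];  // masking key follows the length
    return n;
}

// Hypothetical helper: masks `len` bytes in place. `offset` is the position of
// data[0] within the whole payload, so chunked sends keep the key aligned.
void maskPayload(std::uint8_t* data, std::size_t len, std::size_t offset,
                 const std::uint8_t mask[4]) {
    for (std::size_t i = 0; i < len; ++i) {
        data[i] ^= mask[(offset + i) % 4];
    }
}
```

Using a 64-bit type for the length also avoids shifting an int by 32 bits or more, which is undefined behavior in C++.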

These methods also call isConnected() before any action.

hitecSmartHome commented 1 month ago

Here is a repo for testing, untested and not yet compiled: https://github.com/hitecSmartHome/WsCrashTest
I will check whether it compiles at all.

hitecSmartHome commented 1 month ago

It just needs a server that sends messages rapidly.

hitecSmartHome commented 1 month ago

This is the basic idea for the use case:

#include <Arduino.h>
#include <HsHWebsocket/HsHWebsocket.h>

#define SERVER_URL "https://echo.websocket.org/"

void setup() {
  Serial.begin(115200);

  webSocket.onConnect([]() {
    Serial.println("WS - Connected");
  });

  webSocket.onDisconnect([]() {
    Serial.println("WS - Disconnected");
  });

  webSocket.onFrame([](char* data, size_t length) {
    Serial.printf("WS - Received: %s\n", data);
  });

  webSocket.onError([](WsError error) {
    Serial.printf("WS - Error: %d. Message: %s\n", error.code, error.message);
  });

  webSocket.listen(SERVER_URL);
  // webSocket.listen(SERVER_URL, "/path");
  // webSocket.listen(IPAddress(192, 168, 0, 1), 80, "/path");
  // webSocket.listen(IPAddress(192, 168, 0, 1), 80);
}

void loop() {
  vTaskDelay(1);
}

me-no-dev commented 1 month ago

Try this:

void HsHWebsocket::sendText(const char* message, int messageLength) {
    if (xSemaphoreTake(sendMutex, portMAX_DELAY) == pdTRUE) {
        if (isConnected() && message && messageLength) {
            send(message, messageLength);
            xSemaphoreGive(sendMutex);
        }
    }
}

void HsHWebsocket::sendText(const char* message) {
    if (!message) return; // strlen() on a null pointer is undefined behavior
    sendText(message, strlen(message));
}

me-no-dev commented 1 month ago

Your handle() should be guarded by the same semaphore.

hitecSmartHome commented 1 month ago

If I guard handle() with the same semaphore, my ESP freezes. I don't know why. I tried that. I will check your suggestions!
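
One possible cause of such a freeze, which the thread does not confirm, is re-entrancy: if handle() holds the semaphore and something it calls (for example a reply sent from frame processing) ends up in sendText(), the second take with portMAX_DELAY on a non-recursive mutex would block forever. The host-side sketch below uses std::recursive_mutex as a stand-in; Link, lock, and sent are hypothetical names. On FreeRTOS the closest counterparts would be xSemaphoreCreateRecursiveMutex(), xSemaphoreTakeRecursive(), and xSemaphoreGiveRecursive().

```cpp
#include <mutex>

// Hypothetical stand-in for the websocket object; not HsHWebsocket itself.
struct Link {
    std::recursive_mutex lock;  // the same thread may take it more than once
    int sent = 0;

    void sendText() {
        std::lock_guard<std::recursive_mutex> guard(lock);  // second take by the same thread
        ++sent;                                             // stands in for client.write()
    }

    void handle() {
        std::lock_guard<std::recursive_mutex> guard(lock);  // first take
        // Assumption for illustration: processing a frame triggers a reply.
        sendText();
    }
};
```

With a plain (non-recursive) mutex the nested take in this sketch would never return, which would look like a frozen device.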

hitecSmartHome commented 1 month ago

void HsHWebsocket::sendText(const char* message, int messageLength) {
    if (xSemaphoreTake(sendMutex, portMAX_DELAY) == pdTRUE) {
        if (isConnected() && message && messageLength) {
            send(message, messageLength);
            xSemaphoreGive(sendMutex);
        }
    }
}

This would never give the semaphore back if the message is empty or the server is not connected. It would get stuck.

me-no-dev commented 1 month ago

Ahh, sorry, typo. xSemaphoreGive(sendMutex); should be one level up, outside the inner if.

me-no-dev commented 1 month ago

void HsHWebsocket::sendText(const char* message, int messageLength) {
    if (xSemaphoreTake(sendMutex, portMAX_DELAY) == pdTRUE) {
        if (isConnected() && message && messageLength) {
            send(message, messageLength);
        }
        xSemaphoreGive(sendMutex);
    }
}

me-no-dev commented 1 month ago

void HsHWebsocket::handle() {
    if (xSemaphoreTake(sendMutex, portMAX_DELAY) != pdTRUE) {
        return;
    }
    if (!isConnected()) {
        disconnect();
        connect();
        goto return_sem;
    }

    if (!client.available()) { goto return_sem; }

    uint8_t opcode;
    uint64_t payloadLength;
    bool fin, masked;

    if (!readFrameHeader(opcode, payloadLength, fin, masked)) {
        goto return_sem;
    }

    if (masked && !fragmentedMessage) {
        if (!readMaskingKey(maskKey)) {
            goto return_sem;
        }
    }

    if (!readPayloadData(payloadLength, masked, maskKey)) {
        goto return_sem;
    }

    if (fin) {
        processFrame(opcode);
    } else {
        fragmentedMessage = true;
    }
return_sem:
    xSemaphoreGive(sendMutex);
}
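
The goto return_sem pattern above guarantees that the semaphore is given back on every exit path. In C++ the same guarantee can also come from a scope guard, which would have made the earlier misplaced xSemaphoreGive() impossible. The sketch below is a host-side illustration with std::mutex and std::lock_guard standing in for the FreeRTOS semaphore; Sender, framesSent, and lockIsFree are hypothetical names. On the device, a small class that takes the semaphore in its constructor and gives it in its destructor would play the same role.

```cpp
#include <mutex>

// Hypothetical stand-in for the websocket object; not HsHWebsocket itself.
struct Sender {
    std::mutex sendMutex;
    bool connected = false;
    int framesSent = 0;

    void sendText(const char* message, int messageLength) {
        // The guard unlocks in its destructor, so every return below releases the lock.
        std::lock_guard<std::mutex> guard(sendMutex);
        if (!connected || !message || messageLength <= 0) return;  // checked while holding the lock
        ++framesSent;                                              // stands in for send()
    }

    // Test helper: true if nobody holds the lock right now.
    bool lockIsFree() {
        if (sendMutex.try_lock()) {
            sendMutex.unlock();
            return true;
        }
        return false;
    }
};
```

Because the connection check happens after the lock is taken, another task cannot tear the connection down between the check and the write, as long as that task takes the same lock.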
me-no-dev commented 1 month ago

You might also want to replace all vTaskDelay(1); with vTaskDelay(2); to ensure that there is at least one tick for tasks to be switched.

hitecSmartHome commented 1 month ago

Hm... Why does it need to be at least one tick? I use vTaskDelay(1); for all my tasks (I have a lot of tasks). Is this bad practice? It is really interesting that you are using goto.
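
The arithmetic behind that suggestion can be sketched as follows. vTaskDelay(n) unblocks the task at the n-th tick interrupt after the call, and the call can land anywhere inside the current tick period, so the real delay lies between (n-1) and n periods. With n = 1 the task may therefore resume almost immediately. The helpers msToTicks and delayBounds are hypothetical, host-side names; msToTicks mirrors the integer division in FreeRTOS's default pdMS_TO_TICKS(), and the 1000 Hz tick rate used for Arduino-ESP32 in the usage note is an assumption to check against your configuration.

```cpp
#include <cstdint>

// Same arithmetic as the default pdMS_TO_TICKS(): integer division, rounds down.
constexpr std::uint32_t msToTicks(std::uint32_t ms, std::uint32_t tickRateHz) {
    return static_cast<std::uint32_t>(
        (static_cast<std::uint64_t>(ms) * tickRateHz) / 1000u);
}

// Bounds, in microseconds, on how long a delay of `ticks` ticks really lasts.
struct DelayBoundsUs {
    std::uint64_t minUs;
    std::uint64_t maxUs;
};

// Precondition: ticks >= 1 and tickRateHz >= 1.
constexpr DelayBoundsUs delayBounds(std::uint32_t ticks, std::uint32_t tickRateHz) {
    return DelayBoundsUs{
        (static_cast<std::uint64_t>(ticks) - 1) * 1000000u / tickRateHz,  // call lands just before a tick
        static_cast<std::uint64_t>(ticks) * 1000000u / tickRateHz};       // call lands just after a tick
}
```

At 1000 Hz, delayBounds(1, 1000) gives a lower bound of 0 µs, while delayBounds(2, 1000) gives a lower bound of 1000 µs. At a 100 Hz tick rate, msToTicks(1, 100) is 0, so a 1 ms request would not delay at all.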

hitecSmartHome commented 1 month ago

Thank you very much for the suggestions. I will put client.connected() back, implement your changes, and test them.