espressif / arduino-esp32

Arduino core for the ESP32
GNU Lesser General Public License v2.1
13.3k stars 7.35k forks source link

SSL - Internal error (eg, unexpected failure in lower-level module) #9995

Closed hitecSmartHome closed 1 month ago

hitecSmartHome commented 2 months ago

Board

ESP32-Wrover

Device Description

-

Hardware Configuration

-

Version

v2.0.17

IDE Name

PlatformIO

Operating System

Windows10

Flash frequency

80

PSRAM enabled

yes

Upload speed

115200

Description

SOMETIMES I got an SSL internal error when sending websocket frames. It mostly working and the packets are written fine. No matter the size because I'm chunking the data.

Sketch

void HsHWebsocket::send(const char* message, int messageLength) {
    if (!connected || messageLength <= 0) return;

    uint8_t headerSize;

    // Calculate header size
    if (messageLength < 126) {
        headerSize = 2;
    } else if (messageLength < 0xFFFF) {
        headerSize = 4;
    } else {
        headerSize = 10;
    }

    // Add mask key size if client
    headerSize += 4;

    // Generate mask key
    for (uint8_t i = 0; i < 4; ++i) {
        maskKey[i] = random(0x00, 0xFF);
    }

    // Create header
    fixedSendBuffer[0] = FIN | OPCODE_TEXT;
    if (messageLength < 126) {
        fixedSendBuffer[1] = 0x80 | messageLength;
    } else if (messageLength < 0xFFFF) {
        fixedSendBuffer[1] = 0x80 | 126;
        fixedSendBuffer[2] = (messageLength >> 8) & 0xFF;
        fixedSendBuffer[3] = messageLength & 0xFF;
    } else {
        fixedSendBuffer[1] = 0x80 | 127;
        fixedSendBuffer[2] = (messageLength >> 56) & 0xFF;
        fixedSendBuffer[3] = (messageLength >> 48) & 0xFF;
        fixedSendBuffer[4] = (messageLength >> 40) & 0xFF;
        fixedSendBuffer[5] = (messageLength >> 32) & 0xFF;
        fixedSendBuffer[6] = (messageLength >> 24) & 0xFF;
        fixedSendBuffer[7] = (messageLength >> 16) & 0xFF;
        fixedSendBuffer[8] = (messageLength >> 8) & 0xFF;
        fixedSendBuffer[9] = messageLength & 0xFF;
    }
    memcpy(fixedSendBuffer + headerSize - 4, maskKey, 4);

    // Send header
    if (client.write(fixedSendBuffer, headerSize) != headerSize) {
        log_e("Failed to send header\n");
        return;
    }

    // Send payload in chunks if necessary
    size_t chunkSize = MAX_FRAME_SIZE - headerSize;  // Adjust chunk size based on buffer size and header size
    size_t offset = 0;

    while (offset < messageLength) {
        size_t bytesToSend = min(chunkSize, messageLength - offset);

        // Mask the payload in chunks and send
        for (size_t i = 0; i < bytesToSend; ++i) {
            fixedSendBuffer[headerSize + i] = message[offset + i] ^ maskKey[(offset + i) % 4];
        }

        if (client.write(fixedSendBuffer + headerSize, bytesToSend) != bytesToSend) {
            log_e("Failed to send payload chunk\n");
            return;
        }

        offset += bytesToSend;
    }
}

Debug Message

[ 24635][V][ssl_client.cpp:369] send_ssl_data(): Writing HTTP request with 8 bytes...

[ 24647][I][HsHWebsocketSend.cpp:122] sendPong(): Sending Pong

[ 24652][V][ssl_client.cpp:369] send_ssl_data(): Writing HTTP request with 6 bytes...

[ 24655][V][ssl_client.cpp:369] send_ssl_data(): Writing HTTP request with 675 bytes...

[ 24669][V][ssl_client.cpp:381] send_ssl_data(): Handling error -27648

[ 24676][E][ssl_client.cpp:37] _handle_error(): [send_ssl_data():382]: (-27648) SSL - Internal error (eg, unexpected                                                    failure in lower-level module)

[ 24689][E][ssl_client.cpp:37] _handle_error(): [data_to_read():361]: (-76) UNKNOWN ERROR CODE (004C)

[ 24690][V][ssl_client.cpp:321] stop_ssl_socket(): Cleaning SSL connection.

[ 24705][V][ssl_client.cpp:321] stop_ssl_socket(): Cleaning SSL connection.

[ 24709][E][HsHWebsocketSend.cpp:65] send(): Failed to send payload chunk

[ 24725][E][HsHWebsocketSend.cpp:48] send(): Failed to send header

Other Steps to Reproduce

Using WifiClientSecure

WiFiClientSecure secureClient;
WiFiClient normalClient;
WiFiClient& client = secureClient;

void HsHWebsocket::setClient() {
    if (port == 443) {
        secureClient.setInsecure();
        client = (WiFiClient&)secureClient;
        log_i("Using secure client");
    }
    log_i("Using normal client");
    client = normalClient;
}

I have checked existing issues, online documentation and the Troubleshooting Guide

me-no-dev commented 2 months ago

Why is it necessary to be one tick?

As far as I understand it, vTaskDelay(1); will delay until the next tick, which could happen to be so close that it does not give time for tasks to switch. At least something like that was explained to me. In Arduino 1 tick is 1ms, so your task will never delay more than 2ms if vTaskDelay(2); is used

hitecSmartHome commented 2 months ago

I have added the semaphore to my handle function and as it is expected, it froze the task. Never letting the send to complete.

me-no-dev commented 2 months ago

is sendText called anywhere from within handle()?

hitecSmartHome commented 2 months ago

No but sendpong() method is called from handle which also uses the send mutex to protect the client access when writing.

hitecSmartHome commented 2 months ago

Will try to place it outside of handle()

me-no-dev commented 2 months ago

yeah, taking and already taken mutex will freeze the task

me-no-dev commented 2 months ago

you could remove it from sendpong if it's not called from anywhere else. It would be already protected by handle

hitecSmartHome commented 2 months ago
case 0x09:  // Ping
            #if WS_DEBUG
                printf("Received ping frame.\n");
            #endif
            if (pingCallback) {
                pingCallback();
            }
            shouldSendPong = true;
            //sendPong();
            break;

and checking it before handle takes the mutex

void HsHWebsocket::handle() {
    if( shouldSendPong ){
        shouldSendPong = false;
        sendPong();
    }
    if (xSemaphoreTake(sendMutex, portMAX_DELAY) != pdTRUE) {
        return;
    }
//...
hitecSmartHome commented 2 months ago

you could remove it from sendpong if it's not called from anywhere else. It would be already protected by handle

Yeah, but I plan to make it public so users can send a pong if they want. I did not think this trought yet... Will check it like this first.

hitecSmartHome commented 2 months ago

It crashes almost imidiately with the changes.

assert failed: 0x401d65cd
  #0  0x401d65cd in tlsf_free at C:\Users\Pc\.platformio\packages\framework-espidf@3.40407.0\components\heap/heap_tlsf.c:965 (discriminator 1)

Backtrace: 0x4008381e:0x3ffc58e0 0x401d107d:0x3ffc5900 0x4008fc4a:0x3ffc5920 0x401d65cd:0x3ffc5960 0x401d5f24:0x3ffc5980 0x400846a7:0x3ffc59a0 0x40085d55:0x3ffc59c0 0x40203a92:0x3ffc59e0 0x4020da0c:0x3ffc5a00 0x401b698f:0x3ffc5a20 0x401b5ef4:0x3ffc5a40 0x401b5fdd:0x3ffc5a60 0x401b5f48:0x3ffc5a80 0x40266581:0x3ffc5aa0 0x402646e7:0x3ffc5ad0 0x4014ce6f:0x3ffc5af0 0x4014a1a1:0x3ffc5b20
  #0  0x401d107d in esp_system_abort at C:\Users\Pc\.platformio\packages\framework-espidf@3.40407.0\components\esp_system/esp_system.c:137
  #1  0x4008fc4a in __assert_func at C:\Users\Pc\.platformio\packages\framework-espidf@3.40407.0\components\newlib/assert.c:47
  #2  0x401d65cd in tlsf_free at C:\Users\Pc\.platformio\packages\framework-espidf@3.40407.0\components\heap/heap_tlsf.c:965 (discriminator 1)
  #3  0x401d5f24 in multi_heap_free_impl at C:\Users\Pc\.platformio\packages\framework-espidf@3.40407.0\components\heap/multi_heap.c:212
      (inlined by) multi_heap_free_impl at C:\Users\Pc\.platformio\packages\framework-espidf@3.40407.0\components\heap/multi_heap.c:200
  #4  0x400846a7 in heap_caps_free at C:\Users\Pc\.platformio\packages\framework-espidf@3.40407.0\components\heap/heap_caps.c:382
  #5  0x40085d55 in esp_mbedtls_mem_free at C:\Users\Pc\.platformio\packages\framework-espidf@3.40407.0\components\mbedtls\port/esp_mem.c:46
  #6  0x40203a92 in mbedtls_free at C:\Users\Pc\.platformio\packages\framework-espidf@3.40407.0\components\mbedtls\mbedtls\library/platform.c:54        
  #7  0x4020da0c in mbedtls_ssl_free at C:\Users\Pc\.platformio\packages\framework-espidf@3.40407.0\components\mbedtls\mbedtls\library/ssl_tls.c:6761   
  #8  0x401b698f in stop_ssl_socket(sslclient_context*, char const*, char const*, char const*) at C:/Users/Pc/.platformio/packages/framework-arduinoespressif32/libraries/WiFiClientSecure/src/ssl_client.cpp:336
  #9  0x401b5ef4 in WiFiClientSecure::stop() at C:/Users/Pc/.platformio/packages/framework-arduinoespressif32/libraries/WiFiClientSecure/src/WiFiClientSecure.cpp:98
  #10 0x401b5fdd in WiFiClientSecure::available() at C:/Users/Pc/.platformio/packages/framework-arduinoespressif32/libraries/WiFiClientSecure/src/WiFiClientSecure.cpp:248
  #11 0x401b5f48 in WiFiClientSecure::read(unsigned char*, unsigned int) at 
C:/Users/Pc/.platformio/packages/framework-arduinoespressif32/libraries/WiFiClientSecure/src/WiFiClientSecure.cpp:213
  #12 0x40266581 in WiFiClientSecure::connected() at C:/Users/Pc/.platformio/packages/framework-arduinoespressif32/libraries/WiFiClientSecure/src/WiFiClientSecure.cpp:257
  #13 0x402646e7 in HsHWebsocket::isConnected() at src/HsHWebsocket/Utils.cpp:103
  #14 0x4014ce6f in HsHWebsocket::handle() at src/HsHWebsocket/Handle.cpp:11  #15 0x4014a1a1 in HsHWebsocket::makeTask()::{lambda(void*)#1}::_FUN(void*) at src/HsHWebsocket/Connect.cpp:8
      (inlined by) _FUN at src/HsHWebsocket/Connect.cpp:13

ELF file SHA256: 5f115744b2ec6e08

Rebooting...
me-no-dev commented 2 months ago

it breaks here: https://github.com/espressif/esp-idf/blob/release/v4.4/components/heap/heap_tlsf.c#L965 which means as we suspected that it was already freed. But how does it happen if now code is even protected by mutex. Any chance some other public function is left unguarded and touches the client?

hitecSmartHome commented 2 months ago

Nop.

public:
    using Callback = std::function<void()>;
    using FrameCallback = std::function<void(char*, size_t)>;
    using ErrorCallback = std::function<void(WsError)>;
    using BinaryFrameCallback = std::function<void(const uint8_t*, size_t)>;

    HsHWebsocket();
    ~HsHWebsocket();

    void listen(const char* url, const char* path = "/");
    void listen(IPAddress ip, uint16_t port, const char* path = "/");

    /* Events */
    void onConnect(Callback callback);
    void onDisconnect(Callback callback);
    void onPing(Callback callback);
    void onPong(Callback callback);
    void onError(ErrorCallback callback);
    void onFrame(FrameCallback callback);
    void onBinaryFrame(BinaryFrameCallback callback);

    /* Utils */
    void addHeader(const char* key, const char* value);
    bool isConnected();

    /* Send */
    void sendText(const char* message, int messageLength);
    void sendText(const char* message);

   private:

Also I'm checking if it is already initialised with a flag so it can't be initialised twice.

hitecSmartHome commented 2 months ago

Oh wait a minute. isConnected() is public. Will check.

hitecSmartHome commented 2 months ago

Yeah, isConnected() is used outside. The tasks that are using the websocket implementation are also checking if we are connected or not in various situations.

bool HsHWebsocket::isConnected() {
    return client.connected() && connected;
}
hitecSmartHome commented 2 months ago

But if i protect client.connected() with the same mutex, it will also trigger a lock.

hitecSmartHome commented 2 months ago

client should be protected internally....

me-no-dev commented 2 months ago

maybe, but for now you need to figure out the locks and make sure that you do not call it concurently

me-no-dev commented 2 months ago

maybe protect isConnected and expand it when inside protected area, or create private _isConnected that is unprotected and you can use inside protected areas