espressif / esp-aws-iot

AWS IoT SDK for ESP32 based chipsets
Apache License 2.0
256 stars, 154 forks

esp-tls-mbedtls: read error :-0x004C: (CA-265) #160

Open law-ko opened 1 year ago

law-ko commented 1 year ago

Hello,

It seems like this error (i.e. esp-tls-mbedtls: read error :-0x004C:) is quite common when trying to keep the TLS connection with AWS IoT alive, and it also seems to be related to the MQTT_PROCESS_LOOP_TIMEOUT_MS timeout.

In the example project the timeout was set to 1500U, which seems to give a lower failure rate for the esp-tls-mbedtls: read error :-0x004C: error. However, that value slows down the initial connection and does not seem to resolve the issue directly, so the value is now set to 100U.

Is there any way we can track and detect esp-tls-mbedtls: read error :-0x004C: happening and attempt to reestablish the TLS connection to AWS IoT?

The free heap is around 65000, which should still be sufficient to maintain a TLS connection.

Thank you.

SolidStateLEDLighting commented 1 year ago

The sample IoT applications are weak in how they demonstrate the concepts. They are poor patterns to follow.

You will see them calling MQTT_ProcessLoop(&_mqttContext, timeoutVar) only during specific actions. That works for a demo -- but not in practice. In real life, you must circulate through the ProcessLoop continuously so the MQTT processing stack will generate and handle all the keep alive PING/PINGRESP messages. I think that is going to solve your problem. If you are not circulating through MQTT_ProcessLoop() during idle -- then you absolutely will have a time-out disconnection problem.
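To put rough numbers on the keep-alive point, here is a minimal sketch in plain C. It assumes the MQTT 3.1.1 rule that a broker may drop a client that stays silent for 1.5 times the keep-alive interval, and it assumes the application relies on MQTT_ProcessLoop() to send the PINGREQ. The function names are hypothetical and are not part of coreMQTT.

```c
#include <stdbool.h>
#include <stdint.h>

/* MQTT 3.1.1: the broker may close the connection after 1.5x the
 * keep-alive interval without any packet from the client. */
uint32_t broker_silence_limit_ms(uint16_t keep_alive_s)
{
    return (uint32_t)keep_alive_s * 1500U;
}

/* Hypothetical helper: returns true when a task that can stay away from
 * MQTT_ProcessLoop() for up to worst_gap_ms risks missing that limit.
 * Worst case: the PINGREQ becomes due right after the loop returns, so it
 * only goes out keep_alive + worst_gap after the last packet. */
bool keep_alive_at_risk(uint16_t keep_alive_s, uint32_t worst_gap_ms)
{
    if (keep_alive_s == 0U) {
        return false; /* keep-alive disabled */
    }
    uint64_t ping_sent_ms = (uint64_t)keep_alive_s * 1000U + worst_gap_ms;
    return ping_sent_ms >= broker_silence_limit_ms(keep_alive_s);
}
```

With an example keep-alive of 60 s, a loop that comes back every 1.5 s is comfortably inside the limit, while a task that stays away from the loop for 30 s or more is not.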

law-ko commented 1 year ago

Hi @SolidStateLEDLighting ,

Thank you for your response. Yes, the MQTT_ProcessLoop is being circulated in a while loop at the end of the aws shadow entry callback.

while (true)
{
    /* Loop to receive packet from transport interface. */
    MQTTStatus_t mqttStatus = MQTT_ProcessLoop( &mqttContext, MQTT_PROCESS_LOOP_TIMEOUT_MS );

    /* --- ASYNC MQTT_Publish --- */
    if( mqttStatus != MQTTSuccess )
    {
        LogWarn( ( "MQTT_ProcessLoop returned with status = %u.",
                    mqttStatus ) );
    } else if (mqttStatus == MQTTSuccess)
    {
        LogInfo( ( "MQTT_ProcessLoop success" ) );

        /* Asynchronous updates - not instant (mainly for sensors) */
        update_t updateType;
        if (updateForAWS)
        {
            if (xQueueReceive(xAWSQueue, &updateType, portMAX_DELAY))
            {
                /* Perform updates */
            }
        }
    }
}

However, the 0x004C error still occurs, and it shows up more frequently when MQTT_PROCESS_LOOP_TIMEOUT_MS is lowered.

dhavalgujar commented 1 year ago

Hi @law-ko, We will look into your specific issue.

Meanwhile, as @SolidStateLEDLighting rightly mentioned, the demo examples are not intended to provide a pattern that should be followed for production scenarios where you will need to call MQTT_ProcessLoop repeatedly.

We offer a production-ready example that showcases the usage of the coreMQTT-Agent library to solve the issue that you are facing, by moving MQTTAgent_CommandLoop to its own task. Please refer to my comment on issue #89 and also check #134 for more details on this.

The reference example also includes a task, prvCoreMqttAgentConnectionTask, which handles connecting/reconnecting the TLS and MQTT connection.
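As a rough illustration of the reconnect idea (this is not the code of prvCoreMqttAgentConnectionTask), a connection task typically waits a little longer after each failed attempt. The sketch below computes a capped exponential delay in plain C. The type and function names are hypothetical, and the actual TLS and MQTT connect calls are left out.

```c
#include <stdint.h>

/* Hypothetical retry state for a connection task. */
typedef struct {
    uint32_t base_ms;  /* delay before the first retry */
    uint32_t max_ms;   /* upper bound for any delay */
    uint32_t attempt;  /* failed attempts so far */
} reconnect_backoff_t;

/* Returns the delay to wait before the next connection attempt and
 * advances the attempt counter. The delay doubles per attempt and is
 * clamped to max_ms without overflowing. */
uint32_t reconnect_backoff_next_ms(reconnect_backoff_t *b)
{
    uint32_t delay = b->base_ms;
    for (uint32_t i = 0; i < b->attempt && delay < b->max_ms; i++) {
        delay = (delay > b->max_ms / 2U) ? b->max_ms : delay * 2U;
    }
    if (delay > b->max_ms) {
        delay = b->max_ms;
    }
    b->attempt++;
    return delay;
}

/* Call once the TLS and MQTT connection is established again. */
void reconnect_backoff_reset(reconnect_backoff_t *b)
{
    b->attempt = 0U;
}
```

A task would call reconnect_backoff_next_ms() before each retry and reconnect_backoff_reset() once the connection is up again. Production code usually adds random jitter to the delay as well.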

SolidStateLEDLighting commented 1 year ago

I assume you are handling incoming messages in an event_callback() handler?

Incidentally, my stack size is set to 28k. That is enough to handle all IOT services. I'm not sure what the minimum would be.

I set the MQTT_ProcessLoop timeout to zero and it still takes at least 200 or 300 ms to get through there. My test device communicates at a great distance from my AWS region.

SolidStateLEDLighting commented 1 year ago

One more comment that might be significant here....

Looking at your while loop, I see that you go through MQTT_ProcessLoop() indiscriminately, without specifically reacting to an incoming message before you go through MQTT_ProcessLoop() again. I think this might lead to strange problems. Typically, your system will send a message and then wait for a response, over and over again. While you are formulating a message (or handling an incoming message), it would not be a good idea to go looking for another incoming message in MQTT_ProcessLoop() until your processing is complete. If you have no processing going on, then calling MQTT_ProcessLoop() is fine because your software state machine is idle.

The same might apply if you are handling any kind of PING response and MQTT_ProcessLoop() is entered again while you're handling that response in the background. I would suggest you find a way to keep MQTT_ProcessLoop() from running while you are in a busy state.

law-ko commented 1 year ago

Hi @dhavalgujar ,

The library example you provided seems to be a rewrite based on FreeRTOS. However, we have been following the AWS-provided example. It seems to me the AWS examples are still for demo purposes. Is it possible to change only the MQTTAgent_CommandLoop part instead of the whole project code and structure?

The issue we have with updating the entire project is that we need fleet provisioning by claim, whereas iot-reference-esp32c3 burns an already provisioned certificate into the esp_secure_cert partition.

SolidStateLEDLighting commented 1 year ago

The way I understand it, the Agent mechanism was added to the demo projects as a way of overcoming the weakness of making MQTT calls from several different tasks. If you consolidate all your MQTT actions in one task, the need for the agent goes away and the project gets simpler. Learn all you can from the demos and follow how and why they subscribe, publish, handle incoming responses, and unsubscribe. Then use that knowledge to make one translation unit (a single task) that logs in as a Client, Fleet Provisions, logs out, logs back in as a Thing, connects to Shadow, and does OTA as needed based on software state. Along the way you'll be doing other things with your incoming responses, like storing the certs/keys in NVS.

Another tip: I had to componentize my project before I could easily consume the newest esp-aws-iot library. That will take some effort in understanding just enough about CMake to structure your project better.
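One way to picture the single task described above is as a small state machine. The sketch below is only an illustration in plain C: the state names and the transition function are hypothetical, and the real work for each state (coreMQTT calls, certificate storage, OTA) is left out.

```c
#include <stdbool.h>

typedef enum {
    ST_CONNECT_CLAIM,     /* log in with the claim certificate */
    ST_FLEET_PROVISION,   /* obtain and store the device cert/key */
    ST_DISCONNECT_CLAIM,  /* log out of the claim session */
    ST_CONNECT_THING,     /* log back in as the provisioned Thing */
    ST_SHADOW,            /* steady state: Shadow traffic */
    ST_OTA                /* OTA update in progress */
} iot_state_t;

/* Hypothetical transition function. step_ok reports whether the work for
 * the current state succeeded; ota_pending whether an update is waiting. */
iot_state_t iot_next_state(iot_state_t s, bool step_ok, bool ota_pending)
{
    if (!step_ok) {
        /* Until the device credentials are stored, start over with the
         * claim login; afterwards, reconnect as the Thing. */
        return (s <= ST_FLEET_PROVISION) ? ST_CONNECT_CLAIM : ST_CONNECT_THING;
    }
    switch (s) {
    case ST_CONNECT_CLAIM:    return ST_FLEET_PROVISION;
    case ST_FLEET_PROVISION:  return ST_DISCONNECT_CLAIM;
    case ST_DISCONNECT_CLAIM: return ST_CONNECT_THING;
    case ST_CONNECT_THING:    return ST_SHADOW;
    case ST_SHADOW:           return ota_pending ? ST_OTA : ST_SHADOW;
    case ST_OTA:              return ST_SHADOW;
    }
    return ST_CONNECT_CLAIM; /* not reached for valid states */
}
```

Because every MQTT call happens inside the one task that steps through these states, nothing else needs to share the MQTT context, which is the reason the agent becomes unnecessary in this design.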

GuGu927 commented 1 year ago

@SolidStateLEDLighting Hi. Can I ask you about MQTT Agent? Is it right that using MQTT with one task is better than using MQTT Agent in ESP32?

SolidStateLEDLighting commented 1 year ago

I rolled all the sample projects into one. I have one task that runs it all.

So, I have no idea why someone would want to make the project more complex with an agent task when this is not necessary (except for tying together disassociated sample projects?).

I'm not even sure what exactly they are talking about without looking at a software design document that completely explains it. Does anyone see any design documents that explain their demos?

To me, better means simpler (as long as the functionality and performance are adequate for the job).



GuGu927 commented 1 year ago

@SolidStateLEDLighting Thx a lot :)