espressif / esp-aws-iot

AWS IoT SDK for ESP32 based chipsets
Apache License 2.0

How to shorten the transmission time of mqtt? (CA-228) #127

Open new5631 opened 2 years ago

new5631 commented 2 years ago

Hello everyone. The problem we are currently facing is that about 5 seconds pass between the application sending an instruction to the device and receiving the reply. In the tls_mutual_auth demo, the timeout passed to MQTT_ProcessLoop is 1500 ms, yet the call itself takes about 3 s to return.

#define MQTT_PROCESS_LOOP_TIMEOUT_MS ( 1500U )

        starttime = Clock_GetTimeMs();
        mqttStatus = MQTT_ProcessLoop( pMqttContext, MQTT_PROCESS_LOOP_TIMEOUT_MS );
        endtime = Clock_GetTimeMs();

        printf("===========================\r\n");
        /* Cast so the format specifier matches regardless of the timer's integer type. */
        printf("time ms %lu\r\n", ( unsigned long ) ( endtime - starttime ));

Output: time ms 3499

This seems to be related to the timeout set in esp-aws-iot\libraries\coreMQTT\port\network_transport\network_transport.c:

TlsTransportStatus_t xTlsConnect( NetworkContext_t* pxNetworkContext )
{
    TlsTransportStatus_t xRet = TLS_TRANSPORT_SUCCESS;

    esp_tls_cfg_t xEspTlsConfig = {
        .cacert_buf = (const unsigned char*) ( pxNetworkContext->pcServerRootCAPem ),
        .cacert_bytes = strlen( pxNetworkContext->pcServerRootCAPem ) + 1,
        .clientcert_buf = (const unsigned char*) ( pxNetworkContext->pcClientCertPem ),
        .clientcert_bytes = strlen( pxNetworkContext->pcClientCertPem ) + 1,
        .skip_common_name = pxNetworkContext->disableSni,
        .alpn_protos = pxNetworkContext->pAlpnProtos,
#if CONFIG_CORE_MQTT_USE_SECURE_ELEMENT
        .use_secure_element = true,
#elif CONFIG_CORE_MQTT_USE_DS_PERIPHERAL
        .ds_data = pxNetworkContext->ds_data,
#else
        .use_secure_element = false,
        .ds_data = NULL,
        .clientkey_buf = ( const unsigned char* )( pxNetworkContext->pcClientKeyPem ),
        .clientkey_bytes = strlen( pxNetworkContext->pcClientKeyPem ) + 1,
#endif
        .timeout_ms = 3000,   /* <-- this timeout */
    };

    /* ...... */
}

Could someone help me analyze how to shorten this communication time? Thank you!
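One way to see how a 1500 ms loop timeout can turn into multi-second calls: if the loop checks its deadline only between reads, and each read can block for the transport's receive timeout (the `.timeout_ms = 3000` above), then a single idle read overruns the loop timeout. Below is a simplified timing model in plain C under exactly that assumption. It is not coreMQTT's implementation, the `sim_*` names are hypothetical, and the match with the measured 3499 ms is suggestive rather than a diagnosis.

```c
#include <stdint.h>

/* Simulated millisecond clock for this model; not a real time source. */
static uint32_t g_sim_now_ms = 0U;

/* Models a blocking TLS read when no data is pending: the call returns
 * only after the socket receive timeout has elapsed, with 0 bytes. */
static int32_t sim_blocking_recv(uint32_t recv_timeout_ms)
{
    g_sim_now_ms += recv_timeout_ms;
    return 0;
}

/* Simplified stand-in for a timeout-bounded process loop: it keeps reading
 * until loop_timeout_ms has elapsed, but checks the deadline only between
 * reads, so one blocked read can overrun it. Returns the elapsed time. */
static uint32_t sim_process_loop(uint32_t loop_timeout_ms, uint32_t recv_timeout_ms)
{
    const uint32_t start = g_sim_now_ms;

    if (recv_timeout_ms == 0U)
    {
        recv_timeout_ms = 1U; /* Keep the simulated clock moving. */
    }

    do
    {
        (void)sim_blocking_recv(recv_timeout_ms);
    } while ((g_sim_now_ms - start) < loop_timeout_ms);

    return g_sim_now_ms - start;
}
```

In this model, `sim_process_loop(1500U, 3000U)` returns 3000 even though the loop's own timeout is 1500 ms, while `sim_process_loop(1500U, 10U)` returns 1500: the shorter the blocking read, the closer the loop stays to its nominal timeout.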

Alson-tang commented 2 years ago

Hello. I have looked into this question and found that the TLS transport reads data in blocking mode, so I recommend setting the TLS connection in xTlsConnect to non-blocking mode (.non_block = true):

TlsTransportStatus_t xTlsConnect( NetworkContext_t* pxNetworkContext )
{
    TlsTransportStatus_t xRet = TLS_TRANSPORT_SUCCESS;

    esp_tls_cfg_t xEspTlsConfig = {
        .cacert_buf = (const unsigned char*) ( pxNetworkContext->pcServerRootCAPem ),
        .cacert_bytes = strlen( pxNetworkContext->pcServerRootCAPem ) + 1,
        .clientcert_buf = (const unsigned char*) ( pxNetworkContext->pcClientCertPem ),
        .clientcert_bytes = strlen( pxNetworkContext->pcClientCertPem ) + 1,
        .skip_common_name = pxNetworkContext->disableSni,
        .alpn_protos = pxNetworkContext->pAlpnProtos,
#if CONFIG_CORE_MQTT_USE_SECURE_ELEMENT
        .use_secure_element = true,
#elif CONFIG_CORE_MQTT_USE_DS_PERIPHERAL
        .ds_data = pxNetworkContext->ds_data,
#else
        .use_secure_element = false,
        .ds_data = NULL,
        .clientkey_buf = ( const unsigned char* )( pxNetworkContext->pcClientKeyPem ),
        .clientkey_bytes = strlen( pxNetworkContext->pcClientKeyPem ) + 1,
#endif
        .timeout_ms = 3000,
        .non_block = true,
    };

    esp_tls_t* pxTls = esp_tls_init();

    if (pxTls == NULL)
    {
        /* esp_tls_init() returns NULL when it cannot allocate the TLS context. */
        return TLS_TRANSPORT_CONNECT_FAILURE;
    }

    xSemaphoreTake(pxNetworkContext->xTlsContextSemaphore, portMAX_DELAY);
    pxNetworkContext->pxTls = pxTls;

    if (esp_tls_conn_new_sync( pxNetworkContext->pcHostname, 
            strlen( pxNetworkContext->pcHostname ), 
            pxNetworkContext->xPort, 
            &xEspTlsConfig, pxTls) <= 0)
    {
        if (pxNetworkContext->pxTls)
        {
            esp_tls_conn_destroy(pxNetworkContext->pxTls);
            pxNetworkContext->pxTls = NULL;
        }
        xRet = TLS_TRANSPORT_CONNECT_FAILURE;
    } else {
        printf("tls establish success\r\n");
    }

    xSemaphoreGive(pxNetworkContext->xTlsContextSemaphore);

    return xRet;
}

However, I found a new problem: with non_block mode enabled, MQTT connections fail to establish more often, and the system outputs the following log:

esp-tls: Failed to open new connection in specified timeout
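For context on that warning: in non_block mode, esp-tls appears to treat `.timeout_ms` as a budget for the whole connection setup, so with `.timeout_ms = 3000` a setup slower than 3 s would be abandoned. The POSIX sketch below shows the general shape of a deadline-bounded non-blocking connect. It covers only the TCP step, it is not esp-tls's code, and `connectWithTimeout` / `waitForConnect` are hypothetical names.

```c
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Starts a non-blocking connect on fd and waits at most timeoutMs for it
 * to finish. Returns 0 on success, -1 on error or timeout. */
static int waitForConnect(int fd, const struct sockaddr *pAddr, socklen_t addrLen, int timeoutMs)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if ((flags < 0) || (fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0))
    {
        return -1;
    }

    if (connect(fd, pAddr, addrLen) == 0)
    {
        return 0; /* Completed immediately (possible on loopback). */
    }
    if (errno != EINPROGRESS)
    {
        return -1; /* Immediate failure, e.g. connection refused. */
    }

    /* The socket becomes writable once the connect has finished, successfully or not. */
    struct pollfd pfd = { .fd = fd, .events = POLLOUT, .revents = 0 };
    if (poll(&pfd, 1, timeoutMs) <= 0)
    {
        return -1; /* 0: the time budget ran out; < 0: poll() failed. */
    }

    /* Writable does not mean connected: fetch the deferred connect result. */
    int soError = 0;
    socklen_t len = sizeof soError;
    if ((getsockopt(fd, SOL_SOCKET, SO_ERROR, &soError, &len) < 0) || (soError != 0))
    {
        return -1;
    }
    return 0;
}

/* Returns a connected (still non-blocking) TCP socket, or -1. */
static int connectWithTimeout(const struct sockaddr *pAddr, socklen_t addrLen, int timeoutMs)
{
    int fd = socket(pAddr->sa_family, SOCK_STREAM, 0);
    if (fd < 0)
    {
        return -1;
    }
    if (waitForConnect(fd, pAddr, addrLen, timeoutMs) != 0)
    {
        close(fd);
        return -1;
    }
    return fd;
}
```

In a design like this the budget is a hard cutoff: if the network (or, in the TLS case, the handshake) genuinely needs longer, every attempt fails. That would be consistent with the higher failure rate above, but it is an inference from the warning text, not a verified root cause.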

avsheth commented 2 years ago

Hi @new5631, could you please share some more information?

  1. Do you see this behaviour all the time?
  2. Are you using tls_mutual_auth as is? That example is only for reference purposes. It uses a single thread for both publishing and receiving data, and that thread sleeps for 1 second in its loop. After a certain number of loop iterations, the device also disconnects and reconnects to the server. During both of these periods, reception can be delayed. Could you try disabling the publish call and the loop delay in the example and check whether the message reaches the device immediately?

EtienneMdv commented 2 years ago

Hi,

I am facing the same issue as described by @new5631. I recently migrated from Amazon FreeRTOS to esp-aws-iot. The transmission time in AFR (which uses Secure Sockets) was perfectly fine. However, the transport implementation in this repo uses a blocking socket, which adds significant delays of about 3 s when publishing MQTT messages. The coreMQTT docs say that the socket should be non-blocking when requesting 1 byte of data (in order to check whether data is available). I tried to implement non-blocking sockets but, as @Alson-tang mentioned in a previous post, many failures occur.
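The coreMQTT requirement mentioned above can be sketched with a plain POSIX socket: when nothing is pending, the receive function returns 0 immediately instead of waiting. This is a hypothetical plain-TCP example; `SketchNetworkContext_t` and `sketchTransportRecv` are invented for illustration and are not esp-aws-iot types. A real esp-tls transport would need the equivalent on top of `esp_tls_conn_read`, where the "no data yet" case is reported differently and decrypted bytes may already be buffered inside the TLS layer.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical context for this sketch only; NOT esp-aws-iot's NetworkContext_t. */
typedef struct
{
    int socketFd;
} SketchNetworkContext_t;

/* Receive function shaped like coreMQTT's transport-recv contract:
 *   > 0 : number of bytes read
 *     0 : no data available right now (returns immediately)
 *   < 0 : error, or connection closed by the peer */
static int32_t sketchTransportRecv(SketchNetworkContext_t *pCtx, void *pBuffer, size_t bytesToRecv)
{
    /* MSG_DONTWAIT makes this single call non-blocking without changing the socket's mode. */
    ssize_t n = recv(pCtx->socketFd, pBuffer, bytesToRecv, MSG_DONTWAIT);

    if (n > 0)
    {
        return (int32_t)n;
    }
    if ((n < 0) && ((errno == EAGAIN) || (errno == EWOULDBLOCK)))
    {
        return 0; /* Nothing pending: report "no data" instead of blocking. */
    }
    return -1; /* n == 0 means orderly shutdown; other errno values are errors. */
}
```

With a receive function like this, a 1-byte "is anything there?" read costs one system call instead of a full socket timeout, which is the property the blocking transport lacks.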

Is there any solution out there based on ESP-TLS that could solve this issue? Is there any way to check for available data without having to block the socket?
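At the socket level, one way to check for available data without changing the socket's blocking mode is `poll()` with a zero timeout. The POSIX sketch below is one possible approach, not something esp-tls provides in this form, and `socketHasPendingData` is a hypothetical name. With TLS, a socket-level check is not sufficient on its own: mbedTLS may already hold decrypted bytes that are no longer visible on the socket. I believe esp-tls exposes `esp_tls_get_bytes_avail()` for that buffered part, but verify it against your ESP-IDF version.

```c
#include <poll.h>
#include <stdbool.h>

/* Returns true if a read on fd would not block right now (data pending,
 * or the peer has closed). A timeout of 0 makes poll() return
 * immediately, so this never waits and leaves the socket's mode alone. */
static bool socketHasPendingData(int fd)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN, .revents = 0 };
    int rc = poll(&pfd, 1, 0);

    return (rc > 0) && ((pfd.revents & POLLIN) != 0);
}
```

A caller could use this to skip the 1-byte read entirely when nothing is there, and fall through to a normal (blocking) read once data is known to be pending.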

Thank you!

MarkoRimacByteLab commented 2 years ago

Hello!

The same issue is happening to me as well with the xEspTlsConfig configuration provided in your library. All MQTT messages I send are delayed by 3 s plus a minor delay of 20-30 ms.

My test scenarios and the solution I found working are provided below:

SolidStateLEDLighting commented 1 year ago

The answer is to rewrite the entire module so that it calls MQTT_ProcessLoop( pMqttContext, 0 ) endlessly. That way you wait the least for both sending AND receiving (with the provided examples, it is 1500 x 2 = 3 seconds for each round trip).

A wait time of 0 means: just check whether anything has arrived, then get back to other work.
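The loop structure described here can be sketched as a single task that polls MQTT with a 0 ms wait and then does its other work. All names are hypothetical: in an application, `processOnce` might wrap `MQTT_ProcessLoop( pMqttContext, 0 )` and `yieldTask` might be a short `vTaskDelay`, but those are assumptions. Also note that a 0 ms wait only helps if the transport receive itself returns promptly when nothing is pending; with the blocking transport discussed earlier in this thread, even a 0 ms call can still wait for the socket timeout.

```c
#include <stdint.h>

/* Hypothetical hooks for this sketch; not library APIs. */
typedef struct
{
    int (*processOnce)(void *pCtx, uint32_t timeoutMs); /* returns 0 on success */
    void (*doAppWork)(void *pCtx);                       /* publish, shadow, jobs, ... */
    void (*yieldTask)(void);                             /* e.g. a short task delay */
    void *pCtx;
} SketchLoopHooks_t;

/* Single-task event loop: check MQTT with a 0 ms wait, then do other work.
 * maxIterations == 0 means run forever. Returns the number of completed
 * iterations, or -1 as soon as processOnce reports an error. */
static long runEventLoop(const SketchLoopHooks_t *pHooks, long maxIterations)
{
    long iterations = 0;

    while ((maxIterations == 0) || (iterations < maxIterations))
    {
        if (pHooks->processOnce(pHooks->pCtx, 0U) != 0)
        {
            return -1; /* Let the caller decide whether to reconnect. */
        }
        pHooks->doAppWork(pHooks->pCtx);
        pHooks->yieldTask();
        iterations++;
    }
    return iterations;
}
```

Keeping the MQTT check and the application work in one loop means incoming messages are noticed on the next pass rather than after a full process-loop timeout, at the cost of the task waking up more often.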

Rewrite the whole thing, include all the services inside it (client login, Fleet Provisioning, Shadow, Jobs, OTA), comment liberally, and you'll end up with about 6500 lines of code that are all yours.