aws / aws-iot-device-sdk-cpp

SDK for connecting to AWS IoT from a device using C++
http://aws-iot-device-sdk-cpp-docs.s3-website-us-east-1.amazonaws.com
Apache License 2.0

The software crashes after running for 2 days #210

Closed xunshichentuo closed 1 year ago

xunshichentuo commented 1 year ago

Describe the bug

We connect to AWS IoT using the device serial number as the client ID. When two devices share the same serial number, the IoT connection is repeatedly disconnected and reconnected, and after about 2 days of this the application sometimes crashes.

Expected Behavior

The application should not crash.

Current Behavior

The crash does not happen every time, but it occurs with some probability after a long period of operation.

Reproduction Steps

Take the subAndPub example from the samples and port it to Qt. Run it on devices that share the same serial number; the connection will continuously disconnect and reconnect.

Possible Solution

No response

Additional Information/Context

This is gdb's crash stack record:

#0  0x6c5d70cc in sock_eq (a1=0x6cdf2704, a2=0x0) at res_send.c:1432
#1  0x6c5d7be8 in __res_context_send (ctx=ctx@entry=0x6f8a4c60, buf=0x6cdefe38 "R\263\001",
    buf@entry=0x76fea968 <__stack_chk_guard> "", buflen=64, buf2=0x0, buf2@entry=0x6cdf08d8 "\216\v\201\200",
    buflen2=buflen2@entry=0, ans=<optimized out>, ans@entry=0x6cdf08d8 "\216\v\201\200",
    anssiz=<optimized out>, anssiz@entry=1024, ansp=ansp@entry=0x6cdf0d0c, ansp2=0x0, ansp2@entry=0x6cdf0d0c,
    nansp2=0x0, nansp2@entry=0x6c5d660f <__res_context_querydomain+154>, resplen2=0x0, resplen2@entry=0x400,
    ansp2_malloced=ansp2_malloced@entry=0x0) at res_send.c:451
#2  0x6c5d6204 in __GI___res_context_query (ctx=ctx@entry=0x6f8a4c60, name=0x0, class=0, class@entry=1,
    type=type@entry=1, answer=<optimized out>, answer@entry=0x6cdf08d8 "\216\v\201\200",
    anslen=<optimized out>, anslen@entry=1024, answerp=<optimized out>, answerp@entry=0x6cdf0d0c,
    answerp2=<optimized out>, answerp2@entry=0x0, nanswerp2=<optimized out>, nanswerp2@entry=0x0,
    resplen2=<optimized out>, resplen2@entry=0x0, answerp2_malloced=<optimized out>,
    answerp2_malloced@entry=0x0) at res_query.c:216
#3  0x6c5d660e in __res_context_querydomain (ctx=ctx@entry=0x6f8a4c60,
    name=name@entry=0x164d2c0 "a2oupm7fvkvi65-ats.iot.us-west-2.amazonaws.com", domain=domain@entry=0x0,
    class=class@entry=1, type=type@entry=1, answer=answer@entry=0x6cdf08d8 "\216\v\201\200",
    anslen=anslen@entry=1024, answerp=answerp@entry=0x6cdf0d0c, answerp2=answerp2@entry=0x0,
    nanswerp2=nanswerp2@entry=0x0, resplen2=resplen2@entry=0x0, answerp2_malloced=answerp2_malloced@entry=0x0)
    at res_query.c:601
#4  0x6c5d6a36 in __GI___res_context_search (ctx=0x6f8a4c60,
    ctx@entry=0x672dd1 <tls_post_encryption_processing_default+248>,
    name=name@entry=0x164d2c0 "a2oupm7fvkvi65-ats.iot.us-west-2.amazonaws.com", class=class@entry=1, type=1,
    type@entry=1996401000, answer=0x6cdf08d8 "\216\v\201\200", anslen=1024, answerp=0x6cdf0d0c, answerp2=0x0,
    nanswerp2=0x0, resplen2=0x0, answerp2_malloced=0x0) at res_query.c:370
#5  0x6ee4249e in gethostbyname3_context (ctx=0x672dd1 <tls_post_encryption_processing_default+248>,
    ctx@entry=0x6f8a4c60, name=0x164d2c0 "a2oupm7fvkvi65-ats.iot.us-west-2.amazonaws.com", af=af@entry=2,
    result=result@entry=0x6cdf12f0, buffer=buffer@entry=0x6cdf1548 "", buflen=buflen@entry=1024,
    errnop=errnop@entry=0x6cdf2934, h_errnop=h_errnop@entry=0x6cdf2970, ttlp=ttlp@entry=0x0,
    canonp=canonp@entry=0x0) at nss_dns/dns-host.c:218
#6  0x6ee42a22 in _nss_dns_gethostbyname3_r (name=<optimized out>, af=2, result=0x6cdf12f0,
    buffer=0x6cdf1548 "", buflen=buflen@entry=1024, errnop=errnop@entry=0x6cdf2934,
    h_errnop=h_errnop@entry=0x6cdf2970, ttlp=ttlp@entry=0x0, canonp=canonp@entry=0x0) at nss_dns/dns-host.c:164
#7  0x6ee42a6e in _nss_dns_gethostbyname2_r (name=<optimized out>, af=<optimized out>, result=<optimized out>,
    buffer=<optimized out>, buflen=1024, errnop=0x6cdf2934, h_errnop=0x6cdf2970) at nss_dns/dns-host.c:282
#8  0x75eb8ec0 in __gethostbyname2_r (name=0x164d2c0 "a2oupm7fvkvi65-ats.iot.us-west-2.amazonaws.com", af=2,
    resbuf=0x6cdf12f0, buffer=<optimized out>, buflen=1024, result=0x6cdf12ec, h_errnop=0x6cdf2970)
    at ../nss/getXXbyYY_r.c:382
#9  0x75e9b99e in gaih_inet (name=0x0, service=<optimized out>, req=<optimized out>, pai=0x0,
    naddrs=0x6cdf14a4, tmpbuf=0x6cdf1540) at ../sysdeps/posix/getaddrinfo.c:920
#10 0x75e9c1ea in __GI_getaddrinfo (name=<optimized out>, service=0x0, hints=<optimized out>, pai=0x13afc64)
    at ../sysdeps/posix/getaddrinfo.c:2364
#11 0x0056f5e4 in awsiotsdk::network::OpenSSLConnection::ConnectTCPSocket() ()
#12 0x0056fc46 in awsiotsdk::network::OpenSSLConnection::PerformSSLConnect() ()
#13 0x0056fe10 in awsiotsdk::network::OpenSSLConnection::ConnectInternal() ()
#14 0x005e47e0 in awsiotsdk::NetworkConnection::Connect() ()
#15 0x0061b1d8 in awsiotsdk::mqtt::ConnectActionAsync::PerformAction(std::shared_ptr<awsiotsdk::NetworkConnection>, std::shared_ptr<awsiotsdk::ActionData>) ()
#16 0x00644650 in awsiotsdk::ClientCoreState::PerformAction(awsiotsdk::ActionType, std::shared_ptr<awsiotsdk::ActionData>, std::chrono::duration<long long, std::ratio<1ll, 1000ll> >) ()
#17 0x0061be3a in awsiotsdk::mqtt::KeepaliveActionRunner::PerformAction(std::shared_ptr<awsiotsdk::NetworkConnection>, std::shared_ptr<awsiotsdk::ActionData>) ()
#18 0x00643818 in awsiotsdk::ResponseCode std::__invoke_impl<awsiotsdk::ResponseCode, awsiotsdk::ResponseCode (awsiotsdk::Action::*&)(std::shared_ptr<awsiotsdk::NetworkConnection>, std::shared_ptr<awsiotsdk::ActionData>), std::unique_ptr<awsiotsdk::Action, std::default_delete<awsiotsdk::Action> >&, std::shared_ptr<awsiotsdk::NetworkConnection>&, std::shared_ptr<awsiotsdk::ActionData>&>(std::__invoke_memfun_deref, awsiotsdk::ResponseCode (awsiotsdk::Action::*&)(std::shared_ptr<awsiotsdk::NetworkConnection>, std::shared_ptr<awsiotsdk::ActionData>), std::unique_ptr<awsiotsdk::Action, std::default_delete<awsiotsdk::Action> >&, std::shared_ptr<awsiotsdk::NetworkConnection>&, std::shared_ptr<awsiotsdk::ActionData>&) ()
#19 0x00643472 in std::__invoke_result<awsiotsdk::ResponseCode (awsiotsdk::Action::*&)(std::shared_ptr<awsiotsdk::NetworkConnection>, std::shared_ptr<awsiotsdk::ActionData>), std::unique_ptr<awsiotsdk::Action, std::default_delete<awsiotsdk::Action> >&, std::shared_ptr<awsiotsdk::NetworkConnection>&, std::shared_ptr<awsiotsdk::ActionData>&>::type std::__invoke<awsiotsdk::ResponseCode (awsiotsdk::Action::*&)(std::shared_ptr<awsiotsdk::NetworkConnection>, std::shared_ptr<awsiotsdk::ActionData>), std::unique_ptr<awsiotsdk::Action, std::default_delete<awsiotsdk::Action> >&, std::shared_ptr<awsiotsdk::NetworkConnection>&, std::shared_ptr<awsiotsdk::ActionData>&>(awsiotsdk::ResponseCode (awsiotsdk::Action::*&)(std::shared_ptr<awsiotsdk::NetworkConnection>, std::shared_ptr<awsiotsdk::ActionData>), std::unique_ptr<awsiotsdk::Action, std::default_delete<awsiotsdk::Action> >&, std::shared_ptr<awsiotsdk::NetworkConnection>&, std::shared_ptr<awsiotsdk::ActionData>&) ()
#20 0x00642fb4 in awsiotsdk::ResponseCode std::_Bind<awsiotsdk::ResponseCode (awsiotsdk::Action::*(std::unique_ptr<awsiotsdk::Action, std::default_delete<awsiotsdk::Action> >, std::shared_ptr<awsiotsdk::NetworkConnection>, std::shared_ptr<awsiotsdk::ActionData>))(std::shared_ptr<awsiotsdk::NetworkConnection>, std::shared_ptr<awsiotsdk::ActionData>)>::__call<awsiotsdk::ResponseCode, , 0u, 1u, 2u>(std::tuple<>&&, std::_Index_tuple<0u, 1u, 2u>)
    ()
#21 0x00642b0e in awsiotsdk::ResponseCode std::_Bind<awsiotsdk::ResponseCode (awsiotsdk::Action::*(std::unique_ptr<awsiotsdk::Action, std::default_delete<awsiotsdk::Action> >, std::shared_ptr<awsiotsdk::NetworkConnection>, std::shared_ptr<awsiotsdk::ActionData>))(std::shared_ptr<awsiotsdk::NetworkConnection>, std::shared_ptr<awsiotsdk::ActionData>)>::operator()<, awsiotsdk::ResponseCode>() ()
#22 0x006422e4 in awsiotsdk::ResponseCode std::__invoke_impl<awsiotsdk::ResponseCode, std::_Bind<awsiotsdk::ResponseCode (awsiotsdk::Action::*(std::unique_ptr<awsiotsdk::Action, std::default_delete<awsiotsdk::Action> >, std::shared_ptr<awsiotsdk::NetworkConnection>, std::shared_ptr<awsiotsdk::ActionData>))(std::shared_ptr<awsiotsdk::NetworkConnection>, std::shared_ptr<awsiotsdk::ActionData>)>>(std::__invoke_other, std::_Bind<awsiotsdk::ResponseCode (awsiotsdk::Action::*(std::unique_ptr<awsiotsdk::Action, std::default_delete<awsiotsdk::Action> >, std::shared_ptr<awsiotsdk::NetworkConnection>, std::shared_ptr<awsiotsdk::ActionData>))(std::shared_ptr<awsiotsdk::NetworkConnection>, std::shared_ptr<awsiotsdk::ActionData>)>&&) ()
#23 0x00641770 in std::__invoke_result<std::_Bind<awsiotsdk::ResponseCode (awsiotsdk::Action::*(std::unique_ptr<awsiotsdk::Action, std::default_delete<awsiotsdk::Action> >, std::shared_ptr<awsiotsdk::NetworkConnection>, std::shared_ptr<awsiotsdk::ActionData>))(std::shared_ptr<awsiotsdk::NetworkConnection>, std::shared_ptr<awsiotsdk::ActionData>)>>::type std::__invoke<std::_Bind<awsiotsdk::ResponseCode (awsiotsdk::Action::*(std::unique_ptr<awsiotsdk::Action, std::default_delete<awsiotsdk::Action> >, std::shared_ptr<awsiotsdk::NetworkConnection>, std::shared_ptr<awsiotsdk::ActionData>))(std::shared_ptr<awsiotsdk::NetworkConnection>, std::shared_ptr<awsiotsdk::ActionData>)>>(std::_Bind<awsiotsdk::ResponseCode (awsiotsdk::Action::*(std::unique_ptr<awsiotsdk::Action, std::default_delete<awsiotsdk::Action> >, std::shared_ptr<awsiotsdk::NetworkConnection>, std::shared_ptr<awsiotsdk::ActionData>))(std::shared_ptr<awsiotsdk::NetworkConnection>, std::shared_ptr<awsiotsdk::ActionData>)>&&) ()
#24 0x00643b86 in decltype (__invoke((_S_declval<0u>)())) std::thread::_Invoker<std::tuple<std::_Bind<awsiotsdk::ResponseCode (awsiotsdk::Action::*(std::unique_ptr<awsiotsdk::Action, std::default_delete<awsiotsdk::Action> >, std::shared_ptr<awsiotsdk::NetworkConnection>, std::shared_ptr<awsiotsdk::ActionData>))(std::shared_ptr<awsiotsdk::NetworkConnection>, std::shared_ptr<awsiotsdk::ActionData>)> > >::_M_invoke<0u>(std::_Index_tuple<0u>) ()
#25 0x00643b04 in std::thread::_Invoker<std::tuple<std::_Bind<awsiotsdk::ResponseCode (awsiotsdk::Action::*(std::unique_ptr<awsiotsdk::Action, std::default_delete<awsiotsdk::Action> >, std::shared_ptr<awsiotsdk::NetworkConnection>, std::shared_ptr<awsiotsdk::ActionData>))(std::shared_ptr<awsiotsdk::NetworkConnection>, std::shared_ptr<awsiotsdk::ActionData>)> > >::operator()() ()
#26 0x00643ac4 in std::thread::_State_impl<std::thread::_Invoker<std::tuple<std::_Bind<awsiotsdk::ResponseCode (awsiotsdk::Action::*(std::unique_ptr<awsiotsdk::Action, std::default_delete<awsiotsdk::Action> >, std::shared_ptr<awsiotsdk::NetworkConnection>, std::shared_ptr<awsiotsdk::ActionData>))(std::shared_ptr<awsiotsdk::NetworkConnection>, std::shared_ptr<awsiotsdk::ActionData>)> > > >::_M_run() ()
#27 0x76044f52 in ?? () from /usr/lib/arm-linux-gnueabihf/libstdc++.so.6
#28 0x760ce614 in start_thread (arg=0xd5a1c8df) at pthread_create.c:463
#29 0x75eac3cc in __GI_epoll_pwait (epfd=1826563136, events=0x0, maxevents=120, timeout=1889524186, set=0x0)
    at ../sysdeps/unix/sysv/linux/epoll_pwait.c:42
#30 0x00000000 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

SDK version used

latest

Environment details (OS name and version, etc.)

ubuntu1804 imx6ull

jmklix commented 1 year ago

Please consider using the v2 version of this sdk: https://github.com/aws/aws-iot-device-sdk-cpp-v2

xunshichentuo commented 1 year ago

Thank you very much for your answer.

cat iot_config_prod.json
{
  "endpoint": "xxxxxxxxxx.iot.us-west-2.amazonaws.com",
  "mqtt_port": xxxx,
  "https_port": xxxx,
  "greengrass_discovery_port": xxxx,
  "root_ca_relative_path": "certs_iot_prod/rootCA.crt",
  "device_certificate_relative_path": "certs_iot_prod/cert.pem",
  "device_private_key_relative_path": "certs_iot_prod/privkey.pem",
  "tls_handshake_timeout_msecs": 60000,
  "tls_read_timeout_msecs": 2000,
  "tls_write_timeout_msecs": 2000,
  "aws_region": "us-west-2",
  "aws_access_key_id": "",
  "aws_secret_access_key": "",
  "aws_session_token": "",
  "client_id": "pro-PC2A0021500000-prod",
  "thing_name": "pro-PC2A0021500000-prod",
  "is_clean_session": true,
  "mqtt_command_timeout_msecs": 20000,
  "keepalive_interval_secs": 10,
  "minimum_reconnect_interval_secs": 1,
  "maximum_reconnect_interval_secs": 128,
  "maximum_acks_to_wait_for": 32,
  "action_processing_rate_hz": 5,
  "maximum_outgoing_action_queue_length": 32,
  "discover_action_timeout_msecs": 300000,
  "version": "2020-06-16"
}

This is my mqttclient.h

#ifndef MQTTCLIENT_H
#define MQTTCLIENT_H

#include <QObject>
#include <QTimer>
#include <QtMath>
#include "configcommon.h"
#include "network/OpenSSL/OpenSSLConnection.hpp"
#include "include/ClientCoreState.hpp"
#include "include/mqtt/Client.hpp"

class MqttClient : public QObject
{
    Q_OBJECT
public:
    explicit MqttClient(QObject *parent = nullptr);
    void init();

    awsiotsdk::ResponseCode publish(awsiotsdk::util::String topicNameStr, awsiotsdk::util::String message);
    awsiotsdk::ResponseCode subscribe(awsiotsdk::util::String topicNameStr);
    awsiotsdk::ResponseCode unsubscribeTo(awsiotsdk::util::String topicNameStr);
protected:
    void initializeMqttClient();
    void initializeConnection();
    std::shared_ptr<awsiotsdk::NetworkConnection> configNetworkConnectionInOpenSSL();
    int getReconnectionIntervalMillis(int retryCount);
    awsiotsdk::ResponseCode onDisconnected();
    awsiotsdk::ResponseCode onReconnected();
    awsiotsdk::ResponseCode onResubscribe();
    awsiotsdk::ResponseCode onSubMessage(awsiotsdk::util::String topic_name,
                                         awsiotsdk::util::String payload,
                                         std::shared_ptr<awsiotsdk::mqtt::SubscriptionHandlerContextData> p_app_handler_data);

signals:
    void reconnecting();
    void updateServerStatusLater();
    void mqttMessageReceived(const QString topicName, const QString message);
private:
    ConfigCommon configCommon;
    std::shared_ptr<awsiotsdk::NetworkConnection> networkConnection;
    std::shared_ptr<awsiotsdk::MqttClient> iotClient;
    int reconnectionCount = 0;
    const int MAX_INIT_CONNECT_RETRY_INTERVAL_MILLIS = 60000;
    const int MIN_INIT_CONNECT_RETRY_INTERVAL_MILLIS = 3000;
};

#endif // MQTTCLIENT_H

This is my mqttclient.cpp

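#include "mqttclient.h"
#include "Controller/EnvironmentController/environmentcontroller.h"
#include <QDebug>   // qDebug()
#include <chrono>   // std::chrono::seconds
#include <thread>   // std::this_thread::sleep_for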

MqttClient::MqttClient(QObject *parent) : QObject(parent)
{

}

void MqttClient::init()
{
    initializeMqttClient();
    initializeConnection();
}

awsiotsdk::ResponseCode MqttClient::publish(awsiotsdk::util::String topicNameStr, awsiotsdk::util::String message)
{
    if(iotClient.get() == nullptr) {
        return awsiotsdk::ResponseCode::FAILURE;
    }

    std::unique_ptr<awsiotsdk::Utf8String> topicName = awsiotsdk::Utf8String::Create(topicNameStr);
    uint16_t packetId = 0;
    qDebug() << "MqttClient::publish: " << QString::fromStdString(topicNameStr) << " - " << QString::fromStdString(message);
    awsiotsdk::ResponseCode responseCode = iotClient->PublishAsync(std::move(topicName),
                                                                   false, false, awsiotsdk::mqtt::QoS::QOS0,
                                                                   message, nullptr, packetId);

    return responseCode;
}

awsiotsdk::ResponseCode MqttClient::subscribe(awsiotsdk::util::String topicNameStr)
{
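    // Same guard as publish(): do nothing if the client has not been created yet.
    if(iotClient.get() == nullptr) {
        return awsiotsdk::ResponseCode::FAILURE;
    }

    qDebug()<<"MqttClient::subscribe topicName:"<<QString::fromStdString(topicNameStr);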
    std::unique_ptr<awsiotsdk::Utf8String> topicName = awsiotsdk::Utf8String::Create(topicNameStr);
    awsiotsdk::mqtt::Subscription::ApplicationCallbackHandlerPtr subCallbackHandler = std::bind(&MqttClient::onSubMessage,
                                                                                                this,
                                                                                                std::placeholders::_1,
                                                                                                std::placeholders::_2,
                                                                                                std::placeholders::_3);
    std::shared_ptr<awsiotsdk::mqtt::Subscription> subscription =
            awsiotsdk::mqtt::Subscription::Create(std::move(topicName), awsiotsdk::mqtt::QoS::QOS1, subCallbackHandler, nullptr);

    awsiotsdk::util::Vector<std::shared_ptr<awsiotsdk::mqtt::Subscription>> subscriptionList;
    subscriptionList.push_back(subscription);

    awsiotsdk::ResponseCode responseCode = iotClient->Subscribe(subscriptionList, configCommon.mqtt_command_timeout_);

    if (responseCode != awsiotsdk::ResponseCode::SUCCESS) {
        qDebug() << "Subscribe failed.";
    }

    /*
     * Sleep to wait for the subscribe to finish
     * It may not be the best way to handle async request, but it's recommended by AWSIoTSDK
     * This function should be run on a separate worker thread instead of the main thread
     */
    std::this_thread::sleep_for(std::chrono::seconds(3));

    return responseCode;
}

awsiotsdk::ResponseCode MqttClient::unsubscribeTo(awsiotsdk::util::String topicNameStr)
{
    std::unique_ptr<awsiotsdk::Utf8String> topicName = awsiotsdk::Utf8String::Create(topicNameStr);
    awsiotsdk::util::Vector<std::unique_ptr<awsiotsdk::Utf8String>> unsubscribeList;
    unsubscribeList.push_back(std::move(topicName));

    awsiotsdk::ResponseCode responseCode = iotClient->Unsubscribe(std::move(unsubscribeList), configCommon.mqtt_command_timeout_);
    std::this_thread::sleep_for(std::chrono::seconds(1));
    return responseCode;
}

void MqttClient::initializeMqttClient()
{
    awsiotsdk::ClientCoreState::ApplicationDisconnectCallbackPtr disconnectHandler = std::bind(&MqttClient::onDisconnected, this);
    awsiotsdk::ClientCoreState::ApplicationReconnectCallbackPtr reconnectHandler = std::bind(&MqttClient::onReconnected, this);
    awsiotsdk::ClientCoreState::ApplicationResubscribeCallbackPtr resubscribeHandler = std::bind(&MqttClient::onResubscribe, this);
    QString configFileName = EnvironmentController::getSingleton()->getIotConfigFileName();
    awsiotsdk::ResponseCode responseCode = configCommon.initializeCommon(configFileName);
    if (responseCode != awsiotsdk::ResponseCode::SUCCESS) {
        qDebug() << "MqttClient::initializeMqttClient - Failed to read " << configFileName << " ; ResponseCode=" << int(responseCode);
    }

    networkConnection = configNetworkConnectionInOpenSSL();

    iotClient = std::shared_ptr<awsiotsdk::MqttClient>(
                awsiotsdk::MqttClient::Create(networkConnection,
                                              configCommon.mqtt_command_timeout_,
                                              disconnectHandler, nullptr,
                                              reconnectHandler, nullptr,
                                              resubscribeHandler, nullptr));
}

void MqttClient::initializeConnection()
{
    awsiotsdk::util::String clientIdStr = configCommon.base_client_id_;
    std::unique_ptr<awsiotsdk::Utf8String> clientId = awsiotsdk::Utf8String::Create(clientIdStr);
    awsiotsdk::ResponseCode responseCode = iotClient->Connect(configCommon.mqtt_command_timeout_,
                       configCommon.is_clean_session_,
                       awsiotsdk::mqtt::Version::MQTT_3_1_1,
                       configCommon.keep_alive_timeout_secs_,
                       std::move(clientId), nullptr, nullptr, nullptr);
    if(responseCode != awsiotsdk::ResponseCode::MQTT_CONNACK_CONNECTION_ACCEPTED) {
        qDebug()<<"MqttClient::initializeConnection != awsiotsdk::ResponseCode::MQTT_CONNACK_CONNECTION_ACCEPTED:"<<int(responseCode);
        QTimer::singleShot(getReconnectionIntervalMillis(reconnectionCount), this, [this](){ initializeConnection(); });
        reconnectionCount++;
    } else {
        qDebug()<<"MqttClient::initializeIotClient iotClient->Connect awsiotsdk::ResponseCode::MQTT_CONNACK_CONNECTION_ACCEPTED";
        reconnectionCount = 0;
    }
}

int MqttClient::getReconnectionIntervalMillis(int retryCount)
{
    if(retryCount > 30) {
        return MAX_INIT_CONNECT_RETRY_INTERVAL_MILLIS;
    }

    int interval  = static_cast<int>(qPow(2, retryCount) *1000 + MIN_INIT_CONNECT_RETRY_INTERVAL_MILLIS);
    interval = qMin(MAX_INIT_CONNECT_RETRY_INTERVAL_MILLIS, interval);
    qDebug()<<"MqttClient::getReconnectionIntervalMillis retry in "<<interval<<"ms";
    return interval;
}

awsiotsdk::ResponseCode MqttClient::onDisconnected()
{
    qDebug()<<"MqttClient::onDisconnected - Ops, IoT Disconnected...";
    return awsiotsdk::ResponseCode::FAILURE;
}

awsiotsdk::ResponseCode MqttClient::onReconnected()
{
    qDebug()<<"MqttClient::onReconnected - IoT Reconnected...";
    emit reconnecting();
    return awsiotsdk::ResponseCode::SUCCESS;
}

awsiotsdk::ResponseCode MqttClient::onResubscribe()
{
    qDebug()<<"MqttClient::onResubscribe - MQTT Topics Subscribed Again!";
    emit updateServerStatusLater();
    return awsiotsdk::ResponseCode::SUCCESS;
}

awsiotsdk::ResponseCode MqttClient::onSubMessage(awsiotsdk::util::String topic_name, awsiotsdk::util::String payload, std::shared_ptr<awsiotsdk::mqtt::SubscriptionHandlerContextData> p_app_handler_data)
{
    Q_UNUSED(p_app_handler_data)
    qDebug() << "MqttClient::onSubMessage --- NEW SUB MESSAGE ---";
    qDebug() << "MqttClient::onSubMessage - Received message";
    qDebug() << "MqttClient::onSubMessage - Topic: " << QString::fromStdString(topic_name);
    qDebug() << "MqttClient::onSubMessage - Message Length: " << payload.length();
    qDebug() << "MqttClient::onSubMessage --- END OF NEW MESSAGE ---";

    emit mqttMessageReceived(QString::fromStdString(topic_name), QString::fromStdString(payload));
    return awsiotsdk::ResponseCode::SUCCESS;
}

std::shared_ptr<awsiotsdk::NetworkConnection> MqttClient::configNetworkConnectionInOpenSSL()
{
    std::shared_ptr<awsiotsdk::network::OpenSSLConnection> networkConnectionOpenSSL =
            std::make_shared<awsiotsdk::network::OpenSSLConnection>(configCommon.endpoint_,
                                                                    configCommon.endpoint_mqtt_port_,
                                                                    configCommon.root_ca_path_,
                                                                    configCommon.client_cert_path_,
                                                                    configCommon.client_key_path_,
                                                                    configCommon.tls_handshake_timeout_,
                                                                    configCommon.tls_read_timeout_,
                                                                    configCommon.tls_write_timeout_, true);
    awsiotsdk::ResponseCode responseCode = networkConnectionOpenSSL->Initialize();
    if (responseCode != awsiotsdk::ResponseCode::SUCCESS) {
        qDebug() << "MqttClient::configNetworkConnectionInOpenSSL() - Failed to initialize network connection in OpenSSL";
    }

    return std::dynamic_pointer_cast<awsiotsdk::NetworkConnection>(networkConnectionOpenSSL);
}

jmklix commented 1 year ago

Can you try adding a random string to your client ID? The sample does it like this:

client_id_tagged.append(std::to_string(rand()));

As you noticed, using the same client ID causes a disconnect loop. Each client is supposed to have its own unique ID.
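
The same idea can be sketched without rand(), which gives the same sequence on every device unless it is seeded. This is only an illustration under stated assumptions: makeUniqueClientId is a hypothetical helper, not an SDK function, and the 64-bit hex suffix format is an arbitrary choice.

```cpp
#include <iomanip>
#include <random>
#include <sstream>
#include <string>

// Hypothetical helper (not part of the SDK): appends a random 64-bit hex
// suffix to a base client ID so that two devices sharing a serial number
// do not present the same MQTT client ID.
std::string makeUniqueClientId(const std::string &baseId)
{
    // One generator per thread, seeded once from std::random_device.
    // Assumption: random_device is non-deterministic on the target platform;
    // if it is not, mix in another per-device value instead.
    thread_local std::mt19937_64 gen(std::random_device{}());
    std::uniform_int_distribution<unsigned long long> dist;

    std::ostringstream out;
    out << baseId << '-'
        << std::hex << std::setw(16) << std::setfill('0') << dist(gen);
    return out.str();
}
```

The result would be passed where the thread currently passes configCommon.base_client_id_. Note that a suffix regenerated on every start means the broker sees a new client each time, so a persistent session (is_clean_session set to false) would not be resumed.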

xunshichentuo commented 1 year ago

With a unique ID the connection no longer keeps reconnecting, and the probability of crashing is relatively low. Is there an API that can disable AWS IoT reconnection on the device? Thanks.

jmklix commented 1 year ago

You shouldn't handle reconnecting yourself. That is part of the SDK, and reconnecting with the same client ID can cause unexpected behavior (i.e. crashes after 2 days). In the sample I previously linked, you can see that Connect is only ever called once, whereas your code calls it multiple times.

rc = p_iot_client_->Connect(ConfigCommon::mqtt_command_timeout_, ConfigCommon::is_clean_session_,
                            mqtt::Version::MQTT_3_1_1, ConfigCommon::keep_alive_timeout_secs_,
                            std::move(client_id), nullptr, nullptr, nullptr);

Please remove your retry logic and see if that fixes the crashes that you are seeing.
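
One way to make "call Connect only once" hard to violate by accident is a small guard around the call. The sketch below is a hypothetical illustration, not SDK code: ConnectOnce and the bool-returning callable are assumptions standing in for the real iotClient->Connect(...) call and its ResponseCode.

```cpp
#include <atomic>
#include <functional>
#include <utility>

// Hypothetical guard (not part of the SDK): forwards the first call to the
// wrapped connect function and rejects every later call, so that reconnects
// after a drop are left to the SDK's own reconnect logic.
class ConnectOnce
{
public:
    explicit ConnectOnce(std::function<bool()> connectFn)
        : connectFn_(std::move(connectFn)) {}

    // Returns the wrapped call's result on the first call, false afterwards.
    bool connect()
    {
        bool expected = false;
        if (!attempted_.compare_exchange_strong(expected, true)) {
            return false; // already attempted; do not call Connect again
        }
        return connectFn_();
    }

    bool attempted() const { return attempted_.load(); }

private:
    std::function<bool()> connectFn_;
    std::atomic<bool> attempted_{false};
};
```

In the Qt class from this thread, the callable would wrap the existing Connect call and compare its result against MQTT_CONNACK_CONNECTION_ACCEPTED. What to do when the very first attempt fails is a separate decision that this sketch does not cover.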

xunshichentuo commented 1 year ago

Thank you very much. I think it's more important to avoid duplicate IDs. I have been testing for a week and there is no such problem under normal circumstances.
