eclipse / paho.mqtt-sn.embedded-c

Paho C MQTT-SN gateway and libraries for embedded systems. Paho is an Eclipse IoT project.
https://eclipse.org/paho
Other
314 stars 179 forks source link

Memory leak with ssl and client cleanup #220

Closed Oakrie closed 3 years ago

Oakrie commented 3 years ago

Hi there, I've been using the MQTT-SN-GW with the Amazon MQTT Broker. I've noticed when the keep-alive time is exceeded that the SSL connection on the GW side may not be cleaned up properly. When the AWS client closes the SSL connection, the network receives an SSL_ERROR_ZERO_RETURN.

int Network::recv(uint8_t* buf, uint16_t len)
{
...
        switch (SSL_get_error(_ssl, rlen))
        {
...
        case SSL_ERROR_ZERO_RETURN:
            SSL_shutdown(_ssl);
            _ssl = 0;
            _numOfInstance--;
            //TCPStack::close();
            _busy = false;
            _mutex.unlock();
            return -1;
            break;
...
               }
...
}

After the return -1, the client socket will be rechecked, receiving a return code of 0 (since _ssl is now 0) in

void BrokerRecvTask::run(void)
{
...
    if ( rc == 0 )  // Disconnected
    {
        client->getNetwork()->close();
        delete packet;
        /* delete client when the client is not authorized & session is clean */
        _gateway->getClientList()->erase(client);
        if ( client )
        {
            client = client->getNextClient();
        }
        continue;
    }
...

When client->getNetwork()->close(); is called, it will not fully run since the case SSL_ERROR_ZERO_RETURN cleaned up only part of the ssl connection. Also line _gateway->getClientList()->erase(client); does not erase the cliet from the list either since in the erase function the _authorize with always be true.

So what I did to try and fix the leaks were the following:

in src/Linux/Network.cpp

int Network::recv(uint8_t* buf, uint16_t len)
{
...
        switch (SSL_get_error(_ssl, rlen))
        {
...
        case SSL_ERROR_ZERO_RETURN:
-           SSL_shutdown(_ssl);
-           _ssl = 0;
-           _numOfInstance--;
-           //TCPStack::close();
-           _busy = false;
            _mutex.unlock();
+           Network::close();
            return -1;
            break;
...
        }
...
}

in src/MQTTSNGWClientList.cpp

void ClientList::erase(Client*& client)
{
-     if ( !_authorize && client->erasable())
+    if ( _authorize && client->erasable())
    {
...

After using Valgrind to check for any more possible leaks, I found that one more.

in src/linux/Network.cpp

bool Network::connect(const char* host, const char* port, const char* caPath, const char* caFile, const char* certkey, const char* prvkey)
{
    ...
        rc = true;
+       X509_free(peer);
    }
    catch (bool x)
    {
        rc = x;
    }
    _mutex.unlock();
    return rc;
}
}

However, now that the leak is fixed, I've been getting issues. now with _mutex.lock(). It will sometimes throw the following exception. throw Exception( -1, "The same thread can't acquire a mutex twice.");

Clearly, the way I am going about trying to fix the memory leak is incorrect since this is happening. Do you have any recommendations or implementations that can fix this?

Thanks for your time, Tim

ty4tw commented 3 years ago

Cause of the CI error is a broker host address in gateway.conf. This issue was fixed.