aws / aws-iot-device-sdk-python

SDK for connecting to AWS IoT from a device using Python.
Apache License 2.0

Received data left pending in the SSL library #296

Closed shiinahub closed 2 years ago

shiinahub commented 3 years ago

Hi. I have run into an issue where replies from the IoT endpoint go missing: sometimes no response arrives after I send a request over MQTT. A typical case is the AWS IoT Jobs message exchange, where the “accepted” or “rejected” callback is sometimes not invoked after a “start-next” request is sent.

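I figured out that the Paho-based MQTT client implementation (AWSIoTPythonSDK/core/protocol/paho/client.py) has a pending-buffer issue that has already been fixed upstream in the Paho library. Received data can be left pending inside the SSL library, where select() on the underlying socket does not report it, so it is not read until more data arrives. The relevant issue is found here.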
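To illustrate the mechanism, here is a minimal sketch that does not use TLS or the SDK at all. `BufferingReader` is a hypothetical stand-in for an SSL socket: it drains the OS socket into a private buffer and hands the data out in smaller pieces, and its `pending()` method mimics `ssl.SSLSocket.pending()`. The helper `wait_readable` is also an invented name; it only mirrors the idea of the fix, which is to skip the wait in select() and read anyway when bytes are already buffered.

```python
import select
import socket


class BufferingReader:
    """Hypothetical stand-in for a TLS socket (not part of the SDK or Paho)."""

    def __init__(self, sock):
        self._sock = sock
        self._buffer = b""

    def fileno(self):
        # Lets select() watch the underlying OS socket.
        return self._sock.fileno()

    def pending(self):
        # Bytes already taken off the OS socket but not yet handed to the caller.
        return len(self._buffer)

    def recv(self, n):
        # Like a TLS library decrypting a whole record, pull everything that
        # is available and return only the part that was asked for.
        if not self._buffer:
            self._buffer = self._sock.recv(4096)
        data, self._buffer = self._buffer[:n], self._buffer[n:]
        return data


def wait_readable(reader, timeout):
    """Return True if a read should be attempted (hypothetical helper)."""
    pending_bytes = reader.pending()
    if pending_bytes > 0:
        timeout = 0.0  # do not block in select() when data is already buffered
    readable, _, _ = select.select([reader], [], [], timeout)
    return bool(readable) or pending_bytes > 0


def demo():
    a, b = socket.socketpair()
    try:
        a.sendall(b"0123456789")
        reader = BufferingReader(b)
        first = reader.recv(4)  # drains all 10 bytes from the OS socket
        # select() alone now sees an empty OS socket, although 6 bytes remain.
        plain_select = bool(select.select([reader], [], [], 0.0)[0])
        with_check = wait_readable(reader, 0.0)
        return first, reader.pending(), plain_select, with_check
    finally:
        a.close()
        b.close()


if __name__ == "__main__":
    print(demo())
```

In this sketch, the plain select() call reports nothing to read while the buffer still holds data, which corresponds to a reply that is only delivered once later traffic wakes the loop.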

I reproduced this issue by writing a short script that simply subscribes to the topic $aws/things/THING_NAME/jobs/start-next/accepted (and also the /rejected topic) and sends a request to the topic $aws/things/THING_NAME/jobs/start-next periodically with a reasonably long interval. Sometimes the callback is not invoked until the next start-next is sent. The test environment is as follows:

I backported the fix from the Paho library by inserting the relevant lines below. I confirmed that this modification resolves the issue: I now receive the “accepted” or “rejected” message promptly after every request.

--- a/AWSIoTPythonSDK/core/protocol/paho/client.py
+++ b/AWSIoTPythonSDK/core/protocol/paho/client.py
@@ -877,6 +877,16 @@ class Client(object):
         self._out_packet_mutex.release()
         self._current_out_packet_mutex.release()

+        # used to check if there are any bytes left in the ssl socket
+        pending_bytes = 0
+
+        if self._ssl:
+            pending_bytes = self._ssl.pending()
+
+        # if bytes are pending do not wait in select
+        if pending_bytes > 0:
+            timeout = 0.0
+
         # sockpairR is used to break out of select() before the timeout, on a
         # call to publish() etc.
         rlist = [self.socket(), self._sockpairR]
@@ -892,7 +902,7 @@ class Client(object):
         except:
             return MQTT_ERR_UNKNOWN

-        if self.socket() in socklist[0]:
+        if self.socket() in socklist[0] or pending_bytes > 0:
             rc = self.loop_read(max_packets)
             if rc or (self._ssl is None and self._sock is None):
                 return rc

Is this indeed a bug, and is my description above valid? Or am I missing something?

Best Regards, Masaki Shiina.
