Azure / azure-uamqp-python

AMQP 1.0 client library for Python
MIT License

Fix segmentation fault when sending large messages after socket is completely lost #217

Closed: yunhaoling closed this pull request 3 years ago

yunhaoling commented 3 years ago

This PR addresses the Event Hubs issue in which sending a large message after the socket has been completely lost triggers a segmentation fault: https://github.com/Azure/azure-sdk-for-python/issues/14543, https://github.com/Azure/azure-sdk-for-python/issues/13739

ROOT CAUSE:

HOW TO FIX:

---------------------------------------- Advanced Topic ----------------------------------------

WHY SMALL MESSAGES SURVIVE:

To answer this, let's first look at the send logic:

    # In SendClient._client_run; a small message won't trigger the segfault
    def _client_run(self):
        # code...
        self._pending_messages = self._filter_pending()  # calls into the C module to send messages; a small message gets through
        # code...
        self._connection.work()  # the connection error is raised here
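To make this ordering concrete, here is a toy, pure-Python model of the loop above. All class and method names (`ToyConnection`, `ToySendClient`, `hand_off`, `ConnectionLostError`) are hypothetical and are not uamqp's implementation. The model only shows that pending messages are handed to the lower layer before `work()` runs, and that for a small message the lost socket then surfaces as an ordinary, catchable Python exception. It does not reproduce the C-level crash.

```python
# Toy model of the ordering in SendClient._client_run.
# Every name here is hypothetical; none of it is uamqp's real code.


class ConnectionLostError(Exception):
    """Stands in for the connection error raised from work()."""


class ToyConnection:
    def __init__(self, socket_alive):
        self.socket_alive = socket_alive
        self.log = []

    def hand_off(self, frames):
        # Stands in for the C-level send: it only queues the frames
        # and does not notice that the socket is gone.
        self.log.append("hand_off %d frame(s)" % frames)

    def work(self):
        # The dead socket is only noticed once the connection does I/O.
        self.log.append("work")
        if not self.socket_alive:
            raise ConnectionLostError("sending socket failed")


class ToySendClient:
    def __init__(self, connection):
        self._connection = connection
        self._pending = []

    def queue(self, frames):
        self._pending.append(frames)

    def _filter_pending(self):
        # First, hand every pending message to the lower layer...
        for frames in self._pending:
            self._connection.hand_off(frames)
        return []

    def client_run(self):
        self._pending = self._filter_pending()
        # ...and only then let the connection do I/O, where the error is raised.
        self._connection.work()


conn = ToyConnection(socket_alive=False)
client = ToySendClient(conn)
client.queue(1)  # a small, single-frame message
try:
    client.client_run()
except ConnectionLostError as exc:
    print("caught:", exc)  # caught: sending socket failed
print(conn.log)  # ['hand_off 1 frame(s)', 'work']
```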

---------------------------------------- Code Snippets to Reproduce ----------------------------------------

import uamqp
from uamqp import authentication
from datetime import datetime
import time
import logging
logging.basicConfig(level=logging.INFO, format='%(asctime)s %(name)-12s %(levelname)-8s %(message)s')

live_eventhub_config = {...}

uri = "sb://{}/{}".format(live_eventhub_config['hostname'], live_eventhub_config['event_hub'])
sas_auth = authentication.SASTokenAuth.from_shared_access_key(
    uri, live_eventhub_config['key_name'], live_eventhub_config['access_key'])

target = "amqps://{}/{}".format(live_eventhub_config['hostname'], live_eventhub_config['event_hub'])

send_client = uamqp.SendClient(target, auth=sas_auth, debug=True)
send_client.open()
while not send_client.client_ready():
    send_client.do_work()
print(datetime.now(), "send client is opened")

print(datetime.now(), 'start sleep')
# Sleep until the underlying socket IO is completely lost.
# On Windows, the socket IO reports "Failure: sending socket failed 10054."
# On Linux, the socket IO reports "sending socket failed. errno=104 (Connection reset by peer)."
time.sleep(350)
print(datetime.now(), 'end sleep')

# A big message is split into multiple AMQP frames, which takes a different
# execution path from a small message (which is composed of just one frame).
# See: https://github.com/Azure/azure-uamqp-c/blob/master/src/session.c#L1532-L1676
message = uamqp.Message(
    b't'*1024*700
)

# The segmentation fault happens here
send_client.send_message(message)

send_client.close()
print(datetime.now(), "send client is closed")
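A rough way to see why the 700 KiB payload in the script takes the multi-frame path is to count frames. This sketch assumes a 64 KiB maximum frame size (an assumption: the real limit is negotiated when the AMQP connection opens) and ignores frame headers and message-encoding overhead, so treat the result as a lower bound. The function name `count_transfer_frames` is hypothetical.

```python
import math

# Assumption: a negotiated max frame size of 64 KiB.
ASSUMED_MAX_FRAME_SIZE = 64 * 1024


def count_transfer_frames(payload_size, max_frame_size=ASSUMED_MAX_FRAME_SIZE):
    """Lower-bound estimate of the transfer frames needed for a payload.

    Frame headers and encoding overhead are ignored, so the real count
    can be higher. An empty payload still needs one frame.
    """
    if payload_size <= 0:
        return 1
    return math.ceil(payload_size / max_frame_size)


print(count_transfer_frames(1024))        # 1
print(count_transfer_frames(1024 * 700))  # 11
```

A 1 KiB message fits in a single frame, while the 700 KiB message in the reproduction script needs at least 11.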

yunhaoling commented 3 years ago

On macOS, the connection goes into the ERROR state after the socket is completely lost, whereas on Windows and Linux it goes into the CLOSE state (this is due to the different tlsio implementations in C). Our current implementation doesn't handle the connection ERROR state properly, so this PR adds a one-line change to connect.py to handle it.
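The shape of this fix can be sketched with a toy state check. The enum below is hypothetical and uses the state names from the comment above rather than the members of uamqp's C-backed enum. The two sets stand in for whatever condition `connect.py` actually checks. The point is only that adding ERROR to the handled states lets macOS take the same error path as Windows and Linux.

```python
from enum import Enum


class ToyConnectionState(Enum):
    # Hypothetical subset of connection states, named after the PR text.
    OPENED = "OPENED"
    CLOSE = "CLOSE"
    ERROR = "ERROR"


# Before: only the state reported on Windows/Linux counts as a lost connection.
LOST_STATES_BEFORE = frozenset({ToyConnectionState.CLOSE})
# After: the state reported on macOS is handled too (the "one line" of the fix).
LOST_STATES_AFTER = LOST_STATES_BEFORE | {ToyConnectionState.ERROR}


def connection_lost(new_state, lost_states=LOST_STATES_AFTER):
    """Return True if the client should surface a connection error for new_state."""
    return new_state in lost_states


print(connection_lost(ToyConnectionState.ERROR, LOST_STATES_BEFORE))  # False
print(connection_lost(ToyConnectionState.ERROR))                      # True
```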