Closed yunhaoling closed 3 years ago
On macOS, a completely lost socket puts the connection into the ERROR state, whereas on Windows/Linux it results in connection CLOSE (this is due to the different tlsio implementations in C). Our current implementation doesn't handle the connection ERROR state properly, so this PR adds a one-line change to connect.py to handle the ERROR state.
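As a rough illustration of the idea (not the actual change in connect.py), the check can be pictured as treating ERROR the same way as CLOSE when deciding whether the connection is gone. The `ConnectionState` enum and the `connection_lost` helper below are hypothetical names invented for this sketch; they are not the library's API.

```python
from enum import Enum


class ConnectionState(Enum):
    # Hypothetical, simplified state set; the real library defines more states.
    OPENED = "opened"
    CLOSE = "close"
    ERROR = "error"


# States that should be treated as "the connection is gone".
# Before the fix, only CLOSE would have been handled in this sketch.
_LOST_STATES = frozenset({ConnectionState.CLOSE, ConnectionState.ERROR})


def connection_lost(state):
    """Return True when the connection should be treated as lost."""
    return state in _LOST_STATES
```

With a check of this shape, the macOS path (ERROR) and the Windows/Linux path (CLOSE) would end up in the same handling branch.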
This PR addresses the EventHub issues where sending large messages triggers a segmentation fault after the socket is completely lost: https://github.com/Azure/azure-sdk-for-python/issues/14543, https://github.com/Azure/azure-sdk-for-python/issues/13739
ROOT CAUSE:
ASYNC_OPERATION_HANDLE result gets freed twice, leading to a segmentation fault in link_transfer_async in link.c when sending a large message composed of multiple transfer frames in the socket-lost case.
OPENED to END due to the socket being completely lost ("Failure: sending socket failed 10054."): connection_set_state -> session->on_connection_state_changed -> link->on_session_state_changed
link_transfer_async triggers the segmentation fault when session_send_transfer results in SESSION_SEND_TRANSFER_ERROR.
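The double release described above can be modelled in a few lines of Python. This is only a toy model, not the C code in link.c: it assumes that the state-change callback releases every pending delivery, and that the error path of the transfer then releases the same object again. `AsyncOperation`, `Link`, `fail_at` and `guarded` are hypothetical names made up for the sketch, and the `guarded` branch is one possible way to avoid the second release, not necessarily the fix applied in this PR.

```python
class AsyncOperation:
    """Stand-in for a heap-allocated C object that must be freed exactly once."""

    def __init__(self):
        self.destroy_calls = 0

    def destroy(self):
        self.destroy_calls += 1
        if self.destroy_calls > 1:
            # In C, this would be a second free() of the same pointer.
            raise RuntimeError("double free")


class Link:
    def __init__(self):
        self.pending = []

    def on_session_state_changed(self):
        # Assumption: on socket loss, the callback releases all pending deliveries.
        while self.pending:
            self.pending.pop().destroy()

    def transfer(self, frame_count, fail_at, guarded):
        """Send frame_count frames; the socket is lost when sending frame fail_at."""
        result = AsyncOperation()
        self.pending.append(result)
        for index in range(frame_count):
            if index == fail_at:
                # Socket lost mid-message: the callback chain runs first...
                self.on_session_state_changed()
                # ...then the send reports an error to the transfer routine.
                if guarded and result not in self.pending:
                    return None  # already released by the callback
                result.destroy()  # unguarded: second release of the same object
                return None
        return result
```

Calling `Link().transfer(2, fail_at=1, guarded=False)` raises in this model, while the guarded variant returns `None` without releasing the object a second time.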
HOW TO FIX:
---------------------------------------- Advanced Topic ----------------------------------------
WHY SMALL MESSAGE COULD SURVIVE
Let's understand the send logic first:
frame_codec_encode_frame in frame_codec.c.
There is no return value from on_bytes_encoded.
link_transfer_async is called and succeeds; execution is then handed back to the Python layer, to connection.do_work(), where the socket error is detected and raised.
---------------------------------------- Code Snippets to Reproduce ----------------------------------------