awslabs / amazon-kinesis-video-streams-webrtc-sdk-c

Amazon Kinesis Video Streams Webrtc SDK is for developers to install and customize realtime communication between devices and enable secure streaming of video, audio to Kinesis Video Streams.
https://awslabs.github.io/amazon-kinesis-video-streams-webrtc-sdk-c/group__PublicMemberFunctions.html
Apache License 2.0
1.03k stars 307 forks source link

[BUG] Deadlock in master sample #1100

Closed Nomidia closed 3 years ago

Nomidia commented 3 years ago

Describe the bug

  1. pregenerateCertTimerCallback is called every second. The executorLock(A) and sampleConfigurationObjLock(B) will be used.

https://github.com/awslabs/amazon-kinesis-video-streams-webrtc-sdk-c/blob/f5b89d4b6e6116a06f0bd2580967142e98ce6f70/samples/Common.c#L907

  1. When a sdp offer is received, sampleConfigurationObjLock(B1) will also be used. Then when timerQueueAddTimer is called, the executorLock(A1) will be used.

https://github.com/awslabs/amazon-kinesis-video-streams-webrtc-sdk-c/blob/f5b89d4b6e6116a06f0bd2580967142e98ce6f70/samples/Common.c#L217-L219

File: TimerQueue.c
Function: timerQueueAddTimer

    MUTEX_LOCK(pTimerQueue->executorLock);
  1. Between A and B, if B1 is called, deadlock happends. A->B1->B(blocked)->A1(blocked)
Nomidia commented 3 years ago

11:37:57 signalingMessageReceived[1208], [B1] try lock 11:37:57 signalingMessageReceived[1211], [B1] locked 11:37:57 2021-03-16 03:37:52 INFO (thread-0x73bb64d0) signalingClientGetIceConfigInfoCount(): Signaling Client Get ICE Config Info Count 11:37:57 2021-03-16 03:37:52 DEBUG (thread-0x73bb64d0) refreshIceConfiguration(): Refreshing the ICE Server Configuration 11:37:57 2021-03-16 03:37:52 INFO (thread-0x73bb64d0) signalingClientGetIceConfigInfo(): Signaling Client Get ICE Config Info 11:37:57 2021-03-16 03:37:52 DEBUG (thread-0x73bb64d0) refreshIceConfiguration(): Refreshing the ICE Server Configuration 11:37:57 2021-03-16 03:37:52 DEBUG (thread-0x73bb64d0) iceAgentValidateKvsRtcConfig(): 11:37:57 iceLocalCandidateGatheringTimeout: 10000 ms 11:37:57 iceConnectionCheckTimeout: 0 ms 11:37:57 iceCandidateNominationTimeout: 10000 ms 11:37:57 iceConnectionCheckPollingInterval: 0 ms 11:37:57 pregenerateCertTimerCallback[921], [B] try lock 11:37:57 2021-03-16 03:37:52 DEBUG (thread-0x73c364d0) receiveLwsMessage(): Client received message of type: ICE_CANDIDATE 11:37:57 2021-03-16 03:37:52 DEBUG (thread-0x73c364d0) lwsWssCallbackRoutine(): Client is writable 11:37:57 2021-03-16 03:37:52 DEBUG (thread-0x73c364d0) receiveLwsMessage(): Client received message of type: ICE_CANDIDATE 11:37:57 signalingMessageReceived[1208], [B1] try lock 11:37:57 2021-03-16 03:37:52 DEBUG (thread-0x73c364d0) lwsWssCallbackRoutine(): Client is writable 11:37:57 signalingMessageReceived[1208], [B1] try lock 11:37:57 2021-03-16 03:37:52 DEBUG (thread-0x73c364d0) receiveLwsMessage(): Client received message of type: ICE_CANDIDATE 11:37:57 2021-03-16 03:37:52 DEBUG (thread-0x73c364d0) lwsWssCallbackRoutine(): Client is writable 11:37:57 signalingMessageReceived[1208], [B1] try lock 11:37:57 2021-03-16 03:37:52 DEBUG (thread-0x73c364d0) receiveLwsMessage(): Client received message of type: ICE_CANDIDATE 11:37:57 2021-03-16 03:37:52 DEBUG (thread-0x73c364d0) lwsWssCallbackRoutine(): Client is writable 11:37:57 signalingMessageReceived[1208], [B1] try lock 11:37:57 2021-03-16 03:37:52 DEBUG (thread-0x73bb64d0) initializePeerConnection(): time taken to create peer connection 486 ms 11:37:57 timerQueueAddTimer[66] pTimerQueue = 0x9359c4, [A1] try lock 11:37:57 timerQueueAddTimer[69] pTimerQueue = 0x9359c4, [A1] locked 11:37:57 timerQueueAddTimer[66] pTimerQueue = 0x9359c4, [A1] try lock 11:37:58 timerQueueAddTimer[69] pTimerQueue = 0x9359c4, [A1] locked 11:37:58 timerQueueAddTimer[66] pTimerQueue = 0x9359c4, [A1] try lock 11:37:58 timerQueueAddTimer[69] pTimerQueue = 0x9359c4, [A1] locked 11:37:58 2021-03-16 03:37:52 DEBUG (thread-0x73bb64d0) createSocket(): setsockopt() failed with errno Protocol not available 11:37:58 2021-03-16 03:37:52 DEBUG (thread-0x73bb64d0) createSocket(): setsockopt() failed with errno Protocol not available 11:37:58 2021-03-16 03:37:52 DEBUG (thread-0x73b364d0) stepStateMachine(): State Machine - Current state: 0x0000000000000001, Next state: 0x0000000000000002 11:37:58 2021-03-16 03:37:52 DEBUG (thread-0x73b364d0) iceAgentCheckConnectionStateSetup(): ice candidate pair count 0 11:37:58 2021-03-16 03:37:52 DEBUG (thread-0x73b364d0) stepIceAgentStateMachine(): Ice agent state changed from ICE_AGENT_STATE_NONE to ICE_AGENT_STATE_CHECK_CONNECTION. 11:37:58 2021-03-16 03:37:52 INFO (thread-0x73b364d0) onConnectionStateChange(): New connection state 2 11:37:58 2021-03-16 03:37:52 DEBUG (thread-0x73bb64d0) createSocket(): setsockopt() failed with errno Protocol not available 11:37:58 2021-03-16 03:37:52 DEBUG (thread-0x73bb64d0) createSocket(): setsockopt() failed with errno Protocol not available 11:37:58 2021-03-16 03:37:52 DEBUG (thread-0x73bb64d0) createSocket(): setsockopt() failed with errno Protocol not available 11:37:58 2021-03-16 03:37:52 DEBUG (thread-0x73bb64d0) createSocket(): setsockopt() failed with errno Protocol not available 11:37:58 timerQueueAddTimer[66] pTimerQueue = 0x9359c4, [A1] try lock 11:37:58 timerQueueAddTimer[69] pTimerQueue = 0x9359c4, [A1] locked 11:37:58 2021-03-16 03:37:52 DEBUG (thread-0x73bb64d0) createSocket(): setsockopt() failed with errno Protocol not available 11:37:58 timerQueueAddTimer[66] pTimerQueue = 0x9359c4, [A1] try lock 11:37:58 timerQueueAddTimer[69] pTimerQueue = 0x9359c4, [A1] locked 11:37:58 timerQueueAddTimer[66] pTimerQueue = 0x9359c4, [A1] try lock 11:37:58 timerQueueAddTimer[69] pTimerQueue = 0x9359c4, [A1] locked 11:37:58 2021-03-16 03:37:52 INFO (thread-0x73bb64d0) signalingClientSendMessageSync(): Signaling Client Sending Message Sync 11:37:58 2021-03-16 03:37:52 DEBUG (thread-0x73bb64d0) sendLwsMessage(): Sending data over web socket: Message type: SDP_ANSWER, RecepientId: u0zps71d1o1yd34ur1zr2 11:37:58 2021-03-16 03:37:52 DEBUG (thread-0x73b364d0) turnConnectionStepState(): TurnConnection state changed from TURN_STATE_NEW to TURN_STATE_CHECK_SOCKET_CONNECTION 11:37:58 2021-03-16 03:37:52 DEBUG (thread-0x73b364d0) turnConnectionStepState(): TurnConnection state changed from TURN_STATE_NEW to TURN_STATE_CHECK_SOCKET_CONNECTION 11:37:58 2021-03-16 03:37:52 DEBUG (thread-0x73b364d0) iceAgentLogNewCandidate(): New local ice candidate discovered. Id: HI8C3UB2o. Ip: 2001:0db8:0000:f101:0000:0000:0000:000b:37229. Type: host. Protocol: UNKNOWN. 11:37:58 2021-03-16 03:37:52 DEBUG (thread-0x73c364d0) lwsWssCallbackRoutine(): Client is writable 11:37:58 2021-03-16 03:37:52 DEBUG (thread-0x73bb64d0) handleOffer(): time taken to send answer 43 ms 11:37:58 handleOffer[213] 11:37:58 timerQueueAddTimer[66] pTimerQueue = 0x8b1534, [A1] try lock

hassanctech commented 3 years ago

Thanks for reporting this, will take a look soon!

shiv50084 commented 3 years ago

@Nomidia same issue i am facing how you fixed this. on my side it easily reproducible

Nomidia commented 3 years ago

@Nomidia same issue i am facing how you fixed this. on my side it easily reproducible

I add another handle for sdp offer currently.

--- a/samples/Samples.h
+++ b/samples/Samples.h
@@ -74,6 +74,7 @@ typedef struct {
     UINT32 videoBufferSize;
     TID mediaSenderTid;
     TIMER_QUEUE_HANDLE timerQueueHandle;
+    TIMER_QUEUE_HANDLE timerSdpHandle;
     UINT32 iceCandidatePairStatsTimerId;
     SampleStreamingMediaType mediaType;
     startRoutine audioSource;
MushMal commented 3 years ago

Please check the provided fix which is what I would go with. Comments are much appreciated - especially from @Nomidia

disa6302 commented 3 years ago

@Nomidia ,

I am closing this ticket since the fix is merged. Feel free to re-open if you still face this issue.