awslabs / amazon-kinesis-video-streams-webrtc-sdk-c

Amazon Kinesis Video Streams Webrtc SDK is for developers to install and customize realtime communication between devices and enable secure streaming of video, audio to Kinesis Video Streams.
https://awslabs.github.io/amazon-kinesis-video-streams-webrtc-sdk-c/group__PublicMemberFunctions.html
Apache License 2.0
1.01k stars 304 forks source link

[QUESTION] Potential SSL Read Error Again #1000

Closed suggestedfixes closed 3 years ago

suggestedfixes commented 3 years ago

We are also experiencing a similar problem when the bitrate gets too high (a moderate 1Mbps) for some network scenarios, actually very good network conditions. However, the fix for this issue is to default to use udp turn, which does not address the root cause of an SSL error. https://github.com/awslabs/amazon-kinesis-video-streams-webrtc-sdk-c/issues/410

We've also been able to replicate similar error log regarding SSL read, except this happens at very fast network conditions. https://github.com/awslabs/amazon-kinesis-video-streams-webrtc-sdk-c/issues/851

There are traces of evidence for potential data races

suggestedfixes commented 3 years ago

I threaten with a friendly smile if anyone closes this issue.

MushMal commented 3 years ago

@suggestedfixes I am not sure what you are observing and what's causing SSL error. This is a very generic error that can be caused by many possible root causes. You will need to have some debug level logs to start pinning it down.

If you have a repro scenario with stock assets we can look at it with highest priority. Other than that, I am not sure if any of this issue is actionable and I would recommend closing this as we don't want to keep non-actionable issues open.

To answer some of your questions.

suggestedfixes commented 3 years ago

I will attempt to grab some verbose log leading up to SSL read error (tracking down the sequence of events that lead up to this error is still a bit a huge challenge at my skill level, but I don't mind give that a shot, but I will be struggling)

MushMal commented 3 years ago

It's hard to think of the case where the UDP on 443 would be disabled while the TCP allowed on the same port. Firewalls usually operate at the packet level but perhaps some scenarios call for it. If his is the case then you will need to filter the udp urls out in Common.c when submitting the URLs into the PeerConnection object: https://github.com/awslabs/amazon-kinesis-video-streams-webrtc-sdk-c/blob/master/samples/Common.c#L363

Not sure how to help you but one thing would be useful for you is to be able to always run stock sample applications to get the repro so we can be on the same page. We can't debug your own applicaition.

suggestedfixes commented 3 years ago

I think I found a pattern! DTLS has an sslLock protecting ssl read/write, whereas TLS does not!

MushMal commented 3 years ago

Not entirely sure what you mean. The sending/receiving is interlocked by a higher-level lock. Do you have a theory of what's causing your ssl read failures?

suggestedfixes commented 3 years ago

When there are multiple TCP Turn candidate pairs during ICE negotiation or in the process of initial phase of streaming and just before teardown, there might be a few races where multiple TCP connections are calling SSL_read at the same time, the odds of this is very small with lower bitrate. Increasing the bitrate increases the probability of races where two SSL_read are being called at the same time.

suggestedfixes commented 3 years ago

In particular, a single tcp socket from master side might be serving multiple candidates from the viewer side at the same moment.

suggestedfixes commented 3 years ago

From the preliminary logging, there are more SSL writes than SSL reads. So could SSL read and write interfere with each other?

suggestedfixes commented 3 years ago

Preliminary steps to reproduce on armv7, will try if this is reproducible on Mac. Master C SDK side:

Browser side:

MushMal commented 3 years ago

All of these are interlocked properly - or at least this is the idea. Note that the TCP connection will call the same iceUtilsSendData which then ends up with socketConnectionSendData which interlocks at the connection level.

MUTEX_LOCK(pSocketConnection->lock);

So, sending is interlocked.

On the receiving side, socketConnectionReadData also interlocks on this lock.

suggestedfixes commented 3 years ago

@MushMal Is there a case where multiple socket connections are using the same SSL_read at the same time? For example, there are two socket connections

Since they have different locks, they are free to call SSL_read at the same time.

suggestedfixes commented 3 years ago

@MushMal My second thoughts, different sockets might have different SSL contexts

MushMal commented 3 years ago

Indeed, these are different sockets so the mutex is per socket - each with its own socket connection object

MushMal commented 3 years ago

Any updates?

suggestedfixes commented 3 years ago

Took Friday off. Last Thursday, I was

So I should have something today.

suggestedfixes commented 3 years ago

Also trying to understand this article: https://www.openssl.org/docs/man1.0.2/man3/threads.html

suggestedfixes commented 3 years ago

yikes ...

 (Note that OpenSSL uses a number of global data structures that will be implicitly shared whenever multiple threads use OpenSSL.) Multi-threaded applications will crash at random if it is not set.
suggestedfixes commented 3 years ago

My thoughts, could one reason be, could the read buffer be not big enough.

MushMal commented 3 years ago

@suggestedfixes I just added an experiment in the common library. https://github.com/awslabs/amazon-kinesis-video-streams-producer-c/pull/151

Can you see if this makes any difference for your crashes?

You need to apply the common library changes. Also, call initializeSslCallbacks from the start of your app and releaseSslCallbacks at the end. See if this makes any difference - I seriously doubt it

suggestedfixes commented 3 years ago

@MushMal This looks amazing! Thanks!

suggestedfixes commented 3 years ago

This cannot be contained with an extra timeout or more retries, correct?

2020-12-15 06:50:57 INFO    onConnectionStateChange(): New connection state 3
2020-12-15 06:50:57 DEBUG   rtcPeerConnectionGetMetrics(): ICE local candidate Stats requested at 16080150572625975
2020-12-15 06:50:57 DEBUG   logSelectedIceCandidatesInformation(): Local Candidate IP Address: 3.21.52.195
2020-12-15 06:50:57 DEBUG   logSelectedIceCandidatesInformation(): Local Candidate type: relay
2020-12-15 06:50:57 DEBUG   logSelectedIceCandidatesInformation(): Local Candidate port: 59368
2020-12-15 06:50:57 DEBUG   logSelectedIceCandidatesInformation(): Local Candidate priority: 16777215
2020-12-15 06:50:57 DEBUG   logSelectedIceCandidatesInformation(): Local Candidate transport protocol: transport=tcp
2020-12-15 06:50:57 DEBUG   logSelectedIceCandidatesInformation(): Local Candidate relay protocol: transport=tcp
2020-12-15 06:50:57 DEBUG   logSelectedIceCandidatesInformation(): Local Candidate Ice server source: 3-21-52-195.t-c01cb9af.kinesisvideo.us-east-2.amazonaws.com
2020-12-15 06:50:57 DEBUG   rtcPeerConnectionGetMetrics(): ICE remote candidate Stats requested at 16080150572627573
2020-12-15 06:50:57 DEBUG   logSelectedIceCandidatesInformation(): Remote Candidate IP Address: 3.14.135.85
2020-12-15 06:50:57 DEBUG   logSelectedIceCandidatesInformation(): Remote Candidate type: relay
2020-12-15 06:50:57 DEBUG   logSelectedIceCandidatesInformation(): Remote Candidate port: 60126
2020-12-15 06:50:57 DEBUG   logSelectedIceCandidatesInformation(): Remote Candidate priority: 0
2020-12-15 06:50:57 DEBUG   logSelectedIceCandidatesInformation(): Remote Candidate transport protocol: transport=udp
2020-12-15 06:50:57 DEBUG   stepStateMachine(): State Machine - Current state: 0x0000000000000004, Next state: 0x0000000000000008
2020-12-15 06:50:57 DEBUG   stepIceAgentStateMachine(): Ice agent state changed from ICE_AGENT_STATE_CONNECTED to ICE_AGENT_STATE_NOMINATING.
2020-12-15 06:50:57 DEBUG   turnConnectionHandleStun(): Channel bind succeeded with peer 3.23.103.89, port: 60107, channel number 16386
2020-12-15 06:50:57 DEBUG   stepStateMachine(): State Machine - Current state: 0x0000000000000008, Next state: 0x0000000000000010
2020-12-15 06:50:57 DEBUG   iceAgentReadyStateSetup(): Selected pair i+mMePHUb_WF8t3dg9M, local candidate type: relay. Round trip time 0 ms
2020-12-15 06:50:57 DEBUG   iceAgentReadyStateSetup(): Freeing Turn allocations that are not selected. Total turn allocation count 1
2020-12-15 06:50:57 DEBUG   stepIceAgentStateMachine(): Ice agent state changed from ICE_AGENT_STATE_NOMINATING to ICE_AGENT_STATE_READY.
2020-12-15 06:50:58 DEBUG   turnConnectionStepState(): TurnConnection state changed from TURN_STATE_CREATE_PERMISSION to TURN_STATE_BIND_CHANNEL
2020-12-15 06:50:58 DEBUG   turnConnectionStepState(): TurnConnection state changed from TURN_STATE_BIND_CHANNEL to TURN_STATE_READY
2020-12-15 06:50:58 DEBUG   dtlsSessionChangeState(): DTLS init completed. Time taken 1301 ms
2020-12-15 06:50:58 INFO    onSctpInboundPacket(): Unhandled PPID on incoming SCTP message 0
2020-12-15 06:50:58 DEBUG   onDataChannel(): New DataChannel has been opened kvsDataChannel

2020-12-15 06:50:58 DEBUG   onDataChannelMessage(): MESSAGE

2020-12-15 06:50:59 DEBUG   sendClientInitiatedDataChannelMessage(): Sending the client the following message: 1s
2020-12-15 06:50:59 DEBUG   socketSendDataWithRetry(): sendto() failed with errno Connection reset by peer
2020-12-15 06:50:59 DEBUG   socketSendDataWithRetry(): Close socket 34
2020-12-15 06:50:59 DEBUG   socketSendDataWithRetry(): Failed to send data. Bytes sent 0. Data len 1257. Retry count 0
2020-12-15 06:50:59 DEBUG   socketSendDataWithRetry(): Warning: Send data failed with 0x5800001a
2020-12-15 06:50:59 ERROR   iceUtilsSendData(): operation returned status code: 0x5800001a
2020-12-15 06:50:59 WARN    tlsSessionProcessPacket(): SSL_read failed with error:140943FC:SSL routines:ssl3_read_bytes:sslv3 alert bad record mac
2020-12-15 06:50:59 WARN    turnConnectionSendData(): iceUtilsSendData failed with 0x5800001a
2020-12-15 06:50:59 DEBUG   socketConnectionSendData(): Warning: Failed to send data. Socket closed already
2020-12-15 06:50:59 ERROR   iceUtilsSendData(): operation returned status code: 0x58000022
2020-12-15 06:50:59 WARN    turnConnectionSendData(): iceUtilsSendData failed with 0x58000022
suggestedfixes commented 3 years ago

Wait a sec, I will comment out the code that sends messages.

suggestedfixes commented 3 years ago

Eliminating datachannel

020-12-15 07:12:32 DEBUG   turnConnectionHandleStun(): create permission succeeded for peer 3.14.135.85
2020-12-15 07:12:32 DEBUG   turnConnectionHandleStun(): create permission succeeded for peer 3.14.135.85
2020-12-15 07:12:32 DEBUG   turnConnectionHandleStun(): create permission succeeded for peer 18.188.4.190
2020-12-15 07:12:32 DEBUG   turnConnectionHandleStun(): create permission succeeded for peer 3.14.135.85
2020-12-15 07:12:32 DEBUG   turnConnectionHandleStun(): create permission succeeded for peer 18.188.4.190
2020-12-15 07:12:32 DEBUG   turnConnectionHandleStun(): Channel bind succeeded with peer 3.14.135.85, port: 52038, channel number 16385
2020-12-15 07:12:32 DEBUG   turnConnectionHandleStun(): Channel bind succeeded with peer 3.14.135.85, port: 59140, channel number 16386
2020-12-15 07:12:32 DEBUG   turnConnectionHandleStun(): Channel bind succeeded with peer 18.188.4.190, port: 55749, channel number 16387
2020-12-15 07:12:32 DEBUG   turnConnectionHandleStun(): Channel bind succeeded with peer 3.14.135.85, port: 64669, channel number 16388
2020-12-15 07:12:32 DEBUG   turnConnectionHandleStun(): Channel bind succeeded with peer 3.14.135.85, port: 54823, channel number 16389
2020-12-15 07:12:32 DEBUG   turnConnectionHandleStun(): Channel bind succeeded with peer 3.14.135.85, port: 57108, channel number 16390
2020-12-15 07:12:32 DEBUG   turnConnectionHandleStun(): Channel bind succeeded with peer 18.188.4.190, port: 55480, channel number 16391
2020-12-15 07:12:32 DEBUG   turnConnectionHandleStun(): Channel bind succeeded with peer 3.14.135.85, port: 57980, channel number 16392
2020-12-15 07:12:32 DEBUG   turnConnectionHandleStun(): Channel bind succeeded with peer 18.188.4.190, port: 57234, channel number 16393
2020-12-15 07:12:32 DEBUG   handleStunPacket(): received candidate with USE_CANDIDATE flag, local candidate type relay.
2020-12-15 07:12:32 DEBUG   handleStunPacket(): Ice candidate pair GalcwaAeP_4WZNm76N2 is connected. Round trip time: 179ms
2020-12-15 07:12:32 DEBUG   stepStateMachine(): State Machine - Current state: 0x0000000000000002, Next state: 0x0000000000000004
2020-12-15 07:12:32 DEBUG   stepIceAgentStateMachine(): Ice agent state changed from ICE_AGENT_STATE_CHECK_CONNECTION to ICE_AGENT_STATE_CONNECTED.
2020-12-15 07:12:32 INFO    onConnectionStateChange(): New connection state 3
2020-12-15 07:12:32 DEBUG   rtcPeerConnectionGetMetrics(): ICE local candidate Stats requested at 16080163527496688
2020-12-15 07:12:32 DEBUG   logSelectedIceCandidatesInformation(): Local Candidate IP Address: 3.14.135.85
2020-12-15 07:12:32 DEBUG   logSelectedIceCandidatesInformation(): Local Candidate type: relay
2020-12-15 07:12:32 DEBUG   logSelectedIceCandidatesInformation(): Local Candidate port: 56464
2020-12-15 07:12:32 DEBUG   logSelectedIceCandidatesInformation(): Local Candidate priority: 16777215
2020-12-15 07:12:32 DEBUG   logSelectedIceCandidatesInformation(): Local Candidate transport protocol: transport=tcp
2020-12-15 07:12:32 DEBUG   logSelectedIceCandidatesInformation(): Local Candidate relay protocol: transport=tcp
2020-12-15 07:12:32 DEBUG   logSelectedIceCandidatesInformation(): Local Candidate Ice server source: 3-14-135-85.t-c01cb9af.kinesisvideo.us-east-2.amazonaws.com
2020-12-15 07:12:32 DEBUG   rtcPeerConnectionGetMetrics(): ICE remote candidate Stats requested at 16080163527574506
2020-12-15 07:12:32 DEBUG   logSelectedIceCandidatesInformation(): Remote Candidate IP Address: 3.14.135.85
2020-12-15 07:12:32 DEBUG   logSelectedIceCandidatesInformation(): Remote Candidate type: relay
2020-12-15 07:12:32 DEBUG   logSelectedIceCandidatesInformation(): Remote Candidate port: 52038
2020-12-15 07:12:32 DEBUG   logSelectedIceCandidatesInformation(): Remote Candidate priority: 0
2020-12-15 07:12:32 DEBUG   logSelectedIceCandidatesInformation(): Remote Candidate transport protocol: transport=udp
2020-12-15 07:12:32 DEBUG   handleStunPacket(): Ice candidate pair GalcwaAeP_dY/HCSoya is connected. Round trip time: 180ms
2020-12-15 07:12:32 DEBUG   handleStunPacket(): Ice candidate pair GalcwaAeP_gKp92Zv2V is connected. Round trip time: 187ms
2020-12-15 07:12:32 DEBUG   handleStunPacket(): Ice candidate pair GalcwaAeP_kvGHCiDTd is connected. Round trip time: 182ms
2020-12-15 07:12:32 DEBUG   stepStateMachine(): State Machine - Current state: 0x0000000000000004, Next state: 0x0000000000000008
2020-12-15 07:12:32 DEBUG   stepIceAgentStateMachine(): Ice agent state changed from ICE_AGENT_STATE_CONNECTED to ICE_AGENT_STATE_NOMINATING.
2020-12-15 07:12:32 DEBUG   handleStunPacket(): received candidate with USE_CANDIDATE flag, local candidate type relay.
2020-12-15 07:12:32 DEBUG   stepStateMachine(): State Machine - Current state: 0x0000000000000008, Next state: 0x0000000000000010
2020-12-15 07:12:32 DEBUG   iceAgentReadyStateSetup(): Selected pair GalcwaAeP_gKp92Zv2V, local candidate type: relay. Round trip time 0 ms
2020-12-15 07:12:32 DEBUG   iceAgentReadyStateSetup(): Freeing Turn allocations that are not selected. Total turn allocation count 1
2020-12-15 07:12:32 DEBUG   stepIceAgentStateMachine(): Ice agent state changed from ICE_AGENT_STATE_NOMINATING to ICE_AGENT_STATE_READY.
2020-12-15 07:12:32 DEBUG   handleStunPacket(): Ice candidate pair GalcwaAeP_PfYAKwreP is connected. Round trip time: 177ms
2020-12-15 07:12:33 DEBUG   dtlsSessionChangeState(): DTLS init completed. Time taken 700 ms
2020-12-15 07:12:33 WARN    tlsSessionProcessPacket(): SSL_read failed with error:140943FC:SSL routines:ssl3_read_bytes:sslv3 alert bad record mac
2020-12-15 07:12:33 DEBUG   socketConnectionClosed(): Close socket 34
2020-12-15 07:12:33 DEBUG   connectionListenerReceiveDataRoutine(): recvfrom() failed with errno Connection reset by peer for socket 34
2020-12-15 07:12:33 DEBUG   socketConnectionSendData(): Warning: Failed to send data. Socket closed already
2020-12-15 07:12:33 ERROR   iceUtilsSendData(): operation returned status code: 0x58000022
2020-12-15 07:12:33 WARN    turnConnectionSendData(): iceUtilsSendData failed with 0x58000022
MushMal commented 3 years ago

I have not seen this issue before

2020-12-15 07:12:33 WARN tlsSessionProcessPacket(): SSL_read failed with error:140943FC:SSL routines:ssl3_read_bytes:sslv3 alert bad record mac

Looks like an intermediary has changed it's mac address or something while trying to establish a network connection.

I am drawing a blank at this stage a I don't have too much detail into OpenSSL implementation of the TLS

suggestedfixes commented 3 years ago

The record mac address is likely getting corrupted with data races? There is a correlation between the bitrate and the error.

suggestedfixes commented 3 years ago

Another possibility is SSL version negotiation. Am I using an up to date root cert or TLSv2 instead of v3? Or my generated cert bits does not align with the relay server?

MushMal commented 3 years ago

I believe it’s not related to the protocol itself as there would be a simple negotiation error. This error in particular could happen when the peer max address is changed - could be the devices max address or the immediate peer like wifi router gateway (not sure). It might, i guess be caused by a memory corruption. Harder to think of a scenario where this could be a network bits corruption.

To simplify:

Look for OpenSSL forums for more info on this particular issue.

suggestedfixes commented 3 years ago

Experimenting with embedTLS and wider/narrower system port range

MushMal commented 3 years ago

If I might suggest, you need a methodic and systematic approach to your investigation. I have given the above suggestions multiple times but for some reason no action is followed. As I mentioned, I have doubts that this has anything to do with the SSL implementation. If you are able to reproduce this issue on stock sample then we can be of more help. Otherwise, we suggest closing this issue as this is not actionable at all on our side - moreover, every indication is that this is your application/platform specific.

suggestedfixes commented 3 years ago

That's a really valuable suggestion. I will try the stock example. By luck I've narrowed it down to port range/firewall settings. But to get a better understanding, I will run the stock with debug, restricting tcp traffic only.

suggestedfixes commented 3 years ago

With the latest commit and a few modifications to gstreamer sample, I will create a branch

2020-12-15 21:36:08 DEBUG   handleStunPacket(): Ice candidate pair MXoVUkOhr_mzgV4Q8WT is connected. Round trip time: 178ms
2020-12-15 21:36:08 DEBUG   stepStateMachine(): State Machine - Current state: 0x0000000000000008, Next state: 0x0000000000000010
2020-12-15 21:36:08 DEBUG   iceAgentReadyStateSetup(): Selected pair elCzM8l7l_4kbKlc4DY, local candidate type: relay. Round trip time 0 ms
2020-12-15 21:36:08 DEBUG   iceAgentReadyStateSetup(): Freeing Turn allocations that are not selected. Total turn allocation count 2
2020-12-15 21:36:08 DEBUG   stepIceAgentStateMachine(): Ice agent state changed from ICE_AGENT_STATE_NOMINATING to ICE_AGENT_STATE_READY.
2020-12-15 21:36:08 DEBUG   turnConnectionStepState(): TurnConnection state changed from TURN_STATE_CREATE_PERMISSION to TURN_STATE_CLEAN_UP
2020-12-15 21:36:08 WARN    handleStunPacket(): Cannot find candidate pair with local candidate 0000:0000:0000:0000:0000:0000:0000:0000 and remote candidate 3.23.103.89. Dropping STUN binding success response
2020-12-15 21:36:08 DEBUG   turnConnectionHandleStun(): TURN Allocation freed.
2020-12-15 21:36:08 DEBUG   socketConnectionClosed(): Close socket 40
2020-12-15 21:36:08 DEBUG   turnConnectionStepState(): TurnConnection state changed from TURN_STATE_CLEAN_UP to TURN_STATE_NEW
2020-12-15 21:36:08 INFO    onConnectionStateChange(): New connection state 3
2020-12-15 21:36:08 DEBUG   rtcPeerConnectionGetMetrics(): ICE local candidate Stats requested at 16080681688244864
2020-12-15 21:36:08 DEBUG   logSelectedIceCandidatesInformation(): Local Candidate IP Address: 3.134.76.195
2020-12-15 21:36:08 DEBUG   logSelectedIceCandidatesInformation(): Local Candidate type: relay
2020-12-15 21:36:08 DEBUG   logSelectedIceCandidatesInformation(): Local Candidate port: 62306
2020-12-15 21:36:08 DEBUG   logSelectedIceCandidatesInformation(): Local Candidate priority: 16777215
2020-12-15 21:36:08 DEBUG   logSelectedIceCandidatesInformation(): Local Candidate transport protocol: transport=tcp
2020-12-15 21:36:08 DEBUG   logSelectedIceCandidatesInformation(): Local Candidate relay protocol: transport=tcp
2020-12-15 21:36:08 DEBUG   logSelectedIceCandidatesInformation(): Local Candidate Ice server source: 3-134-76-195.t-c01cb9af.kinesisvideo.us-east-2.amazonaws.com
2020-12-15 21:36:08 DEBUG   rtcPeerConnectionGetMetrics(): ICE remote candidate Stats requested at 16080681688310106
2020-12-15 21:36:08 DEBUG   logSelectedIceCandidatesInformation(): Remote Candidate IP Address: 18.188.4.190
2020-12-15 21:36:08 DEBUG   logSelectedIceCandidatesInformation(): Remote Candidate type: relay
2020-12-15 21:36:08 DEBUG   logSelectedIceCandidatesInformation(): Remote Candidate port: 55408
2020-12-15 21:36:08 DEBUG   logSelectedIceCandidatesInformation(): Remote Candidate priority: 0
2020-12-15 21:36:08 DEBUG   logSelectedIceCandidatesInformation(): Remote Candidate transport protocol: transport=udp
2020-12-15 21:36:08 DEBUG   dtlsSessionChangeState(): DTLS init completed. Time taken 704 ms
2020-12-15 21:36:09 INFO    onSctpInboundPacket(): Unhandled PPID on incoming SCTP message 0
2020-12-15 21:36:09 INFO    onDataChannel(): New DataChannel has been opened kvsDataChannel

2020-12-15 21:36:09 INFO    onDataChannelMessage(): DataChannel String Message: mainstream

2020-12-15 21:36:09 WARN    tlsSessionProcessPacket(): SSL_read failed with error:140943FC:SSL routines:ssl3_read_bytes:sslv3 alert bad record mac
2020-12-15 21:36:09 DEBUG   socketConnectionClosed(): Close socket 41
2020-12-15 21:36:09 DEBUG   connectionListenerReceiveDataRoutine(): recvfrom() failed with errno Connection reset by peer for socket 41
2020-12-15 21:36:09 DEBUG   socketConnectionSendData(): Warning: Failed to send data. Socket closed already
2020-12-15 21:36:09 ERROR   iceUtilsSendData(): operation returned status code: 0x58000022
2020-12-15 21:36:09 WARN    turnConnectionSendData(): iceUtilsSendData failed with 0x58000022
2020-12-15 21:36:09 ERROR   turnConnectionSendData(): operation returned status code: 0x58000022
2020-12-15 21:36:09 ERROR   iceUtilsSendData(): operation returned status code: 0x58000022
2020-12-15 21:36:09 WARN    iceAgentSendPacket(): iceUtilsSendData failed with 0x58000022
2020-12-15 21:36:09 WARN    iceAgentSendPacket(): IceAgent connection closed unexpectedly
2020-12-15 21:36:09 WARN    iceAgentSendPacket(): Invalid state for data sending candidate pair.
2020-12-15 21:36:09 WARN    iceAgentSendPacket(): Invalid state for data sending candidate pair.
2020-12-15 21:36:09 WARN    iceAgentSendPacket(): Invalid state for data sending candidate pair.
2020-12-15 21:36:09 WARN    iceAgentSendPacket(): Invalid state for data sending candidate pair.
2020-12-15 21:36:09 WARN    iceAgentSendPacket(): Invalid state for data sending candidate pair.
2020-12-15 21:36:09 WARN    iceAgentSendPacket(): Invalid state for data sending candidate pair.
2020-12-15 21:36:09 WARN    iceAgentSendPacket(): Invalid state for data sending candidate pair.
2020-12-15 21:36:09 DEBUG   turnConnectionStepState(): TurnConnection state changed from TURN_STATE_CREATE_PERMISSION to TURN_STATE_BIND_CHANNEL
2020-12-15 21:36:09 DEBUG   turnConnectionStepState(): TurnConnection state changed from TURN_STATE_BIND_CHANNEL to TURN_STATE_READY
2020-12-15 21:36:09 WARN    iceAgentSendPacket(): Invalid state for data sending candidate pair.
2020-12-15 21:36:09 WARN    iceAgentSendPacket(): Invalid state for data sending candidate pair.
2020-12-15 21:36:10 WARN    iceAgentSendPacket(): Invalid state for data sending candidate pair.
2020-12-15 21:36:10 WARN    iceAgentSendPacket(): Invalid state for data sending candidate pair.
suggestedfixes commented 3 years ago

Might not be reproducible on common platform

suggestedfixes commented 3 years ago

Maybe sending another one without datachannel message

suggestedfixes commented 3 years ago

My changes to the latest master https://github.com/suggestedfixes/amazon-kinesis-video-streams-webrtc-sdk-c/commit/11f02704a473db172d3a874926425065509f0732

suggestedfixes commented 3 years ago

I will also forward an rtsp stream to a Mac to see if it's reproducible, I tried stock samples on a Mac and did not reproduce.

suggestedfixes commented 3 years ago

I forwarded a camera RTSP stream to a Mac in the same network as the camera, using the code I provided modifying the rstp address. I even cranked up the bit rate to 10Mbps, still cannot reproduce it on Mac. So you were saying platform-specific?

suggestedfixes commented 3 years ago

Random note I came across and should be on my checklist. https://www.openssl.org/docs/man1.1.0/man3/CRYPTO_THREAD_run_once.html

You can find out if OpenSSL was configured with thread support:

 #include <openssl/opensslconf.h>
 #if defined(OPENSSL_THREADS)
   // thread support enabled
 #else
   // no thread support
 #endif

Another note to myself:

Remove data channel related stuff
suggestedfixes commented 3 years ago

Found another pattern! Dtls clears the error before read! Tls does not :) https://github.com/awslabs/amazon-kinesis-video-streams-webrtc-sdk-c/blob/master/src/source/Crypto/Dtls_openssl.c#L453

suggestedfixes commented 3 years ago

Nvm, resetting did not seem to work, no data channel opened

2020-12-16 06:00:53 DEBUG   dtlsSessionChangeState(): DTLS init completed. Time taken 708 ms
2020-12-16 06:00:53 DEBUG   rtcPeerConnectionGetMetrics(): ICE local candidate Stats requested at 16080984533595277
2020-12-16 06:00:53 DEBUG   logSelectedIceCandidatesInformation(): Local Candidate IP Address: 3.14.135.85
2020-12-16 06:00:53 DEBUG   logSelectedIceCandidatesInformation(): Local Candidate type: relay
2020-12-16 06:00:53 DEBUG   logSelectedIceCandidatesInformation(): Local Candidate port: 56127
2020-12-16 06:00:53 DEBUG   logSelectedIceCandidatesInformation(): Local Candidate priority: 16777215
2020-12-16 06:00:53 DEBUG   logSelectedIceCandidatesInformation(): Local Candidate transport protocol: transport=tcp
2020-12-16 06:00:53 DEBUG   logSelectedIceCandidatesInformation(): Local Candidate relay protocol: transport=tcp
2020-12-16 06:00:53 DEBUG   logSelectedIceCandidatesInformation(): Local Candidate Ice server source: 3-14-135-85.t-c01cb9af.kinesisvideo.us-east-2.amazonaws.com
2020-12-16 06:00:53 DEBUG   rtcPeerConnectionGetMetrics(): ICE remote candidate Stats requested at 16080984533894579
2020-12-16 06:00:53 DEBUG   logSelectedIceCandidatesInformation(): Remote Candidate IP Address: 3.21.52.195
2020-12-16 06:00:53 DEBUG   logSelectedIceCandidatesInformation(): Remote Candidate type: relay
2020-12-16 06:00:53 DEBUG   logSelectedIceCandidatesInformation(): Remote Candidate port: 51945
2020-12-16 06:00:53 DEBUG   logSelectedIceCandidatesInformation(): Remote Candidate priority: 0
2020-12-16 06:00:53 DEBUG   logSelectedIceCandidatesInformation(): Remote Candidate transport protocol: transport=udp
2020-12-16 06:00:54 DEBUG   turnConnectionStepState(): TurnConnection state changed from TURN_STATE_CREATE_PERMISSION to TURN_STATE_BIND_CHANNEL
2020-12-16 06:00:54 DEBUG   turnConnectionStepState(): TurnConnection state changed from TURN_STATE_BIND_CHANNEL to TURN_STATE_READY
2020-12-16 06:00:57 DEBUG   socketSendDataWithRetry(): sendto() failed with errno Connection reset by peer
2020-12-16 06:00:57 DEBUG   socketSendDataWithRetry(): Close socket 14
2020-12-16 06:00:57 DEBUG   socketSendDataWithRetry(): Failed to send data. Bytes sent 0. Data len 121. Retry count 0
2020-12-16 06:00:57 DEBUG   socketSendDataWithRetry(): Warning: Send data failed with 0x5800001a
2020-12-16 06:00:57 ERROR   iceUtilsSendData(): operation returned status code: 0x5800001a
2020-12-16 06:00:57 WARN    turnConnectionSendData(): iceUtilsSendData failed with 0x5800001a
2020-12-16 06:00:57 WARN    tlsSessionProcessPacket(): SSL_read failed with error:140943FC:SSL routines:ssl3_read_bytes:sslv3 alert bad record mac
2020-12-16 06:00:57 DEBUG   socketConnectionSendData(): Warning: Failed to send data. Socket closed already
2020-12-16 06:00:57 ERROR   iceUtilsSendData(): operation returned status code: 0x58000022
2020-12-16 06:00:57 WARN    turnConnectionSendData(): iceUtilsSendData failed with 0x58000022
2020-12-16 06:00:57 ERROR   turnConnectionSendData(): operation returned status code: 0x58000022
2020-12-16 06:00:57 ERROR   iceUtilsSendData(): operation returned status code: 0x58000022
2020-12-16 06:00:57 WARN    iceAgentSendPacket(): iceUtilsSendData failed with 0x58000022
2020-12-16 06:00:57 WARN    iceAgentSendPacket(): IceAgent connection closed unexpectedly
2020-12-16 06:00:57 WARN    iceAgentSendPacket(): Invalid state for data sending candidate pair.
2020-12-16 06:00:57 WARN    iceAgentSendPacket(): Invalid state for data sending candidate pair.
2020-12-16 06:00:57 WARN    iceAgentSendPacket(): Invalid state for data sending candidate pair.
2020-12-16 06:00:57 WARN    iceAgentSendPacket(): Invalid state for data sending candidate pair.
2020-12-16 06:00:57 WARN    iceAgentSendPacket(): Invalid state for data sending candidate pair.
2020-12-16 06:00:57 WARN    iceAgentSendPacket(): Invalid state for data sending candidate pair.
2020-12-16 06:00:57 WARN    iceAgentSendPacket(): Invalid state for data sending candidate pair.
2020-12-16 06:00:57 WARN    iceAgentSendPacket(): Invalid state for data sending candidate pair.
2020-12-16 06:00:57 WARN    iceAgentSendPacket(): Invalid state for data sending candidate pair.
2020-12-16 06:00:57 WARN    iceAgentSendPacket(): Invalid state for data sending candidate pair.
2020-12-16 06:00:57 WARN    iceAgentSendPacket(): Invalid state for data sending candidate pair.
2020-12-16 06:00:57 WARN    iceAgentSendPacket(): Invalid state for data sending candidate pair.
2020-12-16 06:00:57 WARN    iceAgentSendPacket(): Invalid state for data sending candidate pair.
2020-12-16 06:00:57 WARN    iceAgentSendPacket(): Invalid state for data sending candidate pair.
2020-12-16 06:00:57 WARN    iceAgentSendPacket(): Invalid state for data sending candidate pair.
2020-12-16 06:00:57 WARN    iceAgentSendPacket(): Invalid state for data sending candidate pair.
2020-12-16 06:00:57 DEBUG   stepStateMachine(): State Machine - Current state: 0x0000000000000010, Next state: 0x0000000000000040
2020-12-16 06:00:57 ERROR   executeFailedIceAgentState(): IceAgent failed with 0x58000022
2020-12-16 06:00:57 DEBUG   stepIceAgentStateMachine(): Ice agent state changed from ICE_AGENT_STATE_READY to ICE_AGENT_STATE_FAILED.
2020-12-16 06:00:57 INFO    onConnectionStateChange(): New connection state 5
2020-12-16 06:00:57 DEBUG   freeSampleStreamingSession(): Freeing streaming session with peer id: 4Z1PGSM1MIW
suggestedfixes commented 3 years ago

Should both relays have matching protocols udp/tcp?

MushMal commented 3 years ago

Sorry for the delay, I am oof.

re: openssl_threads.. the pr I sent has an ifdef to enable callbacks or not.

re udp/tcp. Consider each peer separately when they are connected via TURN. The server can handle on one side ups on the other tcp.

try to focus on what’s different between the platforms that you can’t repro on a commodity os

try to run the stock application on your platform per my earlier suggestion.

I assume for the issue we are talking about the ssl errors

suggestedfixes commented 3 years ago

Very naive question, could those be too small: net.core.rmem_default = 163840 net.core.rmem_max = 163840

suggestedfixes commented 3 years ago

My sort of fuzzy logic was, this is bitrate related, one possibility could be that it overwhelmed the read buffer.

I will stop being annoying for the rest of today...

MushMal commented 3 years ago

Oh no worries, it’s just that we can’t make any forward progress and I am not even sure yet whether it’s your platform or your application related.

I can’t comment really on the net settings but those should work without modifications.

Try to gather more data from the runs and run different combinations taking notes. Try to make parallels between the two other issues with turn connection and max address issue. I am running out of ideas at this stage so perhaps your platform guys could chime in?

suggestedfixes commented 3 years ago

With the latest master code

suggestedfixes commented 3 years ago

Actually, this seems to work. https://github.com/awslabs/amazon-kinesis-video-streams-webrtc-sdk-c/issues/1007 Need more testing

MushMal commented 3 years ago

Will take a look at it as soon as I can

staowyze commented 3 years ago

MAC is Message Authentication Code in the ssl error message "bad record mac", yup, it's confusing since we usually interpret MAC as NIC MAC address. SSL complains the data integrity, bug #1007 caused this issue on my device since when retry happens, garbage data would be sent to the TLS session, of course SSL data integrity check fails. This only happens on the devices, not laptops, one thought is the device is less powerful, cannot send the TCP data fast enough till reaching the socket buffer limits, so retry does happen more often, especially for I frames in high bitrate video stream.

MushMal commented 3 years ago

This might really be the issue. I have sent a PR for this issue: https://github.com/awslabs/amazon-kinesis-video-streams-webrtc-sdk-c/pull/1008

We will be merging this soon