awslabs / amazon-kinesis-video-streams-webrtc-sdk-c

Amazon Kinesis Video Streams Webrtc SDK is for developers to install and customize realtime communication between devices and enable secure streaming of video, audio to Kinesis Video Streams.
https://awslabs.github.io/amazon-kinesis-video-streams-webrtc-sdk-c/group__PublicMemberFunctions.html
Apache License 2.0
1.04k stars 313 forks source link

[QUESTION] Need help understanding this code block #899

Closed suggestedfixes closed 4 years ago

suggestedfixes commented 4 years ago

This is a continuation of one of the unsolved mysteries (https://github.com/awslabs/amazon-kinesis-video-streams-webrtc-sdk-c/issues/583)

src/source/Ice/IceAgent.c Seems like this block of code is often doing more harm than good. Whenever I see "local candidate ip address does not match with xor mapped address in binding response", I know for sure I'm not able to connect.

            if (!isSameIpAddress(&pStunAttributeAddress->address, &pIceCandidatePair->local->ipAddress, FALSE)) {
                // this can happen for host and server reflexive candidates. If the peer
                // is in the same subnet, server reflexive candidate's binding response's xor mapped ip address will be
                // the host candidate ip address. In this case we will ignore the packet since the host candidate will
                // be getting its own response for the connection check.
                DLOGD("local candidate ip address does not match with xor mapped address in binding response");

                // we have a peer reflexive local candidate
                CHK_STATUS(iceAgentCheckPeerReflexiveCandidate(pIceAgent, &pStunAttributeAddress->address, pIceCandidatePair->local->priority, FALSE,
                                                               pSocketConnection));

                CHK(FALSE, retStatus);
            }
MushMal commented 4 years ago

This block of code checks the sent IP and if it doesn't match then ICE agent attempts to check the reflexive candidate (due to possible NAT translation so it uses STUN service to get the reflexive IP).

If the iceAgentCheckPeerReflexiveCandidate fails (which it shouldn't) then we error exit this API.

If it succeeds, the construct CHK(FALSE, retStatus); will ensure an early SUCCESSFUL termination.

Perhaps, when you see this message there is an issue with NAT in your network. What I don't understand is why the TURN doesn't work?

Can I ask you to cut a new issue with a lot more details about your scenario, whether you are using TURN and whether trickle ICE is enabled. What's the version of the software, what's the peer, etc..

Please resolve this issue if you have no further questions.

suggestedfixes commented 4 years ago

From STUN protocol, the xor mapped address is often obfuscated, whereas (pIceCandidatePair->local->ipAddress) is plaintext, so is this still a valid check?

suggestedfixes commented 4 years ago

After xor mapped address not match, it's often followed by

2020-10-27 21:14:04 DEBUG   stepStateMachine(): State Machine - Current state: 0x0000000000000002, Next state: 0x0000000000000040
2020-10-27 21:14:04 ERROR   executeFailedIceAgentState(): IceAgent failed with 0x5a00000d
2020-10-27 21:14:04 DEBUG   stepIceAgentStateMachine(): Ice agent state changed from ICE_AGENT_STATE_CHECK_CONNECTION to ICE_AGENT_STATE_FAILED.
lherman-cs commented 4 years ago

@suggestedfixes XOR-MAPPED-ADDRESS is always obfuscated because some NAT servers rewrite MAPPED-ADDRESS when the packets contain the NAT's public IP (you can find more details in the last paragraph from https://tools.ietf.org/html/rfc5389#section-15.2). The idea is simply to trick the NAT servers to not see its public IP in the packets.

From the implementation perspective, pStunAttributeAddress->address should've been deserialized, XORed back to the original IP address:

https://github.com/awslabs/amazon-kinesis-video-streams-webrtc-sdk-c/blob/1add5fbb3d18007bb2cb4488585d3958ad5f68c8/src/source/Stun/Stun.c#L802-L821

MushMal commented 4 years ago

@suggestedfixes how can we help you debug this issue further? Could you isolate the behavior somehow so it would be possible to understand the network conditions, topology, etc..? How can we reproduce the issue you are seeing? Can you clearly state it?

suggestedfixes commented 4 years ago

I'm having a hard time trying to reproduce this, takes about 20 connections sometimes. I'm thinking about tweaking candidate priorities to make this occur more often.

suggestedfixes commented 4 years ago

This looks really dangerous, is this platform safe?

                    pStunAttributeAddress->address.port ^= (UINT16) stunMagicCookie;

                    // Perform the XOR-ing
                    data = *(PUINT32) pStunAttributeAddress->address.address;Sean DuBois, 11 months ago: • Initial commit
                    data ^= stunMagicCookie;
                    *(PUINT32) pStunAttributeAddress->address.address = data;
MushMal commented 4 years ago

Should be OK as stunMagicCookie is endinanness adapted

disa6302 commented 4 years ago

Closing based on label.