Closed mehmetcy84 closed 9 months ago
Hi @mehmetcy84, we have a Windows CI in our repository now. Please refer to the steps here so that you can build with PThreads without modifying anything related to libwebsockets. Can you please check if this issue persists after you use PThreads?
I am not sure if I am following correctly. I haven't disabled pthreads.
Setting this value to 0 -DLWS_WITH_MINIMAL_EXAMPLES=0 // modified in /CMake/Dependencies/libwebsockets-CMakeLists.txt
Would this change affect the usage of pthreads? Isn't this flag just for building some samples?
I see that in the workflow, build_windows_openssl.bat is used instead of build_windows.bat. And this file has additional configuration lines for pthread. Now my confusion about disabling pthread is resolved.
I have reverted all my changes, so that my git head is at kvs master branch SHA-1: 0ae9ec6b80deff9df724555927fb6595a23054f1 again.
I made sure the pthread folder is where it is expected, and ran build_windows_openssl.bat and the build succeeded without any errors.
I ran the kvsWebrtcClientViewer.exe again, and receive audio frames. Then I set the policy to "RELAY ONLY", and the client again cannot make peer connection. So the issue persists. kinesisLogs.txt The logs are attached.
Hi @mehmetcy84, I see the error STATUS_TURN_CONNECTION_ALLOCAITON_FAILED
in your logs. Is us-east-2 geographically closer to you than any other AWS region? Can you check the connection establishment with ICE_TRANSPORT_ALL
instead? The sample application is already configured to use relay candidates with useTurn
in createSampleConfiguration
I am using the credentials and region information my company is currently using, I am not sure if they make use of any other regions but I will check.
In its default configuration, useTURN is true in createSampleConfiguration, and ICE_TRANSPORT_POLICY_ALL is selected. In its default configuration, I can make the webRTC connection, but only because srflx candidates are the ones that succeed all the times. And when my co-worker runs the code, they can never connect because for some reason srflx candidates do not work for them.
We were expecting to see that relay candidates are selected as the last line of defense, as the worst case scenario happens and no host or srflx candidates work. But this does not happen.
So seeing my coworker not being able to connect at all, and me connecting via srflx candidate all the time made me realise: TURN candidates never work for us.
And that's why I wanted to test TURN candidates specifically and selected ICE_TRANSPORT_POLICY_RELAY, to be able to demonstrate that relay candidates do not work.
So, what might be the reason we get STATUS_TURN_CONNECTION_ALLOCAITON_FAILED? Thanks and best regards,
Just checked and verified that regions are not configurable for our project. So I would like to find out if there is a solution for making use of TURN without changing regions.
I have run the sample viewer client exe from our UK office via remote access, and TURN candidates which are of [us-east-#] region do work fine, and WebRTC connection now succeeds.
But what would be the solution to enable access to our server with a fixed region, worldwide via TURN?
@mehmetcy84, can you please help me understand your setup?
STATUS_TURN_CONNECTION_ALLOCATION_FAILED
happens when the turn allocation is tried the max number of times and fails. From your logs it seems like it continues to try with the same candidate and fails the turn state machine:
2023-11-27 10:39:38.528 VERBOSE incomingRelayedDataHandler(): Candidate id: 4+0Mpjz6i
2023-11-27 10:39:38.555 VERBOSE turnConnectionStepState(): Updated turn allocation request credential after receiving 401
2023-11-27 10:39:38.555 DEBUG turnConnectionStepState(): TurnConnection state changed from TURN_STATE_GET_CREDENTIALS to TURN_STATE_ALLOCATION
2023-11-27 10:39:38.589 VERBOSE incomingRelayedDataHandler(): Candidate id: 4+0Mpjz6i
2023-11-27 10:39:38.653 VERBOSE incomingRelayedDataHandler(): Candidate id: 4+0Mpjz6i
2023-11-27 10:39:38.715 VERBOSE incomingRelayedDataHandler(): Candidate id: 4+0Mpjz6i
2023-11-27 10:39:38.740 ERROR turnConnectionStepState(): operation returned status code: 0x5a00002a
I see that a new candidate is available later but I see that it results in a similar failure:
2023-11-27 10:39:38.970 VERBOSE incomingRelayedDataHandler(): Candidate id: fLNl1mV8S
2023-11-27 10:39:38.985 ERROR turnConnectionTimerCallback(): operation returned status code: 0x00000001
2023-11-27 10:39:38.985 ERROR timerQueueExecutor(): operation returned status code: 0x00000001
2023-11-27 10:39:38.985 VERBOSE turnConnectionStepState(): Updated turn allocation request credential after receiving 401
2023-11-27 10:39:38.985 DEBUG turnConnectionStepState(): TurnConnection state changed from TURN_STATE_GET_CREDENTIALS to TURN_STATE_ALLOCATION
2023-11-27 10:39:39.023 VERBOSE incomingRelayedDataHandler(): Candidate id: fLNl1mV8S
2023-11-27 10:39:39.048 ERROR turnConnectionTimerCallback(): operation returned status code: 0x00000001
2023-11-27 10:39:39.048 ERROR timerQueueExecutor(): operation returned status code: 0x00000001
2023-11-27 10:39:39.093 VERBOSE incomingRelayedDataHandler(): Candidate id: fLNl1mV8S
2023-11-27 10:39:39.109 ERROR turnConnectionTimerCallback(): operation returned status code: 0x00000001
2023-11-27 10:39:39.109 ERROR timerQueueExecutor(): operation returned status code: 0x00000001
2023-11-27 10:39:39.147 VERBOSE incomingRelayedDataHandler(): Candidate id: fLNl1mV8S
2023-11-27 10:39:39.171 ERROR turnConnectionTimerCallback(): operation returned status code: 0x00000001
2023-11-27 10:39:39.171 ERROR timerQueueExecutor(): operation returned status code: 0x00000001
2023-11-27 10:39:39.171 ERROR turnConnectionStepState(): operation returned status code: 0x5a00002a
2023-11-27 10:39:39.171 DEBUG turnConnectionStepState(): TurnConnection state changed from TURN_STATE_ALLOCATION to TURN_STATE_FAILED
Hello, thanks for the reply.
1.Where is your master?
The WebRTC Test Page Master I am trying to make a WebRTC connection to is located in the UK and/or in Turkiye. Both cases fail.
- Where is your viewer?
The Viewer client is run in Turkiye. (Fails)
3.Are they in the same region?
Both master and client are configured with region us-east-1 or us-east-2. (Our actual aim is to connect to some cameras which makes use of regions us-east-1 and us-east-2 along with eu-west-2. So we have to make sure we are able to connect to us-east-#)
4.Are you located in us-east-2? If not, where are you trying to access us-east-2 from?
Geographically, I am with my failing client located in Turkiye, and making use of access/secret key pair coupled with us-east-1&2 regions.
5.What is the network like on the master and the viewer? Are there any firewalls enabled? If yes, we would recommend turning them off before trying to establish the connection
I will check this, but I thought TURN would be a solution for problematic network cases or firewalls. I will try disabling firewalls and try again.
Ok, so we have discussed the possible impact of my geographical location to describe the failure of TURN connection.
Here is a contradictary observation to note: I have a set of Access/Secret Key and my region: us-east-1
I use the same Access/Secret Key and the same region:us-east-1
So does this not eliminate the possible effect of geographical location?
P.S. I have now my firewall disabled. On my PC, I run the KVS WebRTC Test Page running as master (TURN ONLY) on my browser and the kvsWebRTCClientViewer.exe (forced to work with TURN ONLY) on the same PC. And the WebRTC connection still fails.
Ok, so we have discussed the possible impact of my geographical location to describe the failure of TURN connection.
Here is a contradictary observation to note: I have a set of Access/Secret Key and my region: us-east-1
- I have a PC in UK. I run the KVS WebRTC Test Page as Master. I choose TURN ONLY.
- I have a PC in Turkiye. I run the KVS WebRTC Test Page as Viewer. I choose TURN ONLY. -> Here the WebRTC connection succeeds, and via chrome://webrtc-internals, I verify that the succeeded connection uses relay candidates for both ends.
I use the same Access/Secret Key and the same region:us-east-1
- I have the same PC in UK. I run the KVS WebRTC Test Page as Master. I choose TURN ONLY.
- I now instead run the unmodified Kinesis C SDK Sample viewer client exe. ( the only modification being the forcing TURN ONLY in samples/common.c) -> WebRTC connection fails, and I receive those logs where TURN connection is being reported to fail.
So does this not eliminate the possible effect of geographical location?
@niyatim23 Can you think of anyway to verify why WebRTC Test Page viewer might be able make the TURN connection with the same credentials/region info, but not the C SDK sample viewer?
Hi @mehmetcy84,
Looking at the following excerpt from your logs, I see the first three incomingRelayedDataHandler
attempts for a candidate fail, but the fourth one is successful as logged by turnConnectionHandleStun
.
2023-11-27 10:39:38.555 DEBUG turnConnectionStepState(): TurnConnection state changed from TURN_STATE_GET_CREDENTIALS to TURN_STATE_ALLOCATION
2023-11-27 10:39:38.589 VERBOSE incomingRelayedDataHandler(): Candidate id: 4+0Mpjz6i
2023-11-27 10:39:38.653 VERBOSE incomingRelayedDataHandler(): Candidate id: 4+0Mpjz6i
2023-11-27 10:39:38.715 VERBOSE incomingRelayedDataHandler(): Candidate id: 4+0Mpjz6i
2023-11-27 10:39:38.740 ERROR turnConnectionStepState(): operation returned status code: 0x5a00002a
2023-11-27 10:39:38.740 DEBUG turnConnectionStepState(): TurnConnection state changed from TURN_STATE_ALLOCATION to TURN_STATE_FAILED
2023-11-27 10:39:38.778 VERBOSE incomingRelayedDataHandler(): Candidate id: 4+0Mpjz6i
2023-11-27 10:39:38.778 DEBUG turnConnectionHandleStun(): TURN Allocation succeeded. Life time: 600 seconds. Allocation expiration epoch 170108217877807
2023-11-27 10:39:38.802 WARN turnConnectionStepState(): TurnConnection in TURN_STATE_FAILED due to 0x5a00002a. Aborting TurnConnection
2023-11-27 10:39:38.802 DEBUG turnConnectionStepState(): TurnConnection state changed from TURN_STATE_FAILED to TURN_STATE_CLEAN_UP
The turnConnectionStepState
invocations check on an atomic boolean hasAllocation
to determine whether the allocation was successful. If unsuccessful, the state will not change from TURN_STATE_ALLOCATION. The calling of turnConnectionStepState
is handled by a timer on a 50ms interval, so we check for this hasAllocation
every 50ms, but only up to DEFAULT_TURN_ALLOCATION_MAX_TRY_COUNT, which is currently set to 3. Looking at the timestamps associated with incomingRelayedDataHandler
, we see it takes more than 50ms in between each of the four calls, so that's why it never gets to handle the success case on the fourth invocation as it has surpassed the 150ms "time out".
Can you please try to set the DEFAULT_TURN_ALLOCATION_MAX_TRY_COUNT to a higher number here, say 10 for now in the below line in turnConnection.h and see if that succeeds past the allocation state?
#define DEFAULT_TURN_ALLOCATION_MAX_TRY_COUNT 3
Furthermore, I'd also like to investigate why it fails the first three times. That hasAllocation
boolean is only ever set to TRUE upon turnConnectionHandleStun
receiving a successful allocation response; turnConnectionHandleStun
gets invoked by turnConnectionIncomingDataHandler
. We know turnConnectionIncomingDataHandler
gets invoked all four times since the "Candidate id: 4+0Mpjz6i" line gets logged just before its invocation. Can we add logs into turnConnectionIncomingDataHandler
, especially just before the turnConnectionHandleStun
invocation here to confirm turnConnectionHandleStun
is invoked? And within turnConnectionHandleStun
, let's add logs to understand why we don't get to this line which sets hasAllocation
to true.
Hi @stefankiesz, Thanks for the reply. I will be testing with what you have suggested and return with an answer soon. Best regards,
H @stefankiesz,
Currently I am occupied with some other things to attend, but I quickly gave it a try to make the following change:
#define DEFAULT_TURN_ALLOCATION_MAX_TRY_COUNT 10
This didn't resolve the issue.
I am planning to provide logs and more details, and will capture the additional logs you had asked for.
I will be able to return with these logs in a week. Is it ok if the the issue is kept open until I find the time to respond again with more detail, as I am not authorized to re-open a closed issue with a new comment, and I do not wish to lose this thread. Best regards,
We will keep this issue open.
I found this issues too.
Thanks for keeping this issue open.
Hi @stefankiesz, I have added the logs you had asked for.
Btw, I have updated my branch to latest master 1.9.1 (commit id: bfa6667e2f2eeb800a0edd6e6e4745b4faf34536)
In the following tests, the only changes in the code are debug prints and forcing of RELAY_ONLY candidates. In src/common.c initializePeerConnection function:
// Set the ICE mode explicitly
//configuration.iceTransportPolicy = ICE_TRANSPORT_POLICY_ALL;
configuration.iceTransportPolicy = ICE_TRANSPORT_POLICY_RELAY;
Test case 1: RetryParamDefault3WithLogs.txt
The logs show that both a) At turnconnection.c file, in turnConnectionIncomingDataHandler function, turnConnectionHandleStun is called.
DLOGW("mehmetcy2>> turnConnectionIncomingDataHandler calling turnConnectionHandleStun");
CHK_STATUS(turnConnectionHandleStun(pTurnConnection, pCurrent, processedDataLen));
b) At turnconnection.c file, in turnConnectionHandleStun function, hasAllocation is set to TRUE.
DLOGW("mehmetcy2>> turnConnectionHandleStun-2 case STUN_PACKET_TYPE_ALLOCATE_SUCCESS_RESPONSE setting hasAllocation TRUE");
ATOMIC_STORE_BOOL(&pTurnConnection->hasAllocation, TRUE);
Nevertheless, turn state is declared as TURN_STATE_FAILED.
Test case 2: RetryParamMade10WithLogs.txt
The following change is applied: //#define DEFAULT_TURN_ALLOCATION_MAX_TRY_COUNT 3
Following TURN state changes are observed in the logs: TURN_STATE_NEW -> TURN_STATE_CHECK_SOCKET_CONNECTION TURN_STATE_CHECK_SOCKET_CONNECTION -> TURN_STATE_GET_CREDENTIALS TURN_STATE_GET_CREDENTIALS -> TURN_STATE_ALLOCATION TURN_STATE_ALLOCATION -> TURN_STATE_CREATE_PERMISSION TURN_STATE_CREATE_PERMISSION -> TURN_STATE_BIND_CHANNEL TURN_STATE_BIND_CHANNEL -> TURN_STATE_READY TURN_STATE_READY -> TURN_STATE_CLEAN_UP
And the program exits. Both logs are attached. If you need, I can add more logs to help understand why TURN fails with retry param set to 10. Best regards,
Looking at the logs, it seems with the change, the relay candidates are created successfully, but connectivity checks are failing i.e, no candidate pairs are found within 10 seconds. An observation that is strange:
grep "New remote ice candidate discovered" RetryParamMade10WithLogs.txt
2024-01-19 14:05:11.452 DEBUG iceAgentLogNewCandidate(): New remote ice candidate discovered. Id: fZ+4XiMkI. Ip: 192.168.1.104:65450. Type: host. Protocol: udp.
2024-01-19 14:05:11.502 DEBUG iceAgentLogNewCandidate(): New remote ice candidate discovered. Id: fZ+4XiMkI. Ip: 192.168.56.1:65451. Type: host. Protocol: udp.
2024-01-19 14:05:11.641 DEBUG iceAgentLogNewCandidate(): New remote ice candidate discovered. Id: fZ+4XiMkI. Ip: 78.170.129.117:65450. Type: srflx. Protocol: udp.
2024-01-19 14:05:11.820 DEBUG iceAgentLogNewCandidate(): New remote ice candidate discovered. Id: fZ+4XiMkI. Ip: 52.91.222.140:53758. Type: relay. Protocol: udp.
2024-01-19 14:05:11.848 DEBUG iceAgentLogNewCandidate(): New remote ice candidate discovered. Id: fZ+4XiMkI. Ip: 3.94.126.20:53871. Type: relay. Protocol: udp.
2024-01-19 14:05:12.267 DEBUG iceAgentLogNewCandidate(): New remote ice candidate discovered. Id: fZ+4XiMkI. Ip: 52.91.222.140:56089. Type: relay. Protocol: udp.
2024-01-19 14:05:12.300 DEBUG iceAgentLogNewCandidate(): New remote ice candidate discovered. Id: fZ+4XiMkI. Ip: 52.91.222.140:62506. Type: relay. Protocol: udp.
2024-01-19 14:05:12.378 DEBUG iceAgentLogNewCandidate(): New remote ice candidate discovered. Id: fZ+4XiMkI. Ip: 3.94.126.20:59005. Type: relay. Protocol: udp.
2024-01-19 14:05:12.380 DEBUG iceAgentLogNewCandidate(): New remote ice candidate discovered. Id: fZ+4XiMkI. Ip: 3.94.126.20:59687. Type: relay. Protocol: udp.
See how all the remote candidates have the same ID? This shouldnt be the case since we generate a random JSON string. Wondering if there is any code modification here because I see the same ID in both the logs files you attached too.
A few questions:
eu-central-1
?Apart from the common.c change:
// Set the ICE mode explicitly
//configuration.iceTransportPolicy = ICE_TRANSPORT_POLICY_ALL;
configuration.iceTransportPolicy = ICE_TRANSPORT_POLICY_RELAY;
and the newly added debug prints, there are no changes at all in the code. It is identical to 1.9.1.
Answers:
Hi @mehmetcy84, I tried ICE_TRANSPORT_POLICY_RELAY
on a Windows EC2 instance with v1.9.1 and it works without any issues for me. I could see the stream on the JS test page with no issues. Also, while trying this, I'm physically located in a region different from the one in which my stream exists
2024-02-06 18:20:13.017 PROFILE iceAgentReadyStateSetup(): Selected pair RkoWQGk0u_fZ+4XiMkI, local candidate type: relay. remote candidate type: relay. Round trip time 265 ms. Local candidate priority: 16777215, ice candidate pair priority: 72057589826734591
Do you have any updates on this issue?
We are using this repo and want to ensure connection stability via usage of TURN candidates for worst case scenario. But TURN candidates never work. If ICE_TRANSPORT_RELAY is chosen in samples/Common.cpp instead of ICE_TRANSPORT_ALL, connection is never established.
We have been using Master 1.7.3. And in some of our networks, webtrc connections are achieved while in others it is not. In order to ensure maximum stability, we would like to make sure TURN candidates are used for the last line of defense. But even in our networks where STUN works, TURN candidates never work. We have observed this on Windows builds as well as cross compiled Android builds. Updated our repos to the latest master 1.9, but the problem persists.
In order to ensure we didn't introduce any artifacts, I have cloned the 1.9 master, only applied the following two changes to avoid build errors, other than that the following logs are captured on vanilla kvs sample (kvsWebRTCClientViewer.exe) built for Windows run against the KVS WebRTC Test Page.
Steps to reproduce:
Expected Result: "Audio Frame received" prints displayed. Actual Result: "Audio Frame received" prints displayed.
Expected Result: "Audio Frame received" prints displayed. Actual Result: Frames are not received, and the console is flooded with TURN error messages.
The following logs are captured for "VERBOSE" log setting. Sensitive info censored. For clarity additional logs are manually added (ones which start with "mehmetcy") in turnConnectionTimerCallback function, where the errors are seen. logs.txt