element-hq / element-android

A Matrix collaboration client for Android.
https://element.io/
GNU Affero General Public License v3.0

turn-uri changes on matrix-server are not applied immediately #2716

Open HeikoBoettger opened 3 years ago

HeikoBoettger commented 3 years ago

Describe the bug

While investigating a connectivity issue with my coturn server, I discovered that changes to the TURN settings on my Synapse server aren't picked up by Element. I could clearly see the red message showing that Element noticed the connection to the Matrix server was interrupted by the restart required to apply the setting to Synapse. Is this something cached on the client side in Element?

This makes analysis really difficult. For example, I installed a fresh TURN server to make sure the problem wasn't my server configuration, and then wondered why the new TURN server wasn't receiving any connections. Of course I first double-checked the firewall, but when I ran tcpdump on both the old and the new server, I discovered that Element was still connecting to the previously configured TURN server.

To Reproduce

  1. set up two TURN servers with different IPs
  2. run tcpdump for UDP packets on port 3478 on both servers
  3. run Element on two devices connected to two different ISPs (I had Element Android and Element desktop on Windows)
  4. establish a call -> working
  5. change the turn-uri in Synapse
  6. restart Synapse
  7. try to establish a call -> not working, tcpdump on the old server still reports UDP packets
  8. restart the devices
  9. establish a call -> working
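
To check what the homeserver currently hands out, independently of anything a client may have cached, the TURN credentials endpoint of the Matrix client-server API can be queried directly after step 6. The following is only a minimal sketch: the homeserver URL and access token are placeholders, the function names are made up for illustration, and older servers may expose the endpoint under `/r0/` instead of `/v3/`.

```python
import json
import urllib.request


def build_turn_request(homeserver: str, access_token: str) -> urllib.request.Request:
    """Build the authenticated GET request for the TURN credentials endpoint."""
    url = homeserver.rstrip("/") + "/_matrix/client/v3/voip/turnServer"
    return urllib.request.Request(url, headers={"Authorization": "Bearer " + access_token})


def parse_turn_response(body: bytes) -> dict:
    """Keep only the fields that matter for this issue (password is dropped)."""
    data = json.loads(body)
    return {
        "uris": data.get("uris", []),
        "ttl": data.get("ttl"),
        "username": data.get("username"),
    }


def fetch_turn_servers(homeserver: str, access_token: str) -> dict:
    """Ask the homeserver which TURN servers it currently advertises."""
    request = build_turn_request(homeserver, access_token)
    with urllib.request.urlopen(request, timeout=10) as response:
        return parse_turn_response(response.read())
```

If the `uris` returned here already point to the new TURN server while tcpdump still shows traffic on the old one, the stale value is on the client side.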

Expected behavior

Element should query the latest turn-uris from the Matrix server after reconnecting to it, or when the TURN server doesn't answer.

Additional context

This might also apply to Element for desktop and Element for iOS.

bmarty commented 3 years ago

TURN responses are kept in memory by design for their lifetime (TTL). See https://github.com/vector-im/element-android/blob/develop/matrix-sdk-android/src/main/java/org/matrix/android/sdk/internal/session/call/DefaultCallSignalingService.kt#L68

If you want to force a refresh client side, make sure to kill the mobile app first.
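
The behaviour described here can be pictured as a small TTL cache: the response is fetched once and reused until its `ttl` has elapsed, and because it lives only in memory, killing the process empties it. The sketch below is an illustration of that idea only, not the code in `DefaultCallSignalingService.kt`; the class name, the `fetch` callback and the `invalidate` method are assumptions made for the example.

```python
import time
from typing import Callable, Optional


class TurnServerCache:
    """Keeps the last TURN response in memory until its ttl (seconds) runs out."""

    def __init__(self, fetch: Callable[[], dict], clock: Callable[[], float] = time.monotonic):
        self._fetch = fetch          # hypothetical callback that asks the homeserver
        self._clock = clock          # injectable so the cache can be tested
        self._cached: Optional[dict] = None
        self._expires_at = 0.0

    def get(self) -> dict:
        now = self._clock()
        if self._cached is None or now >= self._expires_at:
            # Nothing cached yet, or the ttl has elapsed: ask the server again.
            self._cached = self._fetch()
            self._expires_at = now + float(self._cached.get("ttl", 0))
        return self._cached

    def invalidate(self) -> None:
        """Drop the cached response; in this model, killing the app has the same effect."""
        self._cached = None
```

With a `ttl` of one day, a server-side change made a minute after the first call would not be seen by `get()` for almost 24 hours unless the cache is dropped.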

bahur142 commented 3 years ago

Hi all, I decided to do some testing based on the info in this ticket. The test uses 2 Android devices running the latest Element.

I have 3 different coturn servers with 3 different IPs and 3 different hostnames: turn1.server.com, turn2.server.com and turn3.server.com. I also have a CNAME DNS record called turn.server.com that points to one of those 3 A records. In my Synapse configuration, turn-uri always stays set to this CNAME, turn.server.com, and never changes. I only change the DNS record to point to different servers.

The question here is: what else does Element cache? Calls still don't work when I change the DNS to point to another A record. There is actually traffic to the new server and Element rings, but it just says "Connecting ..." and never establishes the call. If I kill the app from memory and start it again, calls work. I expect calls to keep working without killing the app.

bahur142 commented 3 years ago

> The question here is: what else does Element cache? Calls still don't work when I change the DNS to point to another A record.

I found it: it is the shared_secret parameter in turnserver.conf. It has to be the same on all of the turn1, turn2 and turn3 servers, even though the CNAME record stays the same. Calls now work on another TURN server without killing the app (or restarting Synapse) first.
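
Why the secret has to match can be seen from how time-limited TURN credentials are commonly derived in the shared-secret scheme: the username carries an expiry timestamp and the password is an HMAC-SHA1 of that username keyed with the secret. The sketch below assumes that scheme; the function names and the example secrets are made up for illustration.

```python
import base64
import hashlib
import hmac
from typing import Tuple


def make_turn_credentials(shared_secret: str, user_id: str, expiry: int) -> Tuple[str, str]:
    """Derive (username, password) the way a homeserver would in the shared-secret scheme."""
    username = f"{expiry}:{user_id}"
    mac = hmac.new(shared_secret.encode(), username.encode(), hashlib.sha1)
    return username, base64.b64encode(mac.digest()).decode()


def turn_server_accepts(server_secret: str, username: str, password: str) -> bool:
    """Recompute the password with the TURN server's own secret and compare."""
    mac = hmac.new(server_secret.encode(), username.encode(), hashlib.sha1)
    expected = base64.b64encode(mac.digest()).decode()
    return hmac.compare_digest(expected, password)


# Credentials issued with one secret only validate on servers holding the same secret.
username, password = make_turn_credentials("secret-a", "@alice:example.org", 1700000000)
same_secret_ok = turn_server_accepts("secret-a", username, password)
other_secret_ok = turn_server_accepts("secret-b", username, password)
```

Under this assumption, credentials that a client still holds keep working on any TURN server behind the CNAME as long as every one of those servers is configured with the same secret.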

HeikoBoettger commented 3 years ago

> TURN responses are kept in memory by design for their lifetime (TTL). See https://github.com/vector-im/element-android/blob/develop/matrix-sdk-android/src/main/java/org/matrix/android/sdk/internal/session/call/DefaultCallSignalingService.kt#L68
>
> If you want to force a refresh client side, make sure to kill the mobile app first.

@bmarty Yes, that works, but even as somebody who works as a software developer I don't know how to properly kill the application. As far as I am aware, you don't need to explicitly start Element to receive messages, since part of it runs in the background and is started automatically when the phone boots. I personally don't know what exactly needs to be killed. For the average user it probably means restarting the phone.

The problem I see with this expectation is that the user is usually not the server administrator, so the user doesn't know that a restart is needed. I would prefer Element to contain some logic that asks the Matrix server for an updated list of TURN servers and the shared_secret if it is unable to log in to the TURN server.
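
The logic proposed here could look roughly like the sketch below: try the (possibly cached) TURN servers first and, if the TURN server rejects the credentials or does not answer, ask the homeserver again and retry once. This is only an illustration of the proposal, not existing Element code; `TurnAuthError`, `fetch_turn_servers` and `connect` are hypothetical names.

```python
from typing import Callable


class TurnAuthError(Exception):
    """Hypothetical error raised when the TURN server rejects the credentials."""


def connect_with_refresh(
    fetch_turn_servers: Callable[[bool], dict],
    connect: Callable[[dict], str],
) -> str:
    """Try cached TURN servers first; on failure refetch from the homeserver and retry once."""
    servers = fetch_turn_servers(False)  # False: a cached response is acceptable
    try:
        return connect(servers)
    except (TurnAuthError, TimeoutError):
        # The cached entry may be stale (server moved, secret changed), so bypass the cache.
        servers = fetch_turn_servers(True)
        return connect(servers)
```

Retrying only once keeps a misconfigured server from being hammered; a second failure is passed on to the caller.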

HeikoBoettger commented 3 years ago

@bahur142 Thank you for looking into this. Is it correct that there is only one shared_secret for all TURN servers listed on the server side? I am asking because I wonder how this works in an environment where multiple server administrators share their TURN servers with each other but, for security reasons, don't want to use the same secret for all servers. This is probably a separate issue.

To give some more context: in my case I initially used the same secret for both servers and only changed the hostname in the Synapse configuration. After seeing other issues, I of course tried whether switching to less secure credentials with a fixed username and password would bring any stability. I didn't change anything in DNS in my case, but I have to say it's great that you are also considering this case; that will make things much easier.

One thing that might also be interesting is what happens if the TTL is reduced. Are time zones handled correctly? I assume everything uses UTC, but I have observed in the past that expired credentials were sent to the TURN server. Of course, this might have been caused by the caching issue and by reconfiguring the TTL value on the server side.
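
On the time zone question: assuming the shared-secret scheme, in which the TURN username starts with a Unix timestamp marking the expiry, the comparison is made against seconds since the epoch, which do not depend on the local time zone. Clock skew between the machines could still matter. The helper names below are made up for illustration.

```python
import time
from typing import Optional


def credential_expiry(username: str) -> int:
    """Extract the expiry (Unix seconds) from a username of the form 'expiry:user_id'."""
    return int(username.split(":", 1)[0])


def is_expired(username: str, now: Optional[float] = None) -> bool:
    """Compare against Unix time, which is the same in every time zone."""
    if now is None:
        now = time.time()
    return now >= credential_expiry(username)
```

A client that keeps a response cached for the old, longer TTL after the server-side TTL has been reduced would be one way to end up sending credentials the TURN server already considers expired.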