element-hq / riot-android

A glossy Matrix collaboration client for Android
Apache License 2.0

VoIP broken in F-Droid (and PlayStore?) build, TURN credentials not being renewed (was: Riot calls with "Media Connection Failed" error outside LAN?) #1744

Open ghost opened 7 years ago

ghost commented 7 years ago

Hi,

Running my own Synapse node, I've configured coturn, and Riot calls work fine within my local network but not if one client is outside my LAN. In that case I get "Media Connection Failed".

My config:

# The public URIs of the TURN server to give to clients
#turn_uris: []
#turn_uris: [ "turn:my.ddns.net:3478?transport=udp", "turn:my.ddns.net:3478?transport=tcp" ]
turn_uris: [ "turn:my.ddns.net:5439?transport=udp", "turn:my.ddns.net:5439?transport=tcp" ]

# The shared secret used to compute passwords for the TURN server
#turn_shared_secret: "YOUR_SHARED_SECRET"
turn_shared_secret: "MySharedSecret"

# The Username and password if the TURN server needs them and
# does not use a token
#turn_username: "TURNSERVER_USERNAME"
#turn_password: "TURNSERVER_PASSWORD"

# How long generated TURN credentials last
#turn_user_lifetime: "1h"
turn_user_lifetime: "86400000"

# Whether guests should be allowed to use the TURN server.
# This defaults to True, otherwise VoIP will be unreliable for guests.
# However, it does introduce a slight security risk as it allows users to
# connect to arbitrary endpoints without having first signed up for a
# valid account (e.g. by passing a CAPTCHA).
#turn_allow_guests: True
turn_allow_guests: False

Any ideas?
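A note on the turn_user_lifetime value in the config above: as far as I understand, Synapse reads a bare number as milliseconds and also accepts suffixed values such as "1h", so "86400000" should work out to 24 hours. The sketch below only mimics that parsing under this assumption; parse_duration_ms is a hypothetical helper, not Synapse's own code.

```python
# Hypothetical helper: mimics the assumed duration rules
# (bare number = milliseconds, a trailing unit letter scales the number).
UNITS_MS = {
    "s": 1000,
    "m": 60 * 1000,
    "h": 60 * 60 * 1000,
    "d": 24 * 60 * 60 * 1000,
    "w": 7 * 24 * 60 * 60 * 1000,
}


def parse_duration_ms(value):
    """Return the duration in milliseconds."""
    text = str(value).strip()
    if text and text[-1] in UNITS_MS:
        return int(text[:-1]) * UNITS_MS[text[-1]]
    return int(text)


for raw in ("1h", "86400000"):
    ms = parse_duration_ms(raw)
    print(f"{raw!r} -> {ms} ms = {ms / 3_600_000:g} h")
```

If that reading is right, the value is valid but long, which makes credential-renewal problems slower to show up while testing.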

ara4n commented 7 years ago

Your best bet is to submit bug reports from both the sending and receiving clients and then ping us on #matrix-dev:matrix.org to investigate. Also look at your coturn logs to see if you can see both clients trying to talk TURN to it in order to create the necessary ICE candidates.

jistr commented 7 years ago

I'm seeing similar behavior. It looks like in my case the cause is that one of the clients doesn't even attempt to fetch TURN credentials from the Synapse server. @saljut7 can you check if that's the case for you as well?

In Synapse log, i can see messages for user G requesting GET /_matrix/client/r0/voip/turnServer?access_token=<redacted>. However, no such message is there for user F, regardless of which side tries to initiate the call. It seems that the version from F-Droid doesn't even attempt to fetch the TURN credentials from Synapse.

I can see mentions of both users in the coturn log though. For user G there are success messages like incoming packet SEND processed, success, but for user F it's incoming packet message processed, error 401: Unauthorized. For user F i can also see ERROR: check_stun_auth: Cannot find credentials of user.

It's interesting that client F appears to know what TURN server to use (if its appearance in coturn log proves that? i'm not very familiar with TURN), but doesn't have or even try to request the credentials.

Not sure if some broken caching could be to blame? Is it even possible for the riot client to know the turn server address if it doesn't issue the /_matrix/client/r0/voip/turnServer request at all?
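For context on the 401 above: with use-auth-secret, coturn expects time-limited credentials in which the username starts with an expiry timestamp and the password is a base64-encoded HMAC-SHA1 of that username, keyed with the shared secret (the TURN REST API scheme). Here is a minimal sketch of that scheme; the function names and the example user ID are made up for illustration, and this is not the code Synapse or coturn actually run.

```python
import base64
import hashlib
import hmac
import time


def make_turn_credentials(shared_secret, user_id, lifetime_s, now=None):
    """Build (username, password) following the TURN REST scheme."""
    now = time.time() if now is None else now
    expiry = int(now) + lifetime_s
    username = f"{expiry}:{user_id}"
    digest = hmac.new(shared_secret.encode(), username.encode(), hashlib.sha1).digest()
    return username, base64.b64encode(digest).decode()


def credentials_valid(shared_secret, username, password, now=None):
    """Server-side check: the HMAC must match and the expiry must lie in the future."""
    now = time.time() if now is None else now
    expiry = int(username.split(":", 1)[0])
    digest = hmac.new(shared_secret.encode(), username.encode(), hashlib.sha1).digest()
    expected = base64.b64encode(digest).decode()
    return hmac.compare_digest(expected, password) and expiry > now


# Example values only; the secret is the placeholder from the config above.
user, pw = make_turn_credentials("MySharedSecret", "@f:my.ddns.net", 3600, now=1_000_000)
fresh = credentials_valid("MySharedSecret", user, pw, now=1_000_100)
expired = credentials_valid("MySharedSecret", user, pw, now=1_004_000)
wrong_secret = credentials_valid("OtherSecret", user, pw, now=1_000_100)
print(fresh, expired, wrong_secret)
```

A client that keeps presenting credentials past their expiry would be rejected like the second case, which would be consistent with the errors seen for user F, though the logs alone don't prove that this is the cause.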

jistr commented 7 years ago

Very interesting, indeed the issue went away when i force-stopped the Riot app on client F, cleaned the app cache, and started it again. Client F issued the turnServer request to Synapse, and calls went fine even when both clients were on mobile network, where we also get non-public IPs.

Interestingly, when i issued another call, only client G re-ran the turnServer request, while client F used the previous credentials. It was still within the credential lifetime, so it worked fine again. I'll try again after the creds expire and post an update.

jistr commented 7 years ago

After the credentials expired, i'm back to the same problem. The F client doesn't issue the turnServer request at all, so i guess it keeps using the old credentials, and VoIP calls don't work again.

So the F-Droid version seems to have some TURN credentials problem (caching them too long / indefinitely?), but the GPlay version seems to work fine.
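To illustrate the behaviour one would expect from a client here, this is a rough sketch of a credential cache that refetches shortly before the TTL runs out. It is not how Riot or the Android SDK is implemented; TurnCredentialCache, the 300-second margin, and fake_fetch are all assumptions for the example. The "ttl" field is the lifetime in seconds returned by the turnServer endpoint.

```python
import time


class TurnCredentialCache:
    """Caches TURN credentials and refetches them shortly before they expire."""

    def __init__(self, fetch, refresh_margin_s=300, clock=time.time):
        self._fetch = fetch              # callable returning a dict with a "ttl" in seconds
        self._margin = refresh_margin_s  # refetch this many seconds before expiry
        self._clock = clock
        self._creds = None
        self._expires_at = 0.0

    def get(self):
        now = self._clock()
        if self._creds is None or now >= self._expires_at - self._margin:
            self._creds = self._fetch()
            self._expires_at = now + self._creds["ttl"]
        return self._creds


# Demo with a fake fetcher and a fake clock, so nothing touches the network.
calls = []


def fake_fetch():
    calls.append(1)
    return {"username": "u", "password": "p", "uris": [], "ttl": 3600}


fake_now = [0.0]
cache = TurnCredentialCache(fake_fetch, clock=lambda: fake_now[0])
cache.get()            # first use: fetches
fake_now[0] = 1000.0
cache.get()            # still fresh: served from the cache
fake_now[0] = 3400.0
cache.get()            # within 300 s of expiry: fetched again
print(len(calls))
```

A cache that never takes the refetch branch after the first fetch would behave like the F-Droid client described above.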

jistr commented 7 years ago

Please let me know if you want me to collect some more info to help uncover the cause.

ghost commented 7 years ago

@ara4n

Your best bet is to submit bug reports from both the sending and receiving clients and then ping us on #matrix-dev:matrix.org to investigate. Also look at your coturn logs to see if you can see both clients trying to talk TURN to it in order to create the necessary ICE candidates.

Just upgraded my Synapse server to 0.25.1, tried again, same problem (call within LAN works, outside not). I've sent a bug report from within Riot. I will try to "ping you" on #matrix-dev:matrix.org as you say, but for "freetime admins" it is quite complicated to watch a chat all the time. Anyway, thx for your reply! Important: I know that the call feature worked under 0.23.1 or lower (don't remember exactly) because I had a very long and successful test call the first time I set up the TURN server.

@jistr

@saljut7 can you check if that's the case for you as well?

Trying again I had working call within my LAN and a not working call from outside my LAN to inside my LAN:

  1. My homeserver.log has no entries like turnServer?access_token=<redacted> in the relevant period of time. Did I look in the right log file?

  2. I wanted to check the coturn log files but I don't have any. I activated log-file=/var/tmp/turn.log in /etc/turnserver.conf, restarted coturn, made a test call but still no log files in /var/tmp/.

jistr commented 7 years ago

@saljut7

  1. Yea i think that's the right file. If you had no turnServer mention in the log at all, i'd try stopping Riot on the phones and clearing the cache of the app, which in my case forced Riot to issue the turnServer request against Synapse. Calls then worked until the TURN credentials expired.

  2. That's strange. I use log-file=stdout because i run in a container... Also btw i have verbose in my coturn config file which i believe should make it produce more info. But even without verbose i'd at least expect the log to get created...

croulibri commented 6 years ago

I might have the same problem...

My config:
Server: Synapse 25.0 on Yunohost (package https://github.com/YunoHost-Apps/synapse_ynh) with coturn server setup
Riot Android 0.7.03 (from f-droid)
Riot desktop 13.1

For some months now (I don't know exactly when it started, but at least 3 months), calls (with or without video) between 2 persons (direct call) have been much less reliable when two Android devices call each other. But when there is one Android device and one Riot Desktop, or 2 Riot Desktops, the call is made flawlessly. The information given is "erreur inconnue : user hangup" (unknown error: user hangup) or "impossible de se connecter au média" (unable to connect to media).

croulibri commented 6 years ago

@saljut7 did you receive any news from @ara4n regarding this issue? For the first time in the year I have been using Matrix, direct calls no longer work easily 😞 Only when I force-stop the Riot app, clean the app cache, and start it again does Riot work and manage to make a direct call. But that is quite annoying for regular calls (and for non-geek users). The conference calls through Jitsi are the only reliable ones now. Hope to see the F-Droid Riot app corrected soon!

ghost commented 6 years ago

@croulibri no, unfortunately not. But I don't want to stress the matrix dev team with this issue.. AV call is a bonus for me. text chat works very great and much better than my xmpp server so anyway I'm quite happy.

Anyway: tried a call between 2 playstore-riot-users registered on my homeserver and it worked just fine.

@jistr I've reinstalled fdroid-riot today (couldn't find the clear cache option under android6) but still no call to a playstore-riot possible.

jistr commented 6 years ago

@ara4n @ylecollen can you please change the issue title to something more specific like "VoIP broken in F-Droid build, TURN credentials not being renewed" ? I think that at this point there's enough supporting evidence for that in the thread above (mainly that 3 folks face the same symptoms, only f-droid version is affected, and clearing the app cache helps as a temporary workaround). Having precise title might help more granular prioritization when going through P2 bugs perhaps...

ghost commented 6 years ago

@jistr thx, /done

btw: does a reinstallation of an app clear the cache?

jistr commented 6 years ago

@saljut7 Thanks :) Regarding reinstall, I don't know that... But i found a guide how to clear app cache on Android 6 here (don't clear data, just clear cache): https://www.cnet.com/how-to/how-to-clear-app-cache-and-app-data-in-android-6-0-marshmallow/

You should also Force Stop the app before clearing the cache if you want to replicate the workaround more closely. I'm not sure if it's required for the workaround to take effect, but it may be...

jistr commented 6 years ago

@saljut7 i still see the old issue title so maybe it didn't get saved?

ghost commented 6 years ago

...now? Too long this way?

jistr commented 6 years ago

@saljut7 great thanks :) Personally i think the length is ok, admins can shorten it if desired...

ghost commented 6 years ago

@jistr thx. Anyway.. reinstalling the app (which should clear the cache as well) didn't work for me.

ghost commented 6 years ago

Very strange:

Yesterday a call Riot F-Droid -> Riot F-Droid worked fine. A call Riot F-Droid -> Riot PlayStore didn't work, same for Riot PlayStore -> Riot PlayStore.

ara4n commented 6 years ago

I've totally missed this bug - sorry all :( @giomfo, @manuroe - I think this may be impacting more than just Fdroid. If it would be possible to prioritise it on Android maintenance it'd be much appreciated; failure to load the TURN config from the homeserver sounds like quite an obvious thinko to spot.

ghost commented 6 years ago

No reason for "sorry" - Riot is such a great messenger even without AV calls! :)

It is just confusing for people if certain Riot functions don't work... maybe the AV button should only appear if the related account is on a server with working TURN configuration?

Anyway: could someone confirm that my TURN/Matrix configuration is right?

My install protocol for Debian is:

apt install coturn

In /etc/turnserver.conf I've added/enabled/adjusted:

[...]
no-tcp-relay
[...]
denied-peer-ip=10.0.0.0-10.255.255.255
denied-peer-ip=192.168.0.0-192.168.255.255
denied-peer-ip=172.16.0.0-172.31.255.255
[...]
allowed-peer-ip=10.0.0.1
[...]
user-quota=12
[...]
total-quota=1200
[...]
listening-port=3478
[...]
tls-listening-port=5349
[...]
CA-file=/etc/letsencrypt/live/my.ddns.net/privkey.pem
[...]
use-auth-secret
[...]
static-auth-secret=MyStaticTurnAuthSecret
[...]
server-name=my.ddns.net
[...]

Additional firewall ports I've opened for TURN:

In homeserver.yaml I've added/edited:

[...]
turn_uris: [ "turn:my.ddns.net:3478?transport=udp", "turn:my.ddns.net:3478?transport=tcp" ]
[...]
turn_shared_secret: "MyTurnSharedSecret"
[...]
turn_user_lifetime: "86400000"
[...]
turn_allow_guests: False
[...]

Finished setup with:

service coturn start && service coturn status

...and restarting Synapse.

Okay this way?

Just tried different AV connection with this configuration and 2 accounts registered on my homeserver:

Working:

LAN1 -> LAN1
LAN1 -> mobile data connection
mobile data connection -> LAN1

Not working:

mobile data connection -> mobile data connection

jistr commented 6 years ago

@saljut7 looks quite good to me except the secret settings in coturn. Among other things i have this in my coturn config:

lt-cred-mech
use-auth-secret
static-auth-secret={{coturn_static_auth_secret}}

I took the info from https://github.com/matrix-org/synapse/blob/master/docs/turn-howto.rst

jistr commented 6 years ago

Also i have a smaller turn user lifetime time in synapse config:

turn_user_lifetime: "1h"

^ reducing the user lifetime (perhaps to even shorter time) could be useful to debug the problems with credentials renewal. /cc @giomfo @manuroe

ghost commented 6 years ago

Thank you, actually this was a mistake in my /etc/turnserver.conf.

Changed:

[...]
use-auth-secret=MyTurnSharedSecret
[...]

to:

[...]
use-auth-secret
[...]
static-auth-secret=MyStaticTurnAuthSecret
[...]

I've set turn_user_lifetime: "1h" now, thx. In addition I noticed that port 5349 was "UDP only" opened in my firewall. Unfortunately the recent settings didn't change anything. Still no calls from mobile connection to mobile connection possible.

Documentation says:

Ensure your firewall allows traffic into the TURN server on the ports you've configured it to listen on (remember to allow both TCP and UDP TURN traffic)

...may I ask if UDP is "included" if I open a port for TCP? Using ufw I can only specify tcp or udp opening a port.

jistr commented 6 years ago

@saljut7 I only use firewalld, not ufw, but i doubt that UDP would be included when a port is opened for TCP. You should be able to create two separate rules though -- one for TCP, and one for UDP, even though they "use" the same port number. This should be valid and the rules shouldn't collide.

And yes, i also have no calls unless i do the "force stop + clear cache" trick.

ghost commented 6 years ago

Hm, okay. "man ufw" says:

ufw supports several different protocols. The following are valid in any rule and enabled when the protocol is not specified: tcp udp

So I have to set 4 rules on my router at least, thx.
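To verify that a TCP rule actually took effect, a simple connect test can be run from a machine outside the network. A rough sketch follows; tcp_port_open is a made-up helper, and it says nothing about UDP, which cannot be probed with a plain connect and still needs its own rule. The demo below uses a throwaway local listener so it runs without a real server; against a real setup the host and port would be the TURN server's.

```python
import socket


def tcp_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Self-check against a temporary local listener instead of a real TURN server.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))   # port 0 lets the OS pick a free port
server.listen(1)
port = server.getsockname()[1]
open_result = tcp_port_open("127.0.0.1", port)
server.close()
closed_result = tcp_port_open("127.0.0.1", port, timeout=1.0)
print(open_result, closed_result)
```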

ara4n commented 6 years ago

may be: https://github.com/matrix-org/riot-android-rageshakes/issues/2008

ghost commented 6 years ago

may be: matrix-org/riot-android-rageshakes#2008

@ara4n sry but what do you mean?

croulibri commented 6 years ago

My config:
Home Server: Synapse 25.0 on Yunohost (package https://github.com/YunoHost-Apps/synapse_ynh) with coturn server setup
Riot Android 0.8.3 (from f-droid)

Since my November message, I still regularly face problems when calling from Riot Android. I don't know exactly when it started, but at least since September 2017, calls (with or without video) between 2 persons (direct call) are not reliable when two Android devices call each other. But when there is one Android device and one Riot Desktop, or 2 Riot Desktops, the call is made flawlessly. The information given is "erreur inconnue : user hangup" (unknown error: user hangup) or "impossible de se connecter au média" (unable to connect to media).

If I clear the cache (within the Riot Android app) then calls work every time. So I guess the coturn server and the firewall are not the problem. You can imagine it is really annoying to clear the cache each time I want to call, and for "non geek" users it means that they can't call each other any more.

Are there still other people facing this situation? Is there a workaround?

jistr commented 6 years ago

Yea i'm also still facing this issue with Riot 0.8.3, clearing cache helps as a workaround still.

Btw the bug is now marked P1 and in a Sprint 8 milestone list:

https://github.com/vector-im/riot-android/milestone/12

ghost commented 6 years ago

If I clear cache (within Riot Android app) then calls work all the time. So I guess the coturn server and the firewall is not the problem.

... clearing cache helps as a workaround still.

I tried both: clearing the cache within Android (Android preferences -> Apps -> Riot -> Storage -> clear cache) and within Riot (Riot -> Settings -> Clear cache) and tried again voice call with 2 F-Droid devices v0.8.5 (F-c0c7146a) which ends up with...

CallingUser placed a voice call.
CalledUser answered the call.
CalledUser ended the call.

...in both directions.

ghost commented 6 years ago

@jistr

In Synapse log, i can see messages for user G requesting GET /_matrix/client/r0/voip/turnServer?access_token=<redacted>.

I noticed that my GET log doesn't even list the access_token part. My log when trying a call to @CalledUser:my.ddns.net looks like this:

2018-05-07 00:12:27,782 - synapse.access.https.8448 - 59 - INFO - GET-1507- 89.203.147.152 - 8448 - Received request: GET /_matrix/client/r0/voip/turnServer
2018-05-07 00:12:27,784 - synapse.access.https.8448 - 93 - INFO - GET-1507- 89.203.147.152 - 8448 - {@CalledUser:my.ddns.net} Processed request: 1ms (0ms, 0ms) (0ms/0ms/0) 162B 200 "GET /_matrix/client/r0/voip/turnServer HTTP/1.1" "Riot.im/0.8.7 (Linux; U; Android 4.4.2; RAZR HD Build/KDA20.11; Flavour FDroid; MatrixAndroidSDK 0.9.3)"

Both calling user and CalledUser are on same matrix homeserver but connecting via mobile data connection.

Coturn reports 346: ERROR: session 000000000000000006: TCP socket error: Connection reset by peer 89.203.147.121:12134 while 89.203.147.121 is the IP of the calling user.

Is there anything else I could have forgot in my config causing this error?

jistr commented 6 years ago

@r4dh4l I'd probably double check that all TLS key/cert paths are correct and valid, but i'm not sure what in particular could cause that error.

ghost commented 6 years ago

(Sry for the OT comment, but because I opened this issue I feel responsible to support it until it is solved and therefore I just wanted to say: Unfortunately I won't be able to contribute reports to the issue here anymore. I will close my GitHub account and will move to https://gitlab.com because of https://blog.github.com/2018-06-04-github-microsoft/. I personally don't want to support the https://en.wikipedia.org/wiki/Embrace,_extend,_and_extinguish strategy in any way - I hope the Matrix/Riot development won't be affected by the M$-GitHub deal in any way, and I will try to contribute reports via rageshake if the Matrix team stays on GitHub. Anyway: Thank you all for the community support here!)

jistr commented 6 years ago

I retested, it's still broken but the behavior changed slightly since we discussed last time.

As a workaround, it's now ok to just restart Riot (via menu -> Exit, then open the app again) to get it to request fresh TURN credentials (no need to clear cache), then calls work fine until the TURN credentials expire again.

The problem still seems to only affect FDroid build, not Google Play build. I have 1 hour period for TURN credentials expiration, and when i run

docker logs synapse_20180713 2>&1 | grep turnServer | grep $USERNAME

i get different results for different clients. User with Google Play client makes the turnServer request much more often (usually about once in 55 minutes) than user which uses FDroid client (looks like it makes turnServer request just on app restart times).
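The same comparison can be made a bit more precise by measuring the time between turnServer requests per user. Below is a rough sketch; it assumes access-log lines shaped like the ones quoted earlier in this thread (timestamp first, the user ID in braces before "Processed request"), which may differ between Synapse versions. The sample lines, user IDs, and function names are invented for the example.

```python
import re
from collections import defaultdict
from datetime import datetime

# Assumed line shape: "<timestamp>,<ms> ... {@user:server} Processed request: ... voip/turnServer ..."
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d+ .*"
    r"\{(?P<user>@[^}]+)\} Processed request: .*voip/turnServer"
)


def turn_request_times(lines):
    """Map each user ID to the timestamps of its turnServer requests."""
    times = defaultdict(list)
    for line in lines:
        m = LINE_RE.match(line)
        if m:
            ts = datetime.strptime(m.group("ts"), "%Y-%m-%d %H:%M:%S")
            times[m.group("user")].append(ts)
    return times


def gaps_in_minutes(timestamps):
    """Minutes between consecutive requests, oldest first."""
    ordered = sorted(timestamps)
    return [(b - a).total_seconds() / 60 for a, b in zip(ordered, ordered[1:])]


# Invented sample lines in the assumed format.
TAIL = '1ms 162B 200 "GET /_matrix/client/r0/voip/turnServer HTTP/1.1" "agent"'
sample = [
    "2018-07-13 10:00:00,000 - synapse.access - INFO - {@g:example.org} Processed request: " + TAIL,
    "2018-07-13 10:55:00,000 - synapse.access - INFO - {@g:example.org} Processed request: " + TAIL,
    "2018-07-13 10:01:00,000 - synapse.access - INFO - {@f:example.org} Processed request: " + TAIL,
]
times = turn_request_times(sample)
for user_id, stamps in times.items():
    print(user_id, len(stamps), gaps_in_minutes(stamps))
```

With real logs, the lines could be read from the output of the docker logs command above instead of the sample list.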

I wonder if we should report this issue at https://github.com/matrix-org/matrix-android-sdk instead, as it might be directly in the Android SDK.

Versions involved:

JasonGitly commented 6 years ago

I confirm that this issue exists. I'm using the PlayStore version and exiting the program doesn't solve the problem.

Edit: Sorry! I was wrong. The problem was that the turnserver was stopped. Everything is working.

jistr commented 6 years ago

Saljut7 no longer has a github account, he reached out to me to post this in his name:

======

Hi, this is the user who originally opened this issue. Because I left GitHub, jistr was so kind as to post something in my name: When I opened this issue I was not aware that a TURN server behind a dynamic IP needs a special setup (so my initial problem had nothing to do with a Riot or Synapse bug, just with my Synapse/TURN setup). So I just wanted to inform all other Matrix home server admins who face the same challenge: To avoid problems running your Synapse/TURN setup behind a dynamic IP, your TURN server needs a setup that checks the current IP and restarts the TURN server with the valid IP as parameter every time the IP changes. I'm still testing my current solution and will offer it on my GitLab account later if there are no further problems ( https://gitlab.com/r4dh4l ). If you are interested in testing the setup, ask for help in https://matrix.to/#/!TTajZMFjIFlHBynrvu:yuhu.ddns.net - best regards, (formerly) saljut7

======
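The "check the current IP and restart on change" idea from the message above could look roughly like the sketch below. This is not saljut7's actual solution, which was not posted here; watch_ip_once and the on_change hook are hypothetical, and what the hook does (for example restarting coturn with the new address) is left open.

```python
import socket


def watch_ip_once(hostname, last_ip, on_change, resolve=socket.gethostbyname):
    """Resolve hostname and call on_change(new_ip) if it differs from last_ip.

    Returns the currently resolved IP so the caller can pass it in next time.
    """
    current = resolve(hostname)
    if current != last_ip:
        on_change(current)
    return current


# Demo with a fake resolver, so no DNS lookup happens; the addresses are examples.
answers = iter(["203.0.113.5", "203.0.113.5", "203.0.113.9"])
restarts = []   # stands in for "restart the TURN server with this IP"
ip = None
for _ in range(3):
    ip = watch_ip_once("my.ddns.net", ip, restarts.append, resolve=lambda host: next(answers))
print(restarts)
```

In practice this would run periodically (for example from a timer or cron job), with the last seen IP stored between runs.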

Ezwen commented 5 years ago

Hi! Sorry to dig up this issue but I have a similar problem. I have my own synapse homeserver alongside a coturn server, and the latter works very well when doing voip between riot-web clients. However, when using riot-android, I always end up with "Media Connection Failed" and coturn's logs don't even show a single connection attempt, as if riot-android was completely ignoring my TURN URIs while riot-web does make good use of them. Any suggestion?

jistr commented 5 years ago

@Ezwen in the Android app, do Menu -> Exit to forcefully shut down the Riot client. Then launch the application again, it will do a fresh login and refresh TURN credentials. After this you should be able to make calls until the TURN credentials expire, then you will need to restart the app again. If this works (it does for me) you may want to set the TURN credentials expiration to a long time so that you don't have to do this workaround too often (i use 1 week).

ghost commented 5 years ago

That workaround isn't helping in my case. Consistently failing calls for a couple of months (used to work).

ghost commented 5 years ago

Nevermind, I looked through other tickets related to "could not connect media" and stumbled upon #1787. I installed blokada a couple of months ago, and killing it made calls work again! Please fix this.

Ezwen commented 5 years ago

@impala2 thank you very much! Blokada was indeed the culprit! Mystery solved at last :). I've simply added Riot to the Blokada whitelist and everything works now.

vienfla commented 5 years ago

Hi, sorry to reopen the case but the problem is still here on the newest riot.im version (0.9.3) from playstore. Can't call on different networks.

I need to empty cache or force quit riot.im on both sides before calling.

As soon as both users get their new credentials and matrix shows: "GET /_matrix/client/r0/voip/turnServer HTTP/1.1"

I can call, and call again in both directions beautifully for a bit of time.

Then I receive this error from turnserver: ERROR: check_stun_auth: Cannot find credentials of user ... and can't call until emptying cache again...

Homeserver config:

turn_uris: ["turn:turn.mydomain.com:5349?transport=udp","turn:turn.mydomain.com:5349?transport=tcp"]
turn_shared_secret: "pzchHPvmeJKgIF2s2WVrwhGMnSZQ2ivoF3vocLtVOAe1JdDayakk7yQdaMn26FKv"

turn_user_lifetime: "1h"

(I tried different values for turn_user_lifetime)

Coturn config:


listening-port=3478
tls-listening-port=5349
alt-listening-port=3479
alt-tls-listening-port=5350
listening-ip=212.83.171.50
verbose
lt-cred-mech
use-auth-secret
static-auth-secret=pzchHPvmeJKgIF2s2WVrwhGMnSZQ2ivoF3vocLtVOAe1JdDayakk7yQdaMn26FKv
server-name=turn.mydomain.com
realm=turn.mydomain.com
cert=/etc/ssl/private/turn.mydomain.com.tls.crt
pkey=/etc/ssl/private/turn.mydomain.com.tls.key
no-stdout-log
log-file=/var/log/turnserver/turn.log
simple-log
pidfile="/var/run/turnserver.pid"
mobility
no-tlsv1
no-tlsv1_1

Thanks for any help or debugging hints!

plague-doctor commented 5 years ago

and same on 0.9.8 from PlayStore. Come on guys, fix this already...

fearedbliss commented 5 years ago

I'm also receiving the same issue on my homeserver. Running 0.9.8 Riot Android, Synapse 1.4.1, on Debian 10. All of Synapse core stuff works fine (federation, messaging, etc), but when it comes to VoIP it only works for LAN<>LAN and that's cause .. well we are both behind the same NAT so STUN/TURN doesn't need to be used. Looking at the coturn logs, I don't see any voip communication activity in there. It doesn't matter if I restart the app via Exit, or I force quit and clear cache. Originally I was getting an error saying that my server doesn't have a correct turn configuration, after clearing the cache the error went away but when I make the first call after launching the app, it says "Media Connection Failed", and then every other call just immediately says "Call Connecting..." and it rings forever, the other side never receives any notifications/events/etc. Cancelling the call just goes back to the main page (Before it would always say that the server was misconfigured and that I could try to use turn.matrix.org as fallback).

fearedbliss commented 5 years ago

I fixed the error a few days ago; it may have been a combination of outside factors mixed with a typo I had. However, the "Exit and Start Riot" problem still exists, on Riot Android at least: even if my device has the correct TURN info retrieved from Synapse, if the target client doesn't have the most up-to-date info, it won't auto-retrieve it from the server (since most likely it is still within the token expiration time). This causes the caller to receive an error along the lines of "invalid key or cipher" (403 forbidden on the backend relay). Once they Exit/Start Riot, the correct info is fetched and we can speak.

szimszon commented 4 years ago

Just now I faced the same issue. With version 0.9.9 from Play Store.

check_stun_auth: Cannot find credentials of user

I asked the caller to reboot the phone and without any other change everything is working again... (maybe Riot Android restart could do the trick too)

rajil commented 4 years ago

I am seeing the same issue between two google play installs version 0.9.9 (G-5da259a74). The media connection fails when one device is in Lan and the other on a mobile connection.

aWeinzierl commented 4 years ago

I am seeing the same issue between two google play installs version 0.9.9 (G-5da259a74).

Same (both devices in same LAN)

F-Droid <-> GPlay fails, too.

vacuumbeef commented 4 years ago

I am seeing the same issue between two google play installs version 0.9.9 (G-5da259a74). The media connection fails when one device is in Lan and the other on a mobile connection.

Same thing between GPlay and F-Droid still. I wonder if it will ever be solved, I can't bring some people to matrix because of this problem.

rajil commented 4 years ago

I hope VoIP is a first-class citizen in RiotX and not an afterthought, which seems to be the case in Riot.im.

For all my users, I have "Allow fallback call assist server" turned on. This workaround has helped me get past this bug.