BelledonneCommunications / linphone-android

Linphone.org mirror for linphone-android (https://gitlab.linphone.org/BC/public/linphone-android)
https://linphone.org
GNU General Public License v3.0

connection errors - server cluster #2256

Open corobin opened 2 months ago

corobin commented 2 months ago

G'day,

voip.ms introduced a new clustered environment ca.voip.ms to provide better availability, instead of the traditional single server points of access.

linphone has errors connecting to this new cluster. i have tried other clients on other operating systems and they do not exhibit the same problems.

key configuration info:

    • server: ca.voip.ms
    • port: 5061
    • transport: TLS
    • encryption: SRTP, set to mandatory on both voip.ms and in app

  1. Describe the bug (mandatory)

on linphone 5, calls connect however they are completely static noise, and the lock icon at top right corner shows an "X"

on linphone 6, one of three things happens:

a) call is pending for extended period of time, just seems to get stuck on the dialing stage, and neither connects nor fails - occasional
b) call fails and status shows no encryption - very often
c) call successfully connects and status shows encrypted, and sound appears fine - occasional

  2. To Reproduce (mandatory)

connect using the config info above

make a call to any number

  3. Expected behavior (mandatory)

calls should reliably connect, be encrypted, and sound should be clear

  4. Please complete the following information (mandatory)

    • Device: OnePlus Nord N10 5G
    • OS: Android 11
    • also tested with Android 14 on Pixel - same result
    • Version of the App/SDK: see screenshot
    • Where you got it from: Play Store
    • Please tell us if your Android is a Lineage OS or another variant: no
  5. SDK logs (mandatory)

linphone 5 https://www.linphone.org:444//tmp/66f62099ccb20_20161cf2f2a8828c54ab.gz https://www.linphone.org:444//tmp/66f620f4ef788_77d7f872285e1db6a96d.gz

linphone 6 https://www.linphone.org:444//tmp/66f62d09db90d_266745ce4a6da0ea2083.gz

  6. Screenshots (optional)

a) linphone 5

call screen showing "X" on lock (top right). call is completely constant static [screenshot: l5callx]

version info [screenshot: l5ver]

b) linphone 6

call screen showing the "stuck at dialing" scenario. screenshot taken at 5 seconds but it goes on until i press hang up [screenshot: Screenshot_20240926-210332]

call screen showing the call fail scenario, the call ends by itself and goes back to main screen [screenshot: Screenshot_20240926-210354]

main screen showing a sample of failed vs. successfully connected attempts [screenshot: Screenshot_20240926-210421]

version info [screenshot: Screenshot_20240926-210101]

Viish commented 1 month ago

Hi @corobin,

I checked your Linphone 6 logs, and here's what happens:

> a) call is pending for extended period of time, just seems to get stuck on the dialing stage, and neither connects nor fails - occasional

Indeed, for one of your calls the server never answers with a 200 OK after the 100 Trying. This happens when the remote end of the call isn't reachable by the proxy server. There's nothing to be done in Linphone for that.

> b) call fails and status shows no encryption - very often

Indeed the server answers you with a 603 DECLINE. This is an issue that we see on a regular basis with voip.ms, but thanks to your logs I'll be able to dig more as you have calls accepted and calls declined in the same log file, that's very interesting.
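The two cases described above can be told apart from the SIP status codes that come back for an INVITE. The following is a minimal Python sketch of that reasoning, not Linphone code: `classify_call` and its labels are hypothetical names introduced here for illustration, and the input is assumed to be the list of status codes in the order they were received.

```python
def classify_call(status_codes):
    """Classify an outgoing INVITE from the SIP status codes received, in order.

    Hypothetical helper for illustration only; it is not part of Linphone.
    """
    # 1xx responses (e.g. 100 Trying, 180 Ringing) are provisional;
    # anything >= 200 is a final response.
    finals = [code for code in status_codes if code >= 200]
    if not finals:
        return "pending"    # scenario a): no final response ever arrives
    final = finals[0]
    if 200 <= final < 300:
        return "answered"   # e.g. 200 OK
    if final == 603:
        return "declined"   # scenario b): 603 Decline from the server
    return "failed"         # any other final response


print(classify_call([100]))            # pending
print(classify_call([100, 603]))       # declined
print(classify_call([100, 180, 200]))  # answered
```

Running such a check over every INVITE transaction in a log would show how often each outcome occurs.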

I'll keep you updated.

Cheers,

corobin commented 1 month ago

some additional info: voip.ms suspect it has to do with TLS (certificate?) handling of the clustered environment

don't know if that's actually the root cause but the suspicion seems to be consistent with observations, notably:

smorlat commented 1 month ago

Hi,

The variability of symptoms is likely related to which backend server processes the call. I would recommend disabling SRTP encryption, as voip.ms probably does not fully support the four encryption algorithms proposed by Linphone, which are:

AEAD_AES_128_GCM
AES_CM_128_HMAC_SHA1_80
AEAD_AES_256_GCM
AES_256_CM_HMAC_SHA1_80

When the call completes successfully, AEAD_AES_128_GCM is used by their server. AEAD_AES_128_GCM is rather new (the standard was adopted in 2015). Given delays in the telecom industry, I would not be surprised that most of their servers do not support it yet.

So please try after setting media encryption to "none", and let us know the results.

Best regards,

Simon

corobin commented 1 month ago

thanks for looking into this!

i changed the transport to TCP in account settings and media encryption to none in advanced settings, and did not observe any problem.

v6 log: https://www.linphone.org:444//tmp/6709e813c2281_ae0f9115bf8ea3567b2e.gz

current version info [screenshot: Screenshot_20241011-200835]

i didn't downgrade back to v5 to test

i have not been able to reproduce this problem with any other client

> I would not be surprised that most of their servers do not support it yet.

i haven't seen the same problem on linphone with any of their single-server PoP either, only came across this with their new clustered PoP

aGrimRepoMan commented 1 month ago

@smorlat: As a small aside, I notice RFC 4568 section 5.1.1 has a SHOULD requirement

> The ordering of multiple "a=crypto" lines is significant: the most preferred crypto line is listed first. Each crypto attribute describes the crypto-suite, key(s), and possibly session parameters offered for the media stream. In general, a "more preferred" crypto-suite SHOULD be cryptographically stronger than a "less preferred" crypto-suite.

Per the trace, Linphone Android V6 offers four crypto suites in the following order:

AEAD_AES_128_GCM
AES_CM_128_HMAC_SHA1_80
AEAD_AES_256_GCM
AES_256_CM_HMAC_SHA1_80

The RFC requirement suggests that the offer "should" instead be ordered as:

AEAD_AES_256_GCM
AEAD_AES_128_GCM
AES_256_CM_HMAC_SHA1_80
AES_CM_128_HMAC_SHA1_80
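To check the order a client actually offers, the "a=crypto" lines can be pulled out of the SDP in a trace. Below is a minimal Python sketch of that, following the RFC 4568 line layout (tag, crypto-suite, key parameters). The SDP fragment, port, and keys are made-up placeholders, not taken from the logs in this issue, and `offered_suites` is a hypothetical helper name.

```python
import re

# Hypothetical SDP fragment; the port, payload types and keys are placeholders.
SDP_OFFER = """\
m=audio 7078 RTP/SAVP 0 8
a=crypto:1 AEAD_AES_128_GCM inline:PLACEHOLDER_KEY_1
a=crypto:2 AES_CM_128_HMAC_SHA1_80 inline:PLACEHOLDER_KEY_2
a=crypto:3 AEAD_AES_256_GCM inline:PLACEHOLDER_KEY_3
a=crypto:4 AES_256_CM_HMAC_SHA1_80 inline:PLACEHOLDER_KEY_4
"""

# a=crypto:<tag> <crypto-suite> <key-params> [<session-params>]
CRYPTO_LINE = re.compile(r"^a=crypto:(\d+)\s+(\S+)\s+(\S+)", re.MULTILINE)


def offered_suites(sdp):
    """Return crypto-suite names in the order listed (most preferred first)."""
    return [match.group(2) for match in CRYPTO_LINE.finditer(sdp)]


for position, suite in enumerate(offered_suites(SDP_OFFER), start=1):
    print(position, suite)
```

The same function applied to the SDP answer shows which single suite the server selected.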

I'm not suggesting this causes the voip.ms cluster interop issue [even if implicated, I suspect it would be tagged against the voip.ms server(s), not Linphone, since there's the well-known guidance: "be strict in what you send but tolerant in what you receive"]. However, I do have two questions, out of curiosity (noting that I have not yet tried Linphone Android v6):

  1. What was the thought process behind the Linphone v6 crypto suite ordering implementation? Was the concern increased battery drain from the more computationally complex 256-bit crypto algorithms? (I searched a little bit for some benchmarks of the 4 suites, but didn't find anything).
  2. In Linphone Android v6, can RTP crypto suite priority be configured via remote provisioning XML file?

Thanks.

jeannotlapin commented 1 month ago

Hi @aGrimRepoMan, you can modify the set and order of supported SRTP crypto suites using the provisioning XML file.

Just set:

[sip]
srtp_crypto_suite=AEAD_AES_256_GCM, AEAD_AES_128_GCM, AES_256_CM_HMAC_SHA1_80, AES_CM_128_HMAC_SHA1_80

or any other relevant comma-separated list of SRTP crypto suites.

About the default order, the idea was to be compatible with endpoints supporting only AES256-based suites but keep using AES128 as the default in order to save CPU in the most common case, where AES128 is considered secure enough (to my knowledge there is no indication that AES128 may be weak). This may be less relevant on hardware providing AES functions.

The order might also be:

AEAD_AES_256_GCM
AES_256_CM_HMAC_SHA1_80
AEAD_AES_128_GCM
AES_CM_128_HMAC_SHA1_80

depending on your focus: encryption key size or authentication.
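The two orderings discussed in this thread can be expressed as two sort keys. The Python sketch below is only an illustration under stated assumptions: the `SUITES` table, `order_suites`, and `config_line` are hypothetical names, the key sizes and AEAD flags are read off the suite names, and reading "authentication" as "AEAD (GCM) suites first" is an interpretation, not something confirmed by the Linphone developers.

```python
# (key size in bits, uses AEAD/GCM) per suite, read off the suite names.
SUITES = {
    "AEAD_AES_128_GCM": (128, True),
    "AES_CM_128_HMAC_SHA1_80": (128, False),
    "AEAD_AES_256_GCM": (256, True),
    "AES_256_CM_HMAC_SHA1_80": (256, False),
}


def order_suites(focus):
    """Order the suites by preference for the given focus (hypothetical helper)."""
    if focus == "authentication":
        # AEAD suites first, then larger keys within each group.
        key = lambda name: (not SUITES[name][1], -SUITES[name][0])
    elif focus == "key_size":
        # Larger keys first, then AEAD before CM/HMAC within each size.
        key = lambda name: (-SUITES[name][0], not SUITES[name][1])
    else:
        raise ValueError("focus must be 'authentication' or 'key_size'")
    return sorted(SUITES, key=key)


def config_line(names):
    """Format the value as in the [sip] srtp_crypto_suite example above."""
    return "srtp_crypto_suite=" + ", ".join(names)


print(config_line(order_suites("authentication")))
print(config_line(order_suites("key_size")))
```

With `"authentication"` this yields the order from the `[sip]` example, and with `"key_size"` it yields the alternative order listed last.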