TryQuiet / quiet

A private, p2p alternative to Slack and Discord built on Tor & IPFS
https://www.tryquiet.org
GNU General Public License v3.0
1.98k stars 86 forks source link

App doesn't sync after coming from the background #2396

Closed kingalg closed 6 months ago

kingalg commented 7 months ago

Version: 2.1.2-alpha.14 (367) System: iOS

Issue: The App doesn't sync after coming from the background. The app on iOS never really works in the background, but for some time it was syncing correctly after it was reopened without a need for swiping up or force-quiting. Currently, it doesn't sync.

Steps to recreate: Have a working app on ios, exchange a few messages with another user on a different platform. Put the app on iOS into the background (leave and check a different app or let the iPhone turn the screen off). Do not swipe the app up, force quit it or restart the phone - make sure that it's still in the background. Open it again and try to send/receive a message. It should sync after a short time, but it never does.

Notes: I checked it again on our prod version and it is working there. So, this is a regression that only appeared at some point in one of the 2.1.2 alphas.

holmesworcester commented 7 months ago

We should fix this. It's a blocker!

holmesworcester commented 7 months ago

Since this is a regression, it's worth investing in a test, which also means fixing mobile e2e tests before the release.

At quick glance, it looks like Detox can test sending the app to background and returning from background, so let's do it:

https://wix.github.io/Detox/docs/next/api/device/#devicelaunchappparams

holmesworcester commented 7 months ago

Let's first fix the issue and then see if there is a way to test this issue with our current setup. If there is, let's do it.

If there isn't, let's not.

Ideas for how to test this if we want to:

  1. Try to send a message, if it is not greyed out that should mean that it has connected correctly to the backend.
  2. Build some visual indicator in the app frontend that could show us if we are not able to connect to the Tor control port or know that Tor is working.
  3. Another idea: add a Tor connection indicator similar to the one we show in Desktop alphas in mobile alphas.
  4. Try 2-player Detox tests now (it wasn't possible to in the past because of problems with Tor, but maybe it's not hard to do and so easy that we should just do it.)
  5. If it's useful to use the new "trying to connect..." indicator of connection status then we can merge that work.
holmesworcester commented 7 months ago

Lucas can't reproduce.

It should sync after a short time, but it never does.

How short? Once the app goes into background and comes back, Tor has to restart and then it has to reconnect. This could take some time.

Notes: I checked it again on our prod version and it is working there. So, this is a regression that only appeared at some point in one of the 2.1.2 alphas.

This is the most interesting and confusing part. Why would the last release behave differently? Is it possible that in production, when you switch away and switch back, the app is not getting sent to the background?

Could there be some difference in performance or memory usage, or in the context in which Kinga is testing, that changes how quickly the app goes into the background?

holmesworcester commented 7 months ago

I think overall we should treat this as an iOS+Tor reliability question to resolve in the future, and assume that the app will sync if given enough time and enough tries.

Let's move this out of sprint since we can't reproduce it.

Also, @kingalg and @leblowl are you testing on the same device? Which device?

leblowl commented 7 months ago

I'm testing on an iPhone SE 2nd gen

holmesworcester commented 7 months ago

Notes: Lucas also sees long syncing times after returning to foreground in production, and suspects that we could reproduce the error on desktop as well after restarting.

These are the logs when there is a delay in restarting the connection:

[31;1mbackend:libp2p:websockets [0mconnect zedbjrdnxpexhfxxwljjmw6csodt7o5j3d2ebd6upit4aiggdjhec5ad.onion:80 [31m+10s[0m

  [31;1mbackend:libp2p:websockets [0m/dns4/dmjdcthp7lo6igbeuatzgeuuyjg25vkamkxosgsp6rhotxt3ezahexad.onion/tcp/80/ws/p2p/QmWkhUw5redKNGLqu7fK4oqSYNXJ9RtHym9tmnjUzaLBnn connected /dns4/zedbjrdnxpexhfxxwljjmw6csodt7o5j3d2ebd6upit4aiggdjhec5ad.onion/tcp/80/ws/p2p/Qmc9ogVhL9kL5ZJKPbfjKTMvATVkQCoNfTmLEUxvadfds3 [31m+9s[0m
  [31;1mbackend:libp2p:websockets [0mnew outbound connection /dns4/zedbjrdnxpexhfxxwljjmw6csodt7o5j3d2ebd6upit4aiggdjhec5ad.onion/tcp/80/ws/p2p/Qmc9ogVhL9kL5ZJKPbfjKTMvATVkQCoNfTmLEUxvadfds3 [31m+1ms[0m

  [36;1mbackend:libp2p:websockets:err [0merror upgrading outbound connection /dns4/zedbjrdnxpexhfxxwljjmw6csodt7o5j3d2ebd6upit4aiggdjhec5ad.onion/tcp/80/ws/p2p/Qmc9ogVhL9kL5ZJKPbfjKTMvATVkQCoNfTmLEUxvadfds3. Details: stream ended before 24 bytes became available [36m+46s[0m

  [31;1mbackend:libp2p:websockets [0mconnect zedbjrdnxpexhfxxwljjmw6csodt7o5j3d2ebd6upit4aiggdjhec5ad.onion:80 [31m+38s[0m

  [31;1mbackend:libp2p:websockets [0m/dns4/dmjdcthp7lo6igbeuatzgeuuyjg25vkamkxosgsp6rhotxt3ezahexad.onion/tcp/80/ws/p2p/QmWkhUw5redKNGLqu7fK4oqSYNXJ9RtHym9tmnjUzaLBnn connected /dns4/zedbjrdnxpexhfxxwljjmw6csodt7o5j3d2ebd6upit4aiggdjhec5ad.onion/tcp/80/ws/p2p/Qmc9ogVhL9kL5ZJKPbfjKTMvATVkQCoNfTmLEUxvadfds3 [31m+4s[0m
  [31;1mbackend:libp2p:websockets [0mnew outbound connection /dns4/zedbjrdnxpexhfxxwljjmw6csodt7o5j3d2ebd6upit4aiggdjhec5ad.onion/tcp/80/ws/p2p/Qmc9ogVhL9kL5ZJKPbfjKTMvATVkQCoNfTmLEUxvadfds3 [31m+1ms[0m

  [36;1mbackend:libp2p:websockets:err [0merror upgrading outbound connection /dns4/zedbjrdnxpexhfxxwljjmw6csodt7o5j3d2ebd6upit4aiggdjhec5ad.onion/tcp/80/ws/p2p/Qmc9ogVhL9kL5ZJKPbfjKTMvATVkQCoNfTmLEUxvadfds3. Details: stream ended before 24 bytes became available [36m+44s[0m

  [31;1mbackend:libp2p:websockets [0mconnect zedbjrdnxpexhfxxwljjmw6csodt7o5j3d2ebd6upit4aiggdjhec5ad.onion:80 [31m+40s[0m

  [31;1mbackend:libp2p:websockets [0m/dns4/dmjdcthp7lo6igbeuatzgeuuyjg25vkamkxosgsp6rhotxt3ezahexad.onion/tcp/80/ws/p2p/QmWkhUw5redKNGLqu7fK4oqSYNXJ9RtHym9tmnjUzaLBnn connected /dns4/zedbjrdnxpexhfxxwljjmw6csodt7o5j3d2ebd6upit4aiggdjhec5ad.onion/tcp/80/ws/p2p/Qmc9ogVhL9kL5ZJKPbfjKTMvATVkQCoNfTmLEUxvadfds3 [31m+2s[0m
  [31;1mbackend:libp2p:websockets [0mnew outbound connection /dns4/zedbjrdnxpexhfxxwljjmw6csodt7o5j3d2ebd6upit4aiggdjhec5ad.onion/tcp/80/ws/p2p/Qmc9ogVhL9kL5ZJKPbfjKTMvATVkQCoNfTmLEUxvadfds3 [31m+2ms[0m

  [36;1mbackend:libp2p:websockets:err [0merror upgrading outbound connection /dns4/zedbjrdnxpexhfxxwljjmw6csodt7o5j3d2ebd6upit4aiggdjhec5ad.onion/tcp/80/ws/p2p/Qmc9ogVhL9kL5ZJKPbfjKTMvATVkQCoNfTmLEUxvadfds3. Details: stream ended before 24 bytes became available [36m+42s[0m

  [31;1mbackend:libp2p:websockets [0mconnect zedbjrdnxpexhfxxwljjmw6csodt7o5j3d2ebd6upit4aiggdjhec5ad.onion:80 [31m+40s[0m

  [31;1mbackend:libp2p:websockets [0m/dns4/dmjdcthp7lo6igbeuatzgeuuyjg25vkamkxosgsp6rhotxt3ezahexad.onion/tcp/80/ws/p2p/QmWkhUw5redKNGLqu7fK4oqSYNXJ9RtHym9tmnjUzaLBnn connected /dns4/zedbjrdnxpexhfxxwljjmw6csodt7o5j3d2ebd6upit4aiggdjhec5ad.onion/tcp/80/ws/p2p/Qmc9ogVhL9kL5ZJKPbfjKTMvATVkQCoNfTmLEUxvadfds3 [31m+1s[0m
  [31;1mbackend:libp2p:websockets [0mnew outbound connection /dns4/zedbjrdnxpexhfxxwljjmw6csodt7o5j3d2ebd6upit4aiggdjhec5ad.onion/tcp/80/ws/p2p/Qmc9ogVhL9kL5ZJKPbfjKTMvATVkQCoNfTmLEUxvadfds3 [31m+2ms[0m

  [35;1mbackend:Libp2pService [0mQmWkhUw5redKNGLqu7fK4oqSYNXJ9RtHym9tmnjUzaLBnn connected to Qmc9ogVhL9kL5ZJKPbfjKTMvATVkQCoNfTmLEUxvadfds3 [35m+3m[0m
  [35;1mbackend:Libp2pService [0mQmWkhUw5redKNGLqu7fK4oqSYNXJ9RtHym9tmnjUzaLBnn is connected to 1 peers [35m+1ms[0m
  [35;1mbackend:Libp2pService [0mQmWkhUw5redKNGLqu7fK4oqSYNXJ9RtHym9tmnjUzaLBnn has 1 open connections [35m+0ms[0m
  [31;1mbackend:libp2p:websockets [0moutbound connection /dns4/zedbjrdnxpexhfxxwljjmw6csodt7o5j3d2ebd6upit4aiggdjhec5ad.onion/tcp/80/ws/p2p/Qmc9ogVhL9kL5ZJKPbfjKTMvATVkQCoNfTmLEUxvadfds3 upgraded [31m+5s[0m

Error upgrading outbound connection is something we see elsewhere when we use the wrong PSK:

https://github.com/TryQuiet/quiet/issues/1897#issuecomment-1777340102

It may mean other things too.

kingalg commented 7 months ago

@holmesworcester Device - I'm also using iPhone SE 2nd gen Usually, it should sync after a few seconds (and this is how it's working on prod for me). So, most often, it will be a few seconds, but on rare occasions when it's longer, it's no more than half a minute. On our current alpha version, I have been waiting for 10-20 minutes without success. For me, on prod it's syncing every time and on alpha, it's not syncing at all. @leblowl could you give me your step by step or record what you are doing? Maybe we do something differently.

leblowl commented 7 months ago

@kingalg I just open the app, connect to a community, then put the app in the background, do something else (open maps or something that requires a lot of resources), wait maybe 20 seconds, then open the app again. I was able to reproduce this on production, where it didn't sync after 15 minutes or more. I think restarting the app fixes things. But I can play it with more, maybe it happens more often on the current alpha then production. Either way I think there is likely a bug here that we can fix.

kingalg commented 7 months ago

@leblowl did you send any message from another app when this one was in the background? And yes, restarting the app fixed it, as the issue is only with the app not syncing after returning from the background (it was working exactly like this in the past, but after Wiktor introduced some changes, the app was syncing correctly). I'm really curious why we are getting such different results on the same version and the same device. That's interesting.

holmesworcester commented 7 months ago

Lucas can reproduce this because it happens frequently enough but probably anyone can.

leblowl commented 7 months ago

I looked into this more, and it looks like since we are now randomizing ports when iOS resumes from background, the HTTP tunnel port that the backend uses to connect to Tor changes. Libp2p has an auto-dialer that re-dials peers, which is how we re-connect to peers when the app resumes. However, in the version of libp2p that we are using, it dials peers sequentially and once the HTTP tunnel port changes, it looks like it just gets hung up waiting for a socket to connect and it never does and we never receive a ECONNREFUSED error, so the dialer just waits. The dialTimeout option is one way to prevent something like this, and we weren't using it. So I hooked that up in #2414, but it still means there is a delay for however long dialTimeout is set to, which is currently 2 minutes.

Ideally, to really fix this, when the app goes to background (or resumes), we can find a way to close any sockets that reference the old HTTP tunnel port.

holmesworcester commented 7 months ago

Let's make a new issue for the more complete fix.

holmesworcester commented 7 months ago

Discussion in Slack: https://zbay.slack.com/archives/G01EHE6JHCM/p1712937541294959

leblowl commented 7 months ago

@kingalg in 2.2.0, after maybe 3 minutes, the app should resume syncing. For a more thorough fix, we have #2424 which will be in the next release ideally.

kingalg commented 6 months ago

iOS 371 A temporary fix is working.