Closed kingalg closed 6 months ago
We should fix this. It's a blocker!
Since this is a regression, it's worth investing in a test, which also means fixing mobile e2e tests before the release.
At quick glance, it looks like Detox can test sending the app to background and returning from background, so let's do it:
https://wix.github.io/Detox/docs/next/api/device/#devicelaunchappparams
Let's first fix the issue and then see if there is a way to test this issue with our current setup. If there is, let's do it.
If there isn't, let's not.
Ideas for how to test this if we want to:
Lucas can't reproduce.
It should sync after a short time, but it never does.
How short? Once the app goes into background and comes back, Tor has to restart and then it has to reconnect. This could take some time.
Notes: I checked it again on our prod version and it is working there. So, this is a regression that only appeared at some point in one of the 2.1.2 alphas.
This is the most interesting and confusing part. Why would the last release behave differently? Is it possible that in production, when you switch away and switch back, the app is not getting sent to the background?
Could there be some difference in performance or memory usage, or in the context in which Kinga is testing, that changes how quickly the app goes into the background?
I think overall we should treat this as an iOS+Tor reliability question to resolve in the future, and assume that the app will sync if given enough time and enough tries.
Let's move this out of sprint since we can't reproduce it.
Also, @kingalg and @leblowl are you testing on the same device? Which device?
I'm testing on an iPhone SE 2nd gen
Notes: Lucas also sees long syncing times after returning to foreground in production, and suspects that we could reproduce the error on desktop as well after restarting.
These are the logs when there is a delay in restarting the connection:
[31;1mbackend:libp2p:websockets [0mconnect zedbjrdnxpexhfxxwljjmw6csodt7o5j3d2ebd6upit4aiggdjhec5ad.onion:80 [31m+10s[0m
[31;1mbackend:libp2p:websockets [0m/dns4/dmjdcthp7lo6igbeuatzgeuuyjg25vkamkxosgsp6rhotxt3ezahexad.onion/tcp/80/ws/p2p/QmWkhUw5redKNGLqu7fK4oqSYNXJ9RtHym9tmnjUzaLBnn connected /dns4/zedbjrdnxpexhfxxwljjmw6csodt7o5j3d2ebd6upit4aiggdjhec5ad.onion/tcp/80/ws/p2p/Qmc9ogVhL9kL5ZJKPbfjKTMvATVkQCoNfTmLEUxvadfds3 [31m+9s[0m
[31;1mbackend:libp2p:websockets [0mnew outbound connection /dns4/zedbjrdnxpexhfxxwljjmw6csodt7o5j3d2ebd6upit4aiggdjhec5ad.onion/tcp/80/ws/p2p/Qmc9ogVhL9kL5ZJKPbfjKTMvATVkQCoNfTmLEUxvadfds3 [31m+1ms[0m
[36;1mbackend:libp2p:websockets:err [0merror upgrading outbound connection /dns4/zedbjrdnxpexhfxxwljjmw6csodt7o5j3d2ebd6upit4aiggdjhec5ad.onion/tcp/80/ws/p2p/Qmc9ogVhL9kL5ZJKPbfjKTMvATVkQCoNfTmLEUxvadfds3. Details: stream ended before 24 bytes became available [36m+46s[0m
[31;1mbackend:libp2p:websockets [0mconnect zedbjrdnxpexhfxxwljjmw6csodt7o5j3d2ebd6upit4aiggdjhec5ad.onion:80 [31m+38s[0m
[31;1mbackend:libp2p:websockets [0m/dns4/dmjdcthp7lo6igbeuatzgeuuyjg25vkamkxosgsp6rhotxt3ezahexad.onion/tcp/80/ws/p2p/QmWkhUw5redKNGLqu7fK4oqSYNXJ9RtHym9tmnjUzaLBnn connected /dns4/zedbjrdnxpexhfxxwljjmw6csodt7o5j3d2ebd6upit4aiggdjhec5ad.onion/tcp/80/ws/p2p/Qmc9ogVhL9kL5ZJKPbfjKTMvATVkQCoNfTmLEUxvadfds3 [31m+4s[0m
[31;1mbackend:libp2p:websockets [0mnew outbound connection /dns4/zedbjrdnxpexhfxxwljjmw6csodt7o5j3d2ebd6upit4aiggdjhec5ad.onion/tcp/80/ws/p2p/Qmc9ogVhL9kL5ZJKPbfjKTMvATVkQCoNfTmLEUxvadfds3 [31m+1ms[0m
[36;1mbackend:libp2p:websockets:err [0merror upgrading outbound connection /dns4/zedbjrdnxpexhfxxwljjmw6csodt7o5j3d2ebd6upit4aiggdjhec5ad.onion/tcp/80/ws/p2p/Qmc9ogVhL9kL5ZJKPbfjKTMvATVkQCoNfTmLEUxvadfds3. Details: stream ended before 24 bytes became available [36m+44s[0m
[31;1mbackend:libp2p:websockets [0mconnect zedbjrdnxpexhfxxwljjmw6csodt7o5j3d2ebd6upit4aiggdjhec5ad.onion:80 [31m+40s[0m
[31;1mbackend:libp2p:websockets [0m/dns4/dmjdcthp7lo6igbeuatzgeuuyjg25vkamkxosgsp6rhotxt3ezahexad.onion/tcp/80/ws/p2p/QmWkhUw5redKNGLqu7fK4oqSYNXJ9RtHym9tmnjUzaLBnn connected /dns4/zedbjrdnxpexhfxxwljjmw6csodt7o5j3d2ebd6upit4aiggdjhec5ad.onion/tcp/80/ws/p2p/Qmc9ogVhL9kL5ZJKPbfjKTMvATVkQCoNfTmLEUxvadfds3 [31m+2s[0m
[31;1mbackend:libp2p:websockets [0mnew outbound connection /dns4/zedbjrdnxpexhfxxwljjmw6csodt7o5j3d2ebd6upit4aiggdjhec5ad.onion/tcp/80/ws/p2p/Qmc9ogVhL9kL5ZJKPbfjKTMvATVkQCoNfTmLEUxvadfds3 [31m+2ms[0m
[36;1mbackend:libp2p:websockets:err [0merror upgrading outbound connection /dns4/zedbjrdnxpexhfxxwljjmw6csodt7o5j3d2ebd6upit4aiggdjhec5ad.onion/tcp/80/ws/p2p/Qmc9ogVhL9kL5ZJKPbfjKTMvATVkQCoNfTmLEUxvadfds3. Details: stream ended before 24 bytes became available [36m+42s[0m
[31;1mbackend:libp2p:websockets [0mconnect zedbjrdnxpexhfxxwljjmw6csodt7o5j3d2ebd6upit4aiggdjhec5ad.onion:80 [31m+40s[0m
[31;1mbackend:libp2p:websockets [0m/dns4/dmjdcthp7lo6igbeuatzgeuuyjg25vkamkxosgsp6rhotxt3ezahexad.onion/tcp/80/ws/p2p/QmWkhUw5redKNGLqu7fK4oqSYNXJ9RtHym9tmnjUzaLBnn connected /dns4/zedbjrdnxpexhfxxwljjmw6csodt7o5j3d2ebd6upit4aiggdjhec5ad.onion/tcp/80/ws/p2p/Qmc9ogVhL9kL5ZJKPbfjKTMvATVkQCoNfTmLEUxvadfds3 [31m+1s[0m
[31;1mbackend:libp2p:websockets [0mnew outbound connection /dns4/zedbjrdnxpexhfxxwljjmw6csodt7o5j3d2ebd6upit4aiggdjhec5ad.onion/tcp/80/ws/p2p/Qmc9ogVhL9kL5ZJKPbfjKTMvATVkQCoNfTmLEUxvadfds3 [31m+2ms[0m
[35;1mbackend:Libp2pService [0mQmWkhUw5redKNGLqu7fK4oqSYNXJ9RtHym9tmnjUzaLBnn connected to Qmc9ogVhL9kL5ZJKPbfjKTMvATVkQCoNfTmLEUxvadfds3 [35m+3m[0m
[35;1mbackend:Libp2pService [0mQmWkhUw5redKNGLqu7fK4oqSYNXJ9RtHym9tmnjUzaLBnn is connected to 1 peers [35m+1ms[0m
[35;1mbackend:Libp2pService [0mQmWkhUw5redKNGLqu7fK4oqSYNXJ9RtHym9tmnjUzaLBnn has 1 open connections [35m+0ms[0m
[31;1mbackend:libp2p:websockets [0moutbound connection /dns4/zedbjrdnxpexhfxxwljjmw6csodt7o5j3d2ebd6upit4aiggdjhec5ad.onion/tcp/80/ws/p2p/Qmc9ogVhL9kL5ZJKPbfjKTMvATVkQCoNfTmLEUxvadfds3 upgraded [31m+5s[0m
Error upgrading outbound connection
is something we see elsewhere when we use the wrong PSK:
https://github.com/TryQuiet/quiet/issues/1897#issuecomment-1777340102
It may mean other things too.
@holmesworcester Device - I'm also using iPhone SE 2nd gen Usually, it should sync after a few seconds (and this is how it's working on prod for me). So, most often, it will be a few seconds, but on rare occasions when it's longer, it's no more than half a minute. On our current alpha version, I have been waiting for 10-20 minutes without success. For me, on prod it's syncing every time and on alpha, it's not syncing at all. @leblowl could you give me your step by step or record what you are doing? Maybe we do something differently.
@kingalg I just open the app, connect to a community, then put the app in the background, do something else (open maps or something that requires a lot of resources), wait maybe 20 seconds, then open the app again. I was able to reproduce this on production, where it didn't sync after 15 minutes or more. I think restarting the app fixes things. But I can play it with more, maybe it happens more often on the current alpha then production. Either way I think there is likely a bug here that we can fix.
@leblowl did you send any message from another app when this one was in the background? And yes, restarting the app fixed it, as the issue is only with the app not syncing after returning from the background (it was working exactly like this in the past, but after Wiktor introduced some changes, the app was syncing correctly). I'm really curious why we are getting such different results on the same version and the same device. That's interesting.
Lucas can reproduce this because it happens frequently enough but probably anyone can.
I looked into this more, and it looks like since we are now randomizing ports when iOS resumes from background, the HTTP tunnel port that the backend uses to connect to Tor changes. Libp2p has an auto-dialer that re-dials peers, which is how we re-connect to peers when the app resumes. However, in the version of libp2p that we are using, it dials peers sequentially and once the HTTP tunnel port changes, it looks like it just gets hung up waiting for a socket to connect and it never does and we never receive a ECONNREFUSED error, so the dialer just waits. The dialTimeout
option is one way to prevent something like this, and we weren't using it. So I hooked that up in #2414, but it still means there is a delay for however long dialTimeout
is set to, which is currently 2 minutes.
Ideally, to really fix this, when the app goes to background (or resumes), we can find a way to close any sockets that reference the old HTTP tunnel port.
Let's make a new issue for the more complete fix.
Discussion in Slack: https://zbay.slack.com/archives/G01EHE6JHCM/p1712937541294959
@kingalg in 2.2.0, after maybe 3 minutes, the app should resume syncing. For a more thorough fix, we have #2424 which will be in the next release ideally.
iOS 371 A temporary fix is working.
Version: 2.1.2-alpha.14 (367) System: iOS
Issue: The App doesn't sync after coming from the background. The app on iOS never really works in the background, but for some time it was syncing correctly after it was reopened without a need for swiping up or force-quiting. Currently, it doesn't sync.
Steps to recreate: Have a working app on ios, exchange a few messages with another user on a different platform. Put the app on iOS into the background (leave and check a different app or let the iPhone turn the screen off). Do not swipe the app up, force quit it or restart the phone - make sure that it's still in the background. Open it again and try to send/receive a message. It should sync after a short time, but it never does.
Notes: I checked it again on our prod version and it is working there. So, this is a regression that only appeared at some point in one of the 2.1.2 alphas.