corona-warn-app / cwa-app-android

Native Android app using the Apple/Google exposure notification API. The CWA development ends on May 31, 2023. You still can warn other users until April 30, 2023. More information:
https://coronawarn.app/en/faq/#ramp_down
Apache License 2.0
2.44k stars 495 forks source link

Very long startup time, stuck on splash screen for 30 seconds #558

Closed kbobrowski closed 4 years ago

kbobrowski commented 4 years ago

Describe the bug

App takes very long to launch (30 seconds), both before and after setting it up: https://www.youtube.com/watch?v=HxXka5IqKjE

Expected behaviour

Should launch in less than 3 seconds

Technical details

Motorola G2, Android 6

Additional context

Other apps launch fast (e.g. Play Store, Datenspende). May affect older phones as I don't have an issue on more recent phone with Android 10

jakobmoellerdev commented 4 years ago

Hi and thanks for your issue. Would you mind telling us a bit more about the circumstances of the seen behavior? Especially we would be interested in your connectivity speed of your WiFi network as this could lead to potentially long requests or even timeouts depending on how fast your connection is. Also, does the issue still persist?

kbobrowski commented 4 years ago

Hi @jakobmoellersap , my wifi is stable 100 Mbps, today startup time is about 3 seconds, if there is some internet traffic to be handled at splash screen then maybe it was just temporary hiccup - but at the same time this issue was occurring I did not have any issues on faster phone with Android 10

I'm also behind custom DNS based on pi.hole just to give you full info about my network, but if domain is not black-listed it'll query 8.8.8.8 and 8.8.4.4

kbobrowski commented 4 years ago

Yeah app is querying app_config endpoint at splash screen, maybe deferring this query after splash screen would solve these problems?

jakobmoellerdev commented 4 years ago

We are currently investigating more deeply into the issue and the current assumption is indeed the app_config endpoint. I will update here once we have more information.

kbobrowski commented 4 years ago

Further comment on the issue: during splash-screen I've noticed GET on app_config at 17:01:39 (status 200), and then same request at 17:01:44 so 5 seconds later with status 304. This was during unusually long splash-screen (around 5-7 seconds), so definitely something going on here

jakobmoellerdev commented 4 years ago

Maybe your custom DNS / Network Setup is at fault Is there any custom proxy configuration that could lead to a 304?

kbobrowski commented 4 years ago

I think it's unlikely DNS configuration is a problem, I'm simply returning 0.0.0.0 for blacklisted domains, and for not blacklisted it queries Google's DNS.

Using my DNS:

$ dig svc90.main.px.t-online.de

; <<>> DiG 9.11.3-1ubuntu1.12-Ubuntu <<>> svc90.main.px.t-online.de
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 23329
;; flags: qr rd ra; QUERY: 1, ANSWER: 3, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 512
;; QUESTION SECTION:
;svc90.main.px.t-online.de. IN  A

;; ANSWER SECTION:
svc90.main.px.t-online.de. 55   IN  A   87.140.208.25
svc90.main.px.t-online.de. 55   IN  A   87.140.208.26
svc90.main.px.t-online.de. 55   IN  A   87.140.208.27

;; Query time: 14 msec
;; SERVER: 192.168.178.25#53(192.168.178.25)
;; WHEN: Thu Jun 18 13:05:22 +06 2020
;; MSG SIZE  rcvd: 102

Using Google's:

$ dig svc90.main.px.t-online.de @8.8.8.8

; <<>> DiG 9.11.3-1ubuntu1.12-Ubuntu <<>> svc90.main.px.t-online.de @8.8.8.8
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 40760
;; flags: qr rd ra; QUERY: 1, ANSWER: 3, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 512
;; QUESTION SECTION:
;svc90.main.px.t-online.de. IN  A

;; ANSWER SECTION:
svc90.main.px.t-online.de. 9    IN  A   87.140.208.26
svc90.main.px.t-online.de. 9    IN  A   87.140.208.25
svc90.main.px.t-online.de. 9    IN  A   87.140.208.27

;; Query time: 9 msec
;; SERVER: 8.8.8.8#53(8.8.8.8)
;; WHEN: Thu Jun 18 13:08:36 +06 2020
;; MSG SIZE  rcvd: 102

As for proxy configuration, this 30 seconds delay was without any proxy, and this 5-7 seconds was with MITM proxy so I can see the communication. Are you not expecting 304 there? I see mostly 304 codes, but thought it's expected since it just tells the client to use whatever is already in cache.

The issue #591 may be related

jakobmoellerdev commented 4 years ago

We are in contact with our server-colleagues as we do not think this issue comes from the Android Side. In case we have response spikes due to the CDN we will probably introduce dedicated resilience behavior, however I think there is not that much we can do from the App side.

kbobrowski commented 4 years ago

I guess for some reason app_config needs to be queried at startup, but maybe if it is just queried right after main activity is displayed it would improve UX as user won't be stuck at splash-screen. But perhaps it's not really an issue as long as there won't be communication problems with CDN

jakobmoellerdev commented 4 years ago

Thanks for your input! Unfortunately, the App Config is loaded for certain App-Behavior before Main Activity Launch, which makes this behaviour unfeasible.

kbobrowski commented 4 years ago

I was able to recreate it on fast Android 10 phone as well in two ways:

jakobmoellerdev commented 4 years ago

Thanks for your input, we will take this into account and see if we can provide a solution for this scenario by checking the network connectivity more in detail

jakobmoellerdev commented 4 years ago

From what we can see, what would be needed is a loading indicator on app startup, which we would categorize as a feature request. The current behaviour is not a bug for us, as the application does require internet to fetch the application configuration from the server.

genodeftest commented 4 years ago

I'm seeing the same issue: App takes > 10 seconds to start if on wifi.

Workaround:

  1. Go offline
  2. open CWA (now takes about 1 second to load)
  3. Go online

@jakobmoellersap: It looks like some network connection happens on startup which is blocking the GUI until it times out. How about moving the network connection to a non-GUI thread? User experience would quite benefit from that and it would be nicer than having a loading indicator.

genodeftest commented 4 years ago

Additional note: I get this error in adb logcat at the same time:

System.out: [java.net.SocketTimeoutException: failed to connect to svc90.main.px.t-online.de/87.140.208.250 (port 443) from /10.1.10.1 (port 49091) after 10000ms, java.net.SocketTimeoutException: failed to connect to svc90.main.px.t-online.de/87.140.208.249 (port 443) from /10.1.10.1 (port 38427) after 10000ms, java.net.SocketTimeoutException: failed to connect to svc90.main.px.t-online.de/87.140.208.251 (port 443) from /10.1.10.1 (port 45018) after 10000ms, java.net.SocketTimeoutException: failed to connect to svc90.main.px.t-online.de/2003:2:20:707:87:140:208:217 (port 443) from /fd00:1:fd00:1:fd00:1:fd00:1 (port 49471) after 10000ms]

So it looks like the network access is tried 3 times (2 retries) with a timeout of 10 seconds each. I can provide a full adb logcat -d output to some e-mail address but I would like to keep it out of the public internet on GitHub.

Steps to reliably reproduce (i.e. simulate bad network quality for testing purposes):

  1. Have NetGuard installed and running. Activate filtering, i.e. deny all non-system app traffic by default. Do not allow CWA. Sidenode: Since system still has network access, Android will show that the network connection is active.
  2. Start CWA. It takes very long to start.

It looks like the code is relying on ConnectivityHelper.isNetworkEnabled() and assumes that if it turns true, the network is actually fast. This may not always be the case, e.g. on 2G networks or bad wifi. Tracing calls to isNetworkEnabled() or profiling the app with the artificially bad network simulation as described above should help find the relevant lines of code causing this blocking.

jakobmoellerdev commented 4 years ago

Thanks for your additional input! We observed the same behaviour and still see this as a feature request as the resilience patterns and timeout values together with a normal network setting should almost never result in a failure. I do agree however this is something that needs to be revisited especially in terms of how we deal with extremely slow network connections for additional resilience. I will bring this up in the team again in order to revisit our prioritisation on this.

Regarding your suggestion of having this in a separate thread: If the Application Configuration is not available there is explicit behaviour defined in our application depending on the cause of this. This is meant to provide latest application configuration data together with security in case the app config was corrupted due to signature mismatch. While the UX might slightly improve by seeing the home screen for some time, the app would not function properly without this call, so moving it out of the splash screen is questionable.

genodeftest commented 4 years ago

Thanks for your quick reply!

How about caching the configuration until a new one could be downloaded? This way the UI would immediately be fully usable and the config would still be up-to-date.

jakobmoellerdev commented 4 years ago

Good Catch. We actually already had this implemented at a point in time before release but decided to opt-out of this and instead rely on the sent cache-control header from the server. A Transport-Layer only Cache is way more reliable and less maintenance intensive. Thus, we do not have active control of the caching of the App Configuration outside of the Transport Layer. I think the App Config uses the same Cache-Control values as key packages from the server at the moment, but I would have to clarify on the details.

tkowark commented 4 years ago

Thank you all for your input! We are now tracking the Splash Screen proposal for both apps centrally in corona-warn-app/cwa-wishlist#92 .