corona-warn-app / cwa-app-android

Native Android app using the Apple/Google exposure notification API. The CWA development ends on May 31, 2023. You still can warn other users until April 30, 2023. More information:
https://coronawarn.app/en/faq/#ramp_down
Apache License 2.0
2.44k stars 495 forks source link

9002: Timeout waiting for 180000 ms #1187

Closed YtvwlD closed 3 years ago

YtvwlD commented 4 years ago

Avoid duplicates

Describe the bug

Timeout bei Prüfung und daher keine aktuelle Risiko-Ermittlung

Ich habe ein ziemlich altes und etwas langsames Handy (Fairphone 2 mit Android 7) und ich habe die App da letzte Woche installiert und sie hat die ersten paar Tage auch gut funktioniert. Nach fast einer Woche dauert die Risiko-Überprüfung allerdings so lange, dass sie abbricht. Wenn ich die App öffne, sehe ich den "Prüfen"-Bildschirm. Dort steht, dass der Vorgang mehrere Minuten dauern kann.

Nach ca. einer Minute bricht der Vorgang aber ab mit der folgenden Fehlermeldung:

photo_2020-09-20_11-36-17 photo_2020-09-20_11-36-18

Einen Tag später war der Fehler immer noch da (er ist vollständig reproduzierbar bei mir) und die Risikoeinschätzung ist von grün auf unbekannt gesprungen:

photo_2020-09-20_11-36-19

Expected behaviour

Dass die Überprüfung lange dauert, ist wohl leider so. Mein Gerät ist halt langsam. Aber ich würde mir wünschen, dass sie funktioniert.

Steps to reproduce the issue

Ich weiß nicht, ob sich das einfach reproduzieren lässt. Vielleicht:

  1. Die App auf einem alten Handy installieren.
  2. In einen Supermarkt gehen.
  3. Status sollte grün sein.
  4. An zwei Tagen in Folge durch einen vollen Hauptbahnhof laufen.
  5. Prüfung läuft in einen Timeout.

Technical details

Possible Fix

Den Timeout verlängerne – vielleicht auf so zehn Minuten?

Additional context

Das könnte das gleiche Problem sein wie #809, aber das war mit einer älteren Version der App.

Ich benutze Fairphone Open OS mit microG statt Fairphone OS mit den originalen Google Apps.


Internal Tracking ID: EXPOSUREAPP-2863 Internal Tracking ID: EXPOSUREAPP-2893

vaubaehn commented 4 years ago

Hi @YtvwlD , es gibt schon eine Code-Änderung, die den Timeout von 1 Minute auf 3 Minuten verlängert #1138 . Leider kommt die Änderung erst mit der App-Version 1.4.0, es wird also noch eine Weile dauern. Halte uns hier doch bitte auf dem Laufenden, ob sich nach dem Update auf die Version 1.4.0 das Problem bei dir erledigt. Vielen Dank und Grüße

kereng5 commented 4 years ago

Auf den Screenshots ist der Flugmodus aktiviert. Könnte es daran liegen? Ich vermute, dass dann die Kommunikation mit dem Server nicht klappt. Dass auch Bluetooth deaktiviert ist, sollte an dieser Stelle nicht stören.

YtvwlD commented 4 years ago

Nee, das hat damit nichts zu tun. Ich hab auch schon probiert, den aus zu machen. Und WLAN ist ja da.

heinezen commented 4 years ago

Hey @YtvwlD ,

have you had a look at https://github.com/corona-warn-app/cwa-app-android/issues/998? The error message is slightly different, but maybe the issues are related. Other have posted workarounds there that might work for your problem too.

As @vaubaehn already pointed out, the timeout problem is addressed in #1138 and should hopefully be fixed in a future update.


Corona-Warn-App Open Source Team

YtvwlD commented 4 years ago

Hey @heinezen, yes, I've looked at this issue. But I don't think they are related. The other issue is about failing to download the data in time. I don't have a firewall or something like that and I tried it on multiple occasions and both on mobile data and on wifi. Also, it worked in the past.

daimpi commented 4 years ago

@YtvwlD could you share your ENF log? I'd like to see whether the keys for today have been downloaded successfully.

Do you also experience "CAUSE: 3 Something went wrong. ApiException: 39508" like the user here when you open CWA afterwards?

YtvwlD commented 4 years ago

@daimpi How do I get the ENF log? Is that just inside Android's normal logcat?

I can't remember this happening before but after opening the app today I got "URSACHE 3: Etwas ist schiefgelaufen. Fehler bei der Kommunikation mit der Google-Schnittstelle (8)" (after clicking on "details" the stacktrace begins with "com.google.android.gms.common.api.ApiException: 8")

(At this point it might be relevant to remind you that I'm using microG instead of Google's com.google.android.gms.)

daimpi commented 4 years ago

@YtvwlD

(At this point it might be relevant to remind you that I'm using microG instead of Google's com.google.android.gms.)

That's a good point. Usually the ENF log can be found under Settings -> Google -> Covid-19 Notifications -> Click on "checks performed in the last 14 days" -> burger menu in the upper right corner -> export checks.

You can upload it here by either packing the file into a .zip archive or renaming the ending from .json to .txt.

I can't remember this happening before but after opening the app today I got "URSACHE 3: Etwas ist schiefgelaufen.

Interesting, thanks for the info 🙂. You might want to check out my and @vaubaehn's comments here and here for further details. (@heinezen you might also wana check out those comments as they're directly related to the reason why the "9002: Timeout waiting for 60000 ms" error is happening)

dukleinekkmaus commented 4 years ago

After we Figure out that my phone has the same Problem, what can I do? Just wait until CWA 1.4 is available? Shall I try to empty the RAM (daily use is 627 MB of 940 MB)? Could this help to speed up the performance?

daimpi commented 4 years ago

@dukleinekkmaus that's a good question. I'm not entirely sure what can be done from the user side at this point, b/c in your test you already provided basically optimal conditions for CWA but it failed nevertheless. Maybe @vaubaehn & @kbobrowski have an idea?

One thing the CWA-Team could do to spot such problems in the future, is to add low power devices like your Blackview A5 to their Telekom testing center.

kbobrowski commented 4 years ago

Not sure what can be done. @YtvwlD regarding "ApiException: 8" maybe you can ask at https://github.com/microg/android_packages_apps_GmsCore, since you are using alternative ENF implementation this error may have different meaning than in the original Google implementation, maybe microG developer can say something more about it

heinezen commented 4 years ago

Looks like this error is different from #998 then.

@YtvwlD @dukleinekkmaus

I've mirrored your report to an internal Jira ticket (EXPOSUREAPP-2863), so the developers will take care of it now. If new information on the bug is available, it should get posted here.

Regards, CH


Corona-Warn-App Open Source Team

RalfJung commented 4 years ago

Hi @YtvwlD , es gibt schon eine Code-Änderung, die den Timeout von 1 Minute auf 3 Minuten verlängert #1138 . Leider kommt die Änderung erst mit der App-Version 1.4.0, es wird also noch eine Weile dauern. Halte uns hier doch bitte auf dem Laufenden, ob sich nach dem Update auf die Version 1.4.0 das Problem bei dir erledigt. Vielen Dank und Grüße

Gibt es die Möglichkeit, diesen Patch auf 1.3.1 zurückzuportieren, damit er schneller ausgerollt wird? Ich habe bei meinem (ebenfalls langsamen und alten) Telefon dasselbe Problem.

daimpi commented 4 years ago

Gibt es die Möglichkeit, diesen Patch auf 1.3.1 zurückzuportieren, damit er schneller ausgerollt wird? Ich habe bei meinem (ebenfalls langsamen und alten) Telefon dasselbe Problem.

Ich befürchte dafür ist es mittlerweile zu spät… 1.3.1 wurde bereits gestern released: https://github.com/corona-warn-app/cwa-app-android/releases/tag/1.3.1

RalfJung commented 4 years ago

Ah schade, eine Zeitmaschine wollte ich jetzt wirklich nicht verlangen. ;)

svengabr commented 4 years ago

There is a separate Topic on our end regarding the 9002 Timeout Issue. I will bring this up in our developer meeting today again and ask for an update. I will keep you updated.

These are the currently released versions: 1.3.1 Android 1.3.2 iOS

1.4.0 will hopefully be released within the next week. Unfortunately, I can not give exact dates for the release.

svengabr commented 4 years ago

The developers confirmed that the timeout will be increased with release 1.4 and they will continue working on the timeouts in the following release 1.5.

nidico commented 4 years ago

This is the (kind of) corresponding issue in the MicroG repository:

https://github.com/microg/android_packages_apps_GmsCore/issues/1171

nidico commented 4 years ago

Not sure what can be done. @YtvwlD regarding "ApiException: 8" maybe you can ask at https://github.com/microg/android_packages_apps_GmsCore, since you are using alternative ENF implementation this error may have different meaning than in the original Google implementation, maybe microG developer can say something more about it

According to the MicroG developer, this error code is most likely emitted by the client library, thus it shouldn't matter which service is used. 8 means INTERNAL_ERROR.

dukleinekkmaus commented 4 years ago

The Update solve the Problem, but it takes more then 2 minutes to check the risk. I am afraid to run into the timeout again if there is more to compare ...

etron770 commented 4 years ago

Android 6 LG G4C Smartphone. Last update of Corona data was Okt. 13th

The error was with Version 1.3.x

Ursache: 9002 
Etwas ist schief gelaufen
Timed out waiting for 60000 ms

Now with Version 1.5: After the update the app started from the scratch with asking all the questions for a new installation The Error is now:

Ursache: 9002 
Etwas ist schief gelaufen
timeout

Screenshot_2020-10-20-08-25-40

I tried to activate all previously deactivated google apps - no change

svengabr commented 4 years ago

@etron770 The update should have mitigated the issue, I am sorry to hear that the issue still persists for you.

I have collected your feedback and sent it to the developers.

etron770 commented 4 years ago

Additionally: doesn't matter the quality of the Wlan or LTE data connection. Because there is a poor connection rate at the screen print.

etron770 commented 4 years ago

@svengabr As I have a second unused G4C i did
a factory reset update all apps and system installed the corona warn app

same error

feel free to ask for more details, I am using Linux and have the Android debug bridge installed

vaubaehn commented 4 years ago

Hi @etron770 and @svengabr

The error was with Version 1.3.x

Ursache: 9002 
Etwas ist schief gelaufen
Timed out waiting for 60000 ms

Now with Version 1.5: After the update the app started from the scratch with asking all the questions for a new installation The Error is now:

Ursache: 9002 
Etwas ist schief gelaufen
timeout

They are two different errors with different root causes. First error indicates, that exposure checking takes too long. The second error indicates, that there is a problem with the internet connection.

Additionally: doesn't matter the quality of the Wlan or LTE data connection. Because there is a poor connection rate at the screen print.

Related to the stacktrace and having bad wifi AND having bad LTE, I would assume that the phone jumped between wifi and LTE while downloading the key files for exposure checking from the backend server. Nevertheless I'm surprised, because a) the timeout for the transaction to download files should be long enough, and b) even after a factory reset (and likely changed network conditions?) the error is reproducable.

@svengabr : is it possible to cut out the above comments from @etron770 and move it to issue #998 where it belongs to?

daimpi commented 4 years ago

@dukleinekkmaus thanks for the update 🙂.

The Update solve the Problem, but it takes more then 2 minutes to check the risk. I am afraid to run into the timeout again if there is more to compare ...

That's good to know. One thing which was also changed is that the padding multiplier on the server side has also been dropped to 1, i.e. no unnecessary fake keys anymore 🎉. This will help with decreasing the load on the whole infrastructure (from smartphones over networks to servers), so the number of keys will go down quite significantly (by a factor of 5) but otoh we have federation roaming now which might increase the load again.

There will be an even further extension of this timeout to 8 min coming with CWA 1.6 though 🙂.

etron770 commented 4 years ago

@ daimpi and @svengabr

Related to the stacktrace and having bad wifi AND having bad LTE,...I

The wlan and/or LTE qualitiy does not change anything. At the test cellphone (factory reset-updates- installed corona app) is not sim card therefor not LTE availabel, and I tested it nearby the router

Lan:  Fiber to the Home  100Mb/25Mb
Fritzbox 7490
Wlan: 
Band 2,4 GHz
Download rate  ↓72   
upload rate: ↑56  
proberties: n/Wi-Fi 4, 20 MHz, WPA2, 1 x 1, 11k, 11v

The same error

vaubaehn commented 4 years ago

Hi @d4rken , just to fill up some informational gap for me: https://github.com/corona-warn-app/cwa-app-android/blob/0f12f4ddbd814ac368901858363f7835bb30b9a8/Corona-Warn-App/src/main/java/de/rki/coronawarnapp/transaction/RetrieveDiagnosisKeysTransaction.kt#L311-L324

At what point in time success is returned by ENF API? Immediately after calling/handing over keyFiles, configuration and token? Or after ENF finished exposure checking? (I'd expect the first, as ENF broadcasts intents when finished, but I'm not sure about it)

Thanks in advance!

d4rken commented 4 years ago

Immediately after calling/handing over keyFiles, configuration and token?

Yes, shortly after. Handing over the files is quick (few seconds on most devices). The calculation inside the ENF happens asynchronously, so does not count towards the timeout as our transaction ends after handing over the files. When the ENF is done we are notified via broadcast and start a new transaction with its own timeout for the risk calculation.

etron770 commented 4 years ago

Just another hint: it seems that there is only a lack of data. The test cellphone, switched off after installing and testing is still having this error. The cellphone in use is working fine after one day. The test cellphone is still showing the grey information field "Unbekanntes Risiko .... " and after "Prüfung läuft" the error will be schown The cellphone in use ist showing the green information message "Niedriges Risiko .... 2 der letzten 14 Tage aktiv ...."

But ... after reading the error the user is assuming that the app is not working. Maybe he will remove the app. That seems to be real problem during the first period of collecting data.

daimpi commented 4 years ago

@d4rken

Yes, shortly after. Handing over the files is quick (few seconds on most devices). The calculation inside the ENF happens asynchronously, so does not count towards the timeout as our transaction ends after handing over the files. When the ENF is done we are notified via broadcast and start a new transaction with its own timeout for the risk calculation.

Thanks for this explanation. This is quite interesting b/c it kind of speaks against my theory that the process of expansion of DKs and matching them with RPIs is what is responsible for reaching this timeout on lower powered devices.

But if the matching and DK expansion isn't actually something which counts towards the timeout limit, what is causing the timeouts particularly on low powered devices like @dukleinekkmaus's ?

vaubaehn commented 4 years ago

Hi @daimpi , if it's ok, I'll jump in for @d4rken to answer.

Thanks for this explanation. This is quite interesting b/c it kind of speaks against my theory that the process of expansion of DKs and matching them with RPIs is what is responsible for reaching this timeout on lower powered devices.

But if the matching and DK expansion isn't actually something which counts towards the timeout limit, what is causing the timeouts particularly on low powered devices like @dukleinekkmaus's ?

Actually, your theory is not wrong, and it was contributing towards the timeouts, but in an indirect way. Do you remember the kindergardener? The problem before CWA 1.5 was, that CWA was kind of DoS'ing the ENF (how @kbobrowski described it elsewhere) by trying to provide 14 keyfiles of DKs in 14 batches. After ENF received one, it already started its process of DK expansion and matching, which took quite much processor time. With the second call to the API it took more time, until ENF could respond. With every API call the ENF got more unresponsive, ending up in the timeout.

Now CWA 1.5 hands out a list of all DK files in just one single call, and it's up to the ENF to work down on the list (expansion of DKs and matching) 'in the background'. The ENF can now respond quite fast to CWA, that it had taken over the keyfiles list, before the timeout is reached. After ENF returned 'success' to CWA on taking over all files for matching, the matching process is starting. You can easily verify it, when comparing the 'update times' in CWA and ENF history: CWA's timestamp is set when the keyfiles had been handed over to ENF. ENF's timestamp is set when matching finished. On my slow phone, difference is now 2 minutes. Did this clarify up your questions?

(@d4rken , please correct if I'm wrong - and thanks for answering to my question above!)

daimpi commented 4 years ago

@vaubaehn ah yes, that explains it, thank you 😊.

One thing I'm still wondering about: in which cases will a new check be triggered by CWA? B/c if we look at this log, there were 20 checks performed even with CWA 1.5 and even on the log the day later (where the person didn't experience 39508) there were still two checks performed instead of one 🤔.

vaubaehn commented 4 years ago

@daimpi

One thing I'm still wondering about: in which cases will a new check be triggered by CWA?

Afaik, checks are triggered (but not nescesseraily performed) when a) the main process of CWA is created, which happens after installation, update, reboot and opening CWA after force stop, b) a scheduled background job to check is due, which is 24 hours after the last successful check, or a certain time after the last unsuccessful check. A check should only be performed, if there was no successful check for the current day already. So, if you reboot your phone, and you already had a susccessful check that day, CWA won't do another then. But if there was no successful check, the reboot would trigger one (this is why we had also 9002: timed out while waiting 60000 ms in the past with less DKs, because the re-initialisation of Android on reboot consumes quite much processor time). If there is a scheduled job, it could be, that CWA can not perform it, especially when Andorid energy saving restrictions apply. E.g., if CWA is in doze mode, the scheduled job will be delayed until the device is not in doze mode anymore. You will wake up the phone from doze mode, when you plug in the power charge. But at that moment not only CWA's scheduled job is trying to run, but also scheduled jobs of other apps... All are fighting for processor time now at the same moment. This is, why even plugging your phone to the charge could have caused a (silent) time out in past versions of CWA.

But tbh, at the moment I don't have any clue, what had caused that log. Obviously CWA couldn't handover the keyfiles successfully, which is why I thought there might be an API 10 problem - 'no public key found'. Extremly wild guessing: Maybe there was a problem with the (cached) keyfiles, that they not had been stored properly before or got somehow corrupted? :thinking:

vaubaehn commented 4 years ago

@daimpi Ok, I re-checked the logs of that person again, and I think, a timeout is the most plausible cause. And if you look to all past entries, something strange is going on with that phone:

d4rken commented 4 years ago

(@d4rken , please correct if I'm wrong - and thanks for answering to my question above!)

That sounds about right.

I've checked it the ENF submission time on a dev build (theoretically slower than prod) and it's about 600ms on a Moto G3 (LOS 7.1.2) for 15 EUR packages:

2020-11-05 16:07:31.960 32180-32197/de.rki.coronawarnapp.test D/RetrieveDiagnosisKeysTransaction: Attempting submission to ENF
2020-11-05 16:07:32.472 32180-32220/de.rki.coronawarnapp.test D/RetrieveDiagnosisKeysTransaction: Diagnosis Keys provided (success=true, token=39955808-a8b4-47d0-b9d4-db8ca93f7545)

Not really comparable with the prod environment, but as relative comparison, the ENF processed the test files in ~14 seconds:

2020-11-05 16:16:26.948 13411-32255/com.google.android.gms.persistent I/ExposureNotificationJni: PrefixIdMap loaded 19 scan records
2020-11-05 16:16:26.948 13411-32255/com.google.android.gms.persistent I/ExposureNotificationJni: matchingNative get 14 files
2020-11-05 16:16:26.948 13411-32255/com.google.android.gms.persistent I/ExposureNotificationJni: Matching with /data/user/0/com.google.android.gms/app_en_diagnosis_keys/1604589386402/export0.bin
(...)
2020-11-05 16:16:40.171 13411-32255/com.google.android.gms.persistent I/ExposureNotificationJni: Created iterator for /data/user/0/com.google.android.gms/app_en_diagnosis_keys/1604589386402/export13.bin (process_revised_keys = 1)
2020-11-05 16:16:40.208 13411-32255/com.google.android.gms.persistent I/ExposureNotificationJni: Matching done, total 139758 keys, no key matches

And ~550ms for 48 EUR hour packages, so it doesn't really go up with the amount of files we provide:

2020-11-05 16:12:48.596 600-904/de.rki.coronawarnapp.test D/RetrieveDiagnosisKeysTransaction: Attempting submission to ENF
2020-11-05 16:12:49.047 600-903/de.rki.coronawarnapp.test D/RetrieveDiagnosisKeysTransaction: Diagnosis Keys provided (success=true, token=5c4977e5-5611-4e8b-912e-8d2244fa7bb3)

As the ENF calculation does not happen synchronous and the submission is so quick, the timeout is most likely caused by the download, isn't it? So with the hotfix we increased the overall transaction timeout to 8 minutes, we can't go much higher because we only get a max of 10 minutes background processing time for our worker. So if the 8 minutes don't solve it for some devices, we could add some kind of mechanism that we do fewer downloads in parallel and also abort the downloads early enough such that we can continue with what we have :thinking:.

The 10 minutes should only count though if we are in the background, so maybe for problematic devices there could be a mechanism that removes the timeout when in foreground :thinking:.

etron770 commented 4 years ago

Just another hint: it seems that there is only a lack of data. The test cellphone, switched off after installing and testing is still having this error.

And it is reproducable: The app was working now at the test cellphone, reporting 2 days of collecting data. to reproduce.

  1. remove the app
  2. install the app
  3. start the app the error is there again
YtvwlD commented 4 years ago

With the update to 1.5.0 (and a microG update, also), the app has now been working for a few days for me now. And it also seems to have gotten much faster.

daimpi commented 4 years ago

@vaubaehn just a small question:

You can easily verify it, when comparing the 'update times' in CWA and ENF history: CWA's timestamp is set when the keyfiles had been handed over to ENF. ENF's timestamp is set when matching finished. On my slow phone, difference is now 2 minutes.

Are you sure it's not the other way around?

My CWA status today:

My EN log entry today: {"timestamp":"22. Oktober 2020, 03:02","keyCount":307304,"matchesCount":0,"appName":"Corona-Warn","hash":"oTbVuBbNBBnFzm25gcZH9lTOjOFRPEViypUZH7h9JzA="}

I.e. my CWA timestamp is one minute after my EN timestamp.

vaubaehn commented 4 years ago

@daimpi

Hmm, that's interesting. Either you have been accelerated to more than speed of light while exposure matching was performed, or your phone's clock goes backwards in time for a different reason? Here's mine: Screenshot_2020-10-22-17-50-38 Screenshot_2020-10-22-17-50-23

daimpi commented 4 years ago

@vaubaehn Ah yes that was it! After dropping out of hyperspace time is correct now 😄.

Kidding aside: I just remembered that there was a GH issue on time discrepancy a while back and voila: https://github.com/corona-warn-app/cwa-app-android/issues/948#issuecomment-670088844

Maybe in my case it's a rounding error by ENF in the other direction? As long as the time difference doesn't exceed 1 min I won't be able to tell for sure I guess but it's also not that important tbh ^^

daimpi commented 4 years ago

@d4rken thanks for providing this data.

Just for my understanding, when you say

I've checked it the ENF submission time on a dev build (theoretically slower than prod) and it's about 600ms on a Moto G3 (LOS 7.1.2) for 15 EUR packages:

you're using the CWA 1.5 approach which submits all DKs in one batch, right?

Regarding

As the ENF calculation does not happen synchronous and the submission is so quick, the timeout is most likely caused by the download, isn't it?

Does this mean it would maybe help to also increase the http timeout further, or is this something else? Iirc this timeout was increased from 10s to 20s in 206b181 (#1138) but maybe this could be increased further, would this maybe help?

d4rken commented 3 years ago

you're using the CWA 1.5 approach which submits all DKs in one batch, right?

Yes.

Does this mean it would maybe help to also increase the http timeout further, or is this something else? Iirc this timeout was increased from 10s to 20s in 206b181 (#1138) but maybe this could be increased further, would this maybe help?

The timeout only comes into effect if the download actually times out. If we would run into that I'd say we wouldn't have an issue. If the download is not timing out though, so only very slow, then the task/transaction will just wait for the downloads to finish and if that takes 3 (or 8 minutes), eventually hit the top-level timeout and fail the whole thing.

daimpi commented 3 years ago

@d4rken thanks for the explanation 🙂

So regarding the downloads: I tried to find a really big package and the one timestamped 2020-10-16 with almost 44k DKs was the largest I could find. But even this package only weights 750kB, so if we round this up to 1MB for overhead and multiply by 14, we only have 14MB. Even with a slow 1 Mbit/s connection this would take less than 2min to download and those numbers are already chosen very generously…

Are there really connections that are that much slower that would need more than 3 min?

But otoh we have this conversation between @kbobrowski and me from back in August:

One possibly important thing here is that CWA already operates with very thin timeout margin - without introducing any artificial delay and on the fast wifi network (100 Mbps) the whole process of fetching 14 diagnosis keys and submitting all of them to the API takes about 45 seconds on a slow Android 6 device, which is 75% of available timeout.

Wow… that's much longer than I'd have thought this process takes… I've always assumed that this is done in a few seconds max. I guess the limiting factor in this case is not the bandwidth but the (cryptography-) performance of the SoC, right?

What would speak for this hypothesis is the fact that the Galaxy A6 mentioned above is certainly not the most potent device… On which device did you observe the 45s sync time?

To which @kbobrowski replied:

@daimpi I guess slow device (Motorola G2) plays a role here, but it seems that just downloading keys takes significant portion of this timeout limit (if it has to download all 14 of them).

Similar issue reported in #1030 , if it is the same pattern as in my case (all 14 keys successfully submitted, but then CWA retried the same day), then I think timeout hypothesis would be less likely

So on a 100 Mbps connection downloading took a significant portion of the 60s timeout even though packages were quite a bit smaller back then… so maybe it's not the internet connection speed from the wall which is limiting but something else in the pipeline? I have no clue tbh 🤷.

d4rken commented 3 years ago

So on a 100 Mbps connection downloading took a significant portion of the 60s timeout even though packages were quite a bit smaller back then… so maybe it's not the internet connection speed from the wall which is limiting but something else in the pipeline? I have no clue tbh shrug.

Connection speed can only be as fast as the device can write away the data. Storage I/O is one of the biggest bottlenecks on Android devices, especially on budget devices. Even worse on budget devices that can replace the smaller (but faster) internal storage, with a bigger (but slower) external sdcard. Lot's of manufacturers skimp on good hardware there as it can't be used as effectively as screen specs or "the number of cores" when advertising a device :wink:. It's often not even the raw file to storage speed, but parallel storage access that slows everything down to a crawl.

daimpi commented 3 years ago

@d4rken I see, thanks.

It's often not even the raw file to storage speed, but parallel storage access that slows everything down to a crawl.

But we're talking about less than 14 MB here, can storage really become so slow that this could lead to a timeout? Oo

kbobrowski commented 3 years ago

It's not even only download + write to storage, but also provideDiagnosisKeys call (which is extracting files, verifying signatures and headers and copying them to internal GMS database), not sure how long each of these steps take though

daimpi commented 3 years ago

Do we have some way of benchmarking this, to narrow down the space of possibilities?

kbobrowski commented 3 years ago

It would be possible to benchmark this with https://github.com/kbobrowski/en-i13n

vaubaehn commented 3 years ago

For us could be possible, if devs allow timber to log also in production (even a lot of puppies will die then).