corona-warn-app / cwa-app-android

Native Android app using the Apple/Google exposure notification API. The CWA development ends on May 31, 2023. You still can warn other users until April 30, 2023. More information:
https://coronawarn.app/en/faq/#ramp_down
Apache License 2.0

Major data usage (2,25GB mobile data in one month) #4753

Closed dadosch closed 2 years ago

dadosch commented 2 years ago

Avoid duplicates

Technical details

Describe the bug

The CWA used 2.25 GB (!) of mobile data in 28 days. Usage in WiFi is unknown.

Steps to reproduce the issue

Expected behaviour

The data usage should not be this high.
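For scale, the reported figure works out to roughly 80 MB per day. The following is only a back-of-the-envelope check; it assumes decimal units (1 GB = 1000 MB), which the report does not specify.

```python
# Rough daily average implied by the report: 2.25 GB of mobile data in 28 days.
# Assumption: decimal units (1 GB = 1000 MB); the report does not say which.
REPORTED_GB = 2.25
DAYS = 28


def average_mb_per_day(total_gb: float, days: int) -> float:
    """Return the mean daily volume in MB for a total given in GB."""
    return total_gb * 1000 / days


daily_mb = average_mb_per_day(REPORTED_GB, DAYS)
print(f"{daily_mb:.1f} MB/day")
```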

Possible Fix

Additional context



Internal Tracking ID: EXPOSUREAPP-11611

vaubaehn commented 2 years ago

Good morning @GisoSchroederSAP and @dsarkar ,

Please let us know by the end of the week how you will proceed with this issue, in terms of

  • fixing/enhancing the caching behavior
  • informing the publisher
  • adapting the FAQ

If we don't hear anything from you by Friday evening, it may be best for the community to inform the publisher directly next week, as well as the public media, to prevent users from getting into trouble with their mobile data plans.

I hope you can provide some information during the course of the day on your plans for this issue. If we don't hear anything, we would need to assume that this issue has no priority for the stakeholders or has been considered a "won't fix", and that the stakeholders tacitly accept that a considerable proportion of Android users suffer a financial or functional impact on their data plans, despite public claims to the contrary. In that case, we would need to consider further escalation as described above.

Thank you!

thomasaugsten commented 2 years ago

We have no direct influence on the OS behavior of Android; we can only report the issue. CWA traffic is free of charge on mobile data plans.

vaubaehn commented 2 years ago

@thomasaugsten As described above

CWA traffic is free of charge on mobile data plans.

This is not true for everybody. The reason could be that providers have configured some limits. I think providers did not plan for download rates of >13 GB per user like in January/February. May I interpret your answer to mean that this issue is a "won't fix"?

vaubaehn commented 2 years ago

@thomasaugsten I also described above what you can do to mitigate the issue despite the OS behavior. Does your answer mean that you do not intend to pursue this issue further?

darkmattercoder commented 2 years ago

@vaubaehn, I really understand why you are insisting so hard. However, the CWA already suffers from far too little acceptance, a ton of communication, marketing and development mistakes have already been made, and shitty commercial crap apps by some disgustingly acting hip-hop artist have been more accepted than the CWA. So I doubt that it would help the project if the community makes more noise that leads to even less acceptance. Just a thought. How many users would suffer from less than 5 GB of free storage in 2022? Is that really a thing? I have put shitloads of data on my phone since 2017 and still have about 100 GB left from the original 256 GiB.....

thomasaugsten commented 2 years ago

The zero rating is valid for all users of German mobile providers. Please provide reports of cases where the zero rating is not active and we will check them. We are monitoring the issue, but at the moment no immediate actions are planned.

darkmattercoder commented 2 years ago

@thomasaugsten there was at least one person reporting that somewhere here in the issue, or it was linked from Twitter or something like that. If you read the discussion you will find it. And: do you really expect anyone in the real user base who is not a techie to report this?

6e6e58c4 commented 2 years ago

@thomasaugsten As described above

CWA traffic is free of charge on mobile data plans.

This is not true for everybody. The reason could be that providers have configured some limits. I think providers did not plan for download rates of >13 GB per user like in January/February. May I interpret your answer to mean that this issue is a "won't fix"?

The example from the app store doesn't really tell us anything. Zero rating from the provider doesn't affect Android and its telemetry on how much data has been transferred.

vaubaehn commented 2 years ago

@darkmattercoder I understand your thoughts. Usually, escalating things in such a way is not what I like to do, with rare exceptions. Exceptions are issues with an impact on security (and we have had such issues, which are still not completely closed...) or when there is a direct impact on users (financially or practically) that they are possibly not aware of, especially when public claims say otherwise. And: there does not seem to be any will to at least try out (further) measures to mitigate the issue. I have been following along here for nearly two years, and I think I have contributed quite a lot. In all that time I experienced good collaboration between stakeholders and community. But in this case, I am convinced that a wrong decision has been taken, to the disadvantage of a proportion of Android users. I have previously discussed public disclosure of issues with one of the best community representatives. We agreed that there is a risk of harming the public acceptance of the app in general. But in this case, I am convinced that the harm could be much bigger if it later becomes public that stakeholders tacitly accepted a financial impact on users because there are not enough resources to solve an issue with higher priority. So for a public disclosure I tried to be as differentiated as possible. As you correctly wrote, there are just "a few" users affected, and we don't know how many in general. Around one year ago I found a market research source that estimated Android 6 phones still at ~10% of the market (meaning: old phones with low storage). By now there are likely fewer. But there may still be many more with less than 5 GB of free space... Let's see.

darkmattercoder commented 2 years ago

@thomasaugsten So you know that no hard limits for the zero rating are hard-coded anywhere in the German carrier infrastructure? Is that a statement that you got from them and which is reliable? I think that information would even be easier to get from the carrier people than by debugging, wouldn't it?

dadosch commented 2 years ago

The zero rating is valid for all users of German mobile providers. Please provide reports of cases where the zero rating is not active and we will check them. We are monitoring the issue, but at the moment no immediate actions are planned.

I did this earlier:

The data usage definitely counted towards the data cap, as the cap was reached and the person had to buy an additional data plan.

I want to add that the cap was definitely reached because of the CWA. The person in question normally uses only very little cellular data, that's why I can tell.

I also want to point out that while there are probably not that many people affected by this, it does cost some people real money (!), and some who are not technically savvy don't know about it or that it is caused by the CWA. There should at least be some press info, an addition to the FAQ, etc.!

vaubaehn commented 2 years ago

@6e6e58c4

Zero rating from the provider doesn't affect Android and its telemetry on how much data has been transferred.

That is correct, but the example from app store tells us, that his connection rate has been decreased after hitting a limit. @dsarkar I would suggest preparing the store support teams to get in contact with the affected users and ask for their provider.

Also we still have our opening post here: @dadosch reported:

@MikeMcC399 @Ein-Tim very interesting. The data usage definitely counted towards the data cap, as the cap was reached and the person had to buy an additional data plan.

and

@MikeMcC399 the provider is AldiTalk, so E-Plus/Telefonica.

For me it is not acceptable to not actively try to do more to mitigate this issue.

vaubaehn commented 2 years ago

@thomasaugsten @GisoSchroederSAP I would be happy if you re-evaluated this issue by tonight.

6e6e58c4 commented 2 years ago

@vaubaehn

...but the example from app store tells us, that his connection rate has been decreased after hitting a limit

Yes, that's how it works O.o But we don't know the actual reason. That one user claims that it was caused by the CWA. Zero rating works otherwise the outcry would be way bigger than just one or two (questionable) reports in the app store.

You can't just spend a huge amount of money just because one user reports something. That's not how software development works - at least not with such a diverse userbase and complex hardware and software options.

thomasaugsten commented 2 years ago

@dadosch Can you please send me the following by mail: the provider and the carrier, the kind of contract (regular contract or prepaid), when exactly in January and February the data cap notification occurred, and the usage as shown in the Android settings. thomas.augsten@sap.com

vaubaehn commented 2 years ago

@6e6e58c4

@vaubaehn

...but the example from app store tells us, that his connection rate has been decreased after hitting a limit

Yes, that's how it works O.o But we don't know the actual reason. That one user claims that it was caused by the CWA.

Yes, that's true. There can still be other data usage the user was not aware of. This is why I asked @dsarkar to follow up better with users on this issue in the app stores.

Zero rating works otherwise the outcry would be way bigger than just one or two (questionable) reports in the app store.

Yes, in general zero rating works. After this issue was opened, I also found that my CWA had used >300 MB of mobile quota, but Vodafone did not count it in my case (validated by comparing the Android OS data meter against the quota charged by Vodafone). My observation is that reports in the app stores are just a drop in the ocean compared with the high number of users. Only a very small percentage of people actively report problems or issues via app stores, even fewer via GitHub. This is a claim I can't prove, but it may have enough face validity.

There is at least the possibility that providers did not configure a "full flat rate" for the IPs used by CWA, but set an unknown monthly limit per device to protect themselves against unexpected costs in case something goes wrong... And now something went wrong. So the cost is shared with the user. But this is currently speculation.

You can't just spend a huge amount of money just because one user reports something. That's not how software development works - at least not with such a diverse userbase and complex hardware and software options.

You are right, and I agree. But the case here is different: We analyzed what happened, how it happens, how it can be reproduced, what impact it has - and: how it can be mitigated. We know that there is a problem, and we can be sure it can be solved. My estimate of the resources needed to analyze and solve the problem in practice: For reproduction/analysis:

For fixing/testing

TSI - changing provision of hourly Diagnosis Keys

I think that's somehow not that much money/resources/effort.

vaubaehn commented 2 years ago

@6e6e58c4 What I forgot: The solution proposed above would mitigate the issue here. But it would also have further positive side effects for all Android users: Interaction between CWA and ENS would speed up significantly, so with high incidence rates we would have fewer timeouts in CWA and ENS, and risk matching would work much more reliably. That could positively affect other open issues (e.g., "stopped risk calculation", where timeouts are the root cause).

thomasaugsten commented 2 years ago

Whitelisting happens at the carrier level, at the three carriers in Germany, not at the provider level. They all whitelisted the IP and no limitation is applied. We need more details on this exact case. What is the data plan limit, and what does "low mobile data usage" mean? I ask because I see 1.3 GB for Firefox in the screenshot.

dadosch commented 2 years ago

Hi @thomasaugsten

I just sent you an email with the requested details. The data cap limit is 3 GB. The person has never reached the data cap in previous months and their usage is quite constant, which is why I noticed the bug in the first place.

dsarkar commented 2 years ago

@vaubaehn Thanks for the hint, will forward it to app stores, mailbox and hotline, that they should ask for the provider.

dsarkar commented 2 years ago

Hi @dadosch, @vaubaehn and community, thanks again everybody for reporting and the detailed analysis. I got feedback, that this issue is being investigated, and someone from the dev team or from Corona-Warn-App Open Source Team will come back here to report the findings once we know more.

Best wishes, and a nice weekend, DS


Corona-Warn-App Open Source Team

vaubaehn commented 2 years ago

Hi @dsarkar , this is good news! Thank you. To make it easier for the investigator, here are some links:

When your message arrived in my mailbox, I was about to write down a possible text for online editorial offices that could be interested in the subject. My intention was to share it here for discussion before submission. We can keep it in the queue for now. However, for transparency, it could look like this:

Corona-Warn-App: Extremely high data usage on Android devices with little free internal storage

Individual users have reported to the developers of the Corona-Warn-App on GitHub that the app shows unusually high data usage from downloads within one month. In one case, downloads of 2.25 GB over a mobile data connection were reported, and in another case more than 13 GB over a Wi-Fi connection. There were also individual reports that the high download volume over mobile data exceeded the users' contractual data allowance, so that in one case the user had to pay for a data top-up, and in another case the available data rate was throttled. The developers state in their FAQ (https://www.coronawarn.app/de/faq/?search=zero#mobile_data_costs) that none of the Corona-Warn-App's data connections are counted against the data plan by the mobile network operators. It is therefore currently unclear why affected users experienced such restrictions. SAP is working to clarify this. In the corresponding issue on GitHub (https://github.com/corona-warn-app/cwa-app-android/issues/4753), affected users analyzed the cause of the high data usage: To detect risk contacts, the Corona-Warn-App downloads from a server the so-called diagnosis keys of infected users who want to warn other users. These diagnosis keys are stored in the app's cache. However, if the affected device has less than 500 MB of internal storage free, the Android operating system automatically clears all caches to free up storage space. As a result, the Corona-Warn-App repeatedly downloads all diagnosis keys of the past 10 days, once a day (over a mobile data connection) or several times a day (over Wi-Fi), instead of a smaller package for the last few hours. The size of the data packages depends on the number of infected users in Germany and the connected European countries.
At the peak of the Omicron wave in February, one affected device downloaded up to 600 MB of data per day over Wi-Fi. Members of the Corona-Warn-App community provide instructions on GitHub for reproducing the problem and also propose possible improvements to the app that would massively reduce the download volume. According to an SAP employee, the problem continues to be monitored, but there is currently no intention to change the app. Infection rates in Germany are currently falling sharply, so that this alone should noticeably reduce the download volume in the near future. However, researchers at TU Berlin point out, based on their simulation studies (https://depositonce.tu-berlin.de/bitstream/11303/16461/4/2022-02-23_MODUS-COVID_Bericht.pdf), that a renewed rise in infection numbers is to be expected in the coming weeks due to the wider spread of the Omicron BA.2 variant. Infection numbers could reach a level similar to that of the last wave or, in the worst case, exceed it by a factor of 2.5. This would again mean a correspondingly high download volume for the Corona-Warn-App whenever the cached diagnosis keys are automatically deleted by the Android operating system because of insufficient storage space. Even though the mobile network operators have committed not to charge for the data volume, we recommend as a precaution that you make sure that considerably more than 500 MB of internal storage is always available on your Android device when using the Corona-Warn-App, to avoid unpleasant surprises.

So let's see how things are developing in the course of next week.

Dear @dsarkar , thanks a lot for your engagement, too! Have a nice week-end!

mlenkeit commented 2 years ago

@vaubaehn thank you for your thorough analysis, this is indeed very helpful! As @dsarkar mentioned, the team is looking into this and we'll discuss which options we can realize short term and which ones would require more time. The suggestions that were made here in this thread will definitely be considered.

We have also reached out to Google but are still waiting for a reply. However, any fix on the ENF side would probably take some time.

mlenkeit commented 2 years ago

The current idea to solve this on CWA side short term is to store the key files in data instead of the cache. That way, even if Android automatically cleans the cache due to low storage on the device, the key files would not be affected and no re-download would occur.
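The effect of this idea can be illustrated with a toy simulation. This is not Android code: the directory names, package names, the 10-package window and the daily purge are all assumptions made for illustration. It only counts how many packages have to be fetched again when the cache directory is wiped after every sync.

```python
# Toy simulation, not Android code: counts re-downloads when the OS purges
# the cache directory, depending on where key packages are stored.
# Directory names, package names and the daily purge are hypothetical.
import tempfile
from pathlib import Path


def sync(store: Path, wanted: list[str]) -> int:
    """Fetch every wanted package missing from `store`; return the count."""
    downloaded = 0
    for name in wanted:
        target = store / name
        if not target.exists():
            target.write_bytes(b"placeholder key data")  # stand-in for a download
            downloaded += 1
    return downloaded


def purge(directory: Path) -> None:
    """Mimic the OS clearing a cache directory under low storage."""
    for entry in directory.iterdir():
        entry.unlink()


def simulate(store_in_cache: bool, days: int = 3, window: int = 10) -> int:
    """Return total downloads over `days`, with a cache purge after each sync."""
    total = 0
    with tempfile.TemporaryDirectory() as root:
        cache_dir = Path(root) / "cache"
        files_dir = Path(root) / "files"
        cache_dir.mkdir()
        files_dir.mkdir()
        store = cache_dir if store_in_cache else files_dir
        for day in range(days):
            # Sliding window: each day one new package appears.
            wanted = [f"keys-day-{d}.zip" for d in range(day, day + window)]
            total += sync(store, wanted)
            purge(cache_dir)  # low storage: only the cache is cleared
    return total


print("stored in cache:", simulate(store_in_cache=True))
print("stored in data: ", simulate(store_in_cache=False))
```

In this simplified model the data variant only fetches the initial window plus one new package per day, while the cache variant fetches the full window every day. The sketch deliberately never deletes old packages, so it says nothing about storage growth.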

Any feedback is highly appreciated.

vaubaehn commented 2 years ago

Hi @mlenkeit , thanks for sharing your considered measures.

Storing the Diagnosis Keys packages inside the data space could be a temporary emergency solution. As it looks, we might soon reach a plateau with respect to incidence rates, meaning it would be good to act quickly on changing the storage regime. Do you see any chance of getting that into 2.19? That would probably help a lot.

However, it's not clear to me how you assessed possible downsides: In the case of corrupt data of Diagnosis Keys files (hardware or corrupt file structure) there is a low but existing risk of an unrecoverable crash when calling provideDiagnosisKeys(). In that case users would need to re-install CWA or clear all app data to restore CWA to a functional state, meaning loss of user TEKs/contact RPIs (due to Google's data privacy policy and automatic clearing processes), loss of stored certificates, diary information, check-ins,... That is the reason why @d4rken was careful with Error Log and decided to place it into the cache. I would be happy if @d4rken or @mtwalli could comment on those risks.

So, for a long term solution I still favor to log downloaded and provided DK packages, to only provide packages once to ENS and to not store downloaded DK packages after provision anymore. Again, that would also speed up the process of provideDiagnosisKeys() and most likely solve all other problems that are connected with CWA/ENS timeouts. The log db would also need to be stored inside the data directory. But the difference compared to the Diagnosis Keys packages is, for the db you have full control, for DK packages we would need to rely on TSI mostly.
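A minimal sketch of the "provide each package only once" idea, assuming a small persistent log of package identifiers. The table name, the function names and the `provide` callback (standing in for download plus the call to provideDiagnosisKeys()) are hypothetical and not taken from the CWA code base.

```python
# Sketch of a "provided packages" log; all names here are hypothetical.
# `provide` stands in for downloading the packages and handing them to ENS.
import sqlite3
from typing import Callable


def open_log(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the log of package IDs already provided to ENS."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS provided (package_id TEXT PRIMARY KEY)"
    )
    return conn


def pending(conn: sqlite3.Connection, available: list[str]) -> list[str]:
    """Return the available packages that have not been provided yet."""
    done = {row[0] for row in conn.execute("SELECT package_id FROM provided")}
    return [p for p in available if p not in done]


def sync_once(
    conn: sqlite3.Connection,
    available: list[str],
    provide: Callable[[list[str]], None],
) -> list[str]:
    """Provide only new packages, then record them in the log."""
    todo = pending(conn, available)
    if todo:
        provide(todo)  # if this raises, nothing is logged and a retry is possible
        conn.executemany(
            "INSERT OR IGNORE INTO provided (package_id) VALUES (?)",
            [(p,) for p in todo],
        )
        conn.commit()
    return todo


log = open_log()
calls: list[list[str]] = []
first = sync_once(log, ["2022-03-01", "2022-03-02"], calls.append)
second = sync_once(log, ["2022-03-01", "2022-03-02", "2022-03-03"], calls.append)
print(first, second)
```

Because the log is written only after `provide` returns, a failed provision is retried on the next run. Whether ENS reliably keeps earlier matches between runs is exactly the open question discussed in this thread; the sketch simply assumes it does.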

@d4rken @mtwalli could you please also share a short feedback?

vaubaehn commented 2 years ago

So, given that you're now going to filter ExposureWindows by age - what only makes sense when ENS is storing and providing EWs with risk contacts up to 14 days and longer than the current period of 10-days-TEKs that are provided to ENS - https://github.com/corona-warn-app/cwa-app-android/blob/df3ae837202874cdb79b16af4f3e69267923b57d/Corona-Warn-App/src/main/java/de/rki/coronawarnapp/risk/RiskLevelTask.kt#L158-#L166

That would mean that

Description of provideDiagnosisKeys():

[...] When using the ExposureWindow mode, all provided keys are accumulated under the ExposureWindows. [...]

https://developers.google.com/android/exposure-notifications/exposure-notifications-api#methods

and

Description of getDailySummaries() (ExposureWindow mode):

[...] This function returns a list of DailySummary objects for the last 14 days (or less, if specified in the DailySummariesConfig). [..]

https://developers.google.com/android/exposure-notifications/exposure-notifications-api#methods

Taking these two descriptions together, we may assume that all exposures are stored for 14 days in ENS and can be retrieved by CWA for that time period.

is true, and provision of each DK package only once is possible?

mlenkeit commented 2 years ago

[...] what only makes sense when ENS is storing and providing EWs with risk contacts up to 14 days and longer than the current period of 10-days-TEKs that are provided to ENS [...]

@vaubaehn the conclusion is not quite correct. Strictly speaking, we are not providing a "period of 10-days-TEKs" to ENF but key packages of TEKs that were submitted in the last 10 days. A key package for day t can contain TEKs for any of the days between t-0 and t-14.

Example:

A new requirement for CWA is to show exposures for the timeframe of the past 10 days only. As CWA still submits keys outside of this period (as illustrated above) and as ENF always returns exposures for the past 14 days, the app-level filtering is required regardless of any caching in ENF.
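The relationship between package days, TEK days and the app-level filter can be sketched as follows. The function names are hypothetical, the dates are arbitrary, and treating an exposure that is exactly 10 days old as still visible is an assumption rather than something stated in this thread.

```python
# Sketch of the date arithmetic described above; names are hypothetical.
# Assumption: an exposure exactly `max_age_days` old is still shown.
from datetime import date, timedelta


def tek_days_in_package(package_day: date, max_tek_age: int = 14) -> list[date]:
    """Days a TEK inside the package for `package_day` may refer to (t-0 .. t-14)."""
    return [package_day - timedelta(days=n) for n in range(max_tek_age + 1)]


def filter_recent(exposure_days: list[date], today: date, max_age_days: int = 10) -> list[date]:
    """Keep only exposures from the past `max_age_days` days (app-level filter)."""
    cutoff = today - timedelta(days=max_age_days)
    return [d for d in exposure_days if d >= cutoff]


today = date(2022, 3, 15)

# A package published 10 days ago can still carry TEKs up to 24 days old.
oldest_tek_day = min(tek_days_in_package(today - timedelta(days=10)))
print("oldest possible TEK day:", oldest_tek_day)

# ENF may report exposures up to 14 days back; the app keeps the last 10 days.
reported = [today - timedelta(days=n) for n in (0, 5, 10, 11, 14)]
visible = filter_recent(reported, today)
print("visible exposures:", visible)
```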

Changing this so that CWA would not submit keys outside of the 10-day period would require modifying already published key packages every day to remove the newly outdated keys from those packages. This would be a major architectural change across client and server, as published packages wouldn't be immutable anymore. As this would also require apps to regularly re-download these modified packages, this is something that we are currently not pursuing 😉

See also the related discussion in:

Regarding provisioning each DK package only once:

For all we know, ENF (on Android) maintains an internal cache but the ENF documentation seems to be incomplete regarding how this cache works exactly and how durable it is (as opposed to the iOS documentation where this is pretty clear). Additionally, the ENF cache can be reset on Android devices (as opposed to iOS) by the user through the ENF settings. Most users probably won't do that but it would be an event that CWA wouldn't be able to notice and could cause users to miss exposures.

These are some of the reasons why we are hesitant to make this change.

In the case of corrupt data of Diagnosis Keys files (hardware or corrupt file structure) there is a low but existing risk of an unrecoverable crash when calling provideDiagnosisKeys().

This is pretty interesting and indeed something we did not consider, thanks for bringing this up again 👍 (I now saw that you made the same point earlier in this thread already).

Interestingly, the ENF documentation seems to encourage persisting the files rather than keeping them in the cache:

Note: When downloading the files from the internet-accessible server, store them inside the app-specific directory on internal storage. Optionally, you can create a subdirectory under the [filesDir](https://developer.android.com/reference/android/content/Context#getFilesDir()) and delete the subdirectory after you've finish processing.

Source: https://developers.google.com/android/exposure-notifications/exposure-notifications-api#providediagnosiskeys

Anyway, we will revisit the aspect of dealing with file corruption and @mtwalli is of course involved in the discussion 😉

vaubaehn commented 2 years ago

Hi @mlenkeit , thanks a lot for the detailed response. That facilitates discussion by far, imho.

[...] what only makes sense when ENS is storing and providing EWs with risk contacts up to 14 days and longer than the current period of 10-days-TEKs that are provided to ENS [...]

@vaubaehn the conclusion is not quite correct. Strictly speaking, we are not providing a "period of 10-days-TEKs" to ENF but key packages of TEKs that were submitted in the last 10 days. A key package for day t can contain TEKs for any of the days between t-0 and t-14. [...]

True. I had actually been aware of it, as you can see in your linked discussion: https://github.com/corona-warn-app/cwa-documentation/issues/848#issuecomment-1032537557, but I somehow lost track of it while writing the above comment. I guess it's not easy for everyone involved to recall the hierarchically nested conditions all the time 😉 .

And yes, that limits my drawn conclusion, unfortunately.

A new requirement for CWA is to show exposures for the timeframe of the past 10 days only. As CWA still submits keys outside of this period (as illustrated above) and as ENF always returns exposures for the past 14 days, the app-level filtering is required regardless of any caching in ENF.

Independently of everything else already stated in this issue, it's good to know that app-level filtering of ExposureWindows by contact date is possible at all, as this is one important prerequisite for implementing a different Diagnosis Keys provisioning as proposed here before. For me it's good to know, as I could only speculate on this before (I have neither sufficient time nor the necessary "material" (i.e., an Android IDE) to investigate directly).

Changing this so that CWA would not submit keys outside of the 10-day period would require modifying already published key packages every day to remove the newly outdated keys from those packages. This would be a major architectural change across client and server, as published packages wouldn't be immutable anymore. As this would also require apps to regularly re-download these modified packages, this is something that we are currently not pursuing 😉

I didn't think of something like this at all, and I'm fully agreeing. That would be a horrible scenario, if I may speak so.

Now to the ENF:

Regarding provisioning each DK package only once:

For all we know, ENF (on Android) maintains an internal cache but the ENF documentation seems to be incomplete regarding how this cache works exactly and how durable it is (as opposed to the iOS documentation where this is pretty clear).

Could you be so kind as to describe in a few words how iOS handles it? It's not only to satisfy my curiosity about this issue; there's a high likelihood that Google and Apple aligned their handling here for full compatibility/feature parity. But as you referred to the documentation, it's only an indicator that Google's ENF works the same, and not proof, unless Google responds to you with more information or enhances their public documentation.

Additionally, the ENF cache can be reset on Android devices (as opposed to iOS) by the user through the ENF settings.

Ha. No feature parity.

Most users probably won't do that but it would be an event that CWA wouldn't be able to notice and could cause users to miss exposures.

This actually does not have any impact on the proposed solution of providing any Diagnosis Keys package only once, for the following reason: Resetting the ENF cache means resetting all ENS data. There is no way to choose between "just the RPIs I collected" or "all the TEKs that have been created for me". This happens on reinstall of CWA, on uninstalling CWA, on simply clearing CWA's data directory (that point is really badly implemented by Google, imho), on clearing the data of Google Play Services as a whole, or when the user selects "delete data" via the ENF UI. So, in this case, there is neither any data for Exposure Windows stored inside ENF anymore, nor any collected RPIs that could be matched with the downloaded TEKs that you provide with the Diagnosis Keys packages. You will just start collecting RPIs again, and only the latest DK packages have a chance of containing TEKs that could match a risk contact. You see, providing any DK package only once is sufficient, because either older EWs are cached in ENS, or you would start right from the beginning anyway.

In the case of corrupt data of Diagnosis Keys files (hardware or corrupt file structure) there is a low but existing risk of an unrecoverable crash when calling provideDiagnosisKeys().

This is pretty interesting and indeed something we did not consider, thanks for bringing this up again 👍 (I now saw that you made the same point earlier in this thread already).

On this particular point, I'd like to add that the risk of a crash should be exactly the same, no matter whether data is stored in cache or data. I know that @d4rken already assessed this risk, at least for himself, when implementing Error Log; that's why I'm still encouraging him to comment here with his thoughts 😉 I hope you're OK with my repeated calls, dear Matthias ❤️ . The difference is that, from a technical point of view, the app is easily recoverable when corrupt data is persisted in cache. However, from a UX perspective, it's still difficult to support/assist the majority of users in clearing their caches in case something goes wrong, so it probably does not make much of a difference whether corrupted data is in cache or data...

Interestingly, the ENF documentation seems to encourage persisting the files rather than keeping them in the cache:

Note: When downloading the files from the internet-accessible server, store them inside the app-specific directory on internal storage. Optionally, you can create a subdirectory under the [filesDir](https://developer.android.com/reference/android/content/Context#getFilesDir()) and delete the subdirectory after you've finish processing.

Source: https://developers.google.com/android/exposure-notifications/exposure-notifications-api#providediagnosiskeys

Yes, that's interesting, Max. But from my understanding that's even more interesting:

Optionally, you can create a subdirectory under the [filesDir](https://developer.android.com/reference/android/content/Context#getFilesDir()) and delete the subdirectory after you've finish processing.

It's not evidence that it's enough to provide any DK package only once, but an indicator that it might be sufficient, given that Google suggests clearing already provided data afterwards.

Anyway, we will revisit the aspect of dealing with file corruption and @mtwalli is of course involved in the discussion 😉

I didn't have any doubt 😉

So in result, I would encourage you to store DK packages in data until a better solution can be implemented, and take action as soon as possible (2.19?) - unless @mtwalli or @d4rken have good arguments not to do so.

d4rken commented 2 years ago

I didn't read every post in this thread (sorry 😅), but regarding data vs cache:

The decision to place DK pkgs in cache/ was based on offering edge case resilience and providing user & system flexibility.

The DK packages can be redownloaded at will, without harm, the only cost being network traffic (which is supposed to be null-rated). Putting them in cache/ gives the user (and system) the option to recover space if needed without harming the CWA's functionality. It also gives a data-loss-free option to wipe all packages without losing any actual data (though this edge case has luckily not been necessary yet).

What may have been underestimated is the aggressiveness of the cache purges.

So in result, I would encourage you to store DK packages in data until a better solution can be implemented

This would prevent redownload, and reduce (null-rated?) traffic, but the phone will still fill up. What then?

In the best case (newer Android versions) the system will tell the app there is not enough space before actually running out of space; in the worst case the system is honest and all free space is consumed. At that point the phone and many apps will malfunction. What now? The user can't clear the CWA data without actually losing data. To make the phone usable again, the user needs space: will they delete their latest TikToks? Or maybe those are more important and they will just wipe the CWA :shrug:.

Having no free space will also move the user into a dead-lock. Let's say CWA stores all DK pkgs in data/, now the packages grow unexpectedly large, something needs to be done about issue XY, an update is needed, but the user can't update the CWA because Google Play doesn't allow it, there is not enough free space.

TL;DR I think moving DK packages from cache to data has more downsides than upsides, and the number of affected users doesn't warrant this change. It does not give us more free space, just fewer options when space runs out.

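Short term:

  • The user could be informed that this is happening, e.g. write a file into cache, check if deleted, count up, at y=x, show the user a message with a link to an FAQ or something.

Edit:

  • If the null-rating and using all data is the biggest issue, the above edge-case detection mechanism could be used to further restrict downloads to WiFi only.

Long term options:

  • Evaluate whether all DK pkgs need to be kept.

  • Evaluate storage on external devices (sdcards)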

Interestingly, the ENF documentation seems to encourage persisting the files rather than keeping them in the cache:

Note: When downloading the files from the internet-accessible server, store them inside the app-specific directory on internal storage. Optionally, you can create a subdirectory under the [filesDir](https://developer.android.com/reference/android/content/Context#getFilesDir()) and delete the subdirectory after you've finish processing.

This is less about cache vs files, and more about storing it in private internal storage for reliability reasons. It is meant to discourage devs from storing on external public storage, e.g. removable sdcards, which can have its own caveats.

vaubaehn commented 2 years ago

@d4rken Dear Matthias, thanks for your valuable input!

Just as a short info (which you may kindly ignore when you're busy), as you couldn't read all comments - the situation is quite complex. In key words:

So, my proposal was: log downloads via ETag, and download and provide each package only once. That keeps storage in ENS as low as possible, doesn't affect CWA's app cache (or other storage) in any way, and reduces the downloads from TSI servers to a minimum. And, as a side effect, this also likely reduces timeouts in CWA and ENS due to the reduced number of DK packages that need to be handled.
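
The ETag idea above can be sketched roughly as follows. This is only an illustration in Python, not CWA's actual code: `DownloadLog`, `packages_to_fetch`, and the shape of `server_index` (package name mapped to ETag) are hypothetical names chosen for this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class DownloadLog:
    """Remembers the ETag of every package that was already provided (hypothetical helper)."""
    provided_etags: set[str] = field(default_factory=set)

    def packages_to_fetch(self, server_index: dict[str, str]) -> list[str]:
        """Return the package names whose ETag has not been provided yet.

        server_index maps a package name (e.g. a day) to its ETag; this
        format is an assumption made for the sketch.
        """
        return sorted(
            name for name, etag in server_index.items()
            if etag not in self.provided_etags
        )

    def mark_provided(self, etag: str) -> None:
        self.provided_etags.add(etag)


log = DownloadLog()
index = {"2022-03-01": "etag-a", "2022-03-02": "etag-b"}

# First run: nothing is known yet, so both packages are fetched and provided.
first_run = log.packages_to_fetch(index)
for name in first_run:
    log.mark_provided(index[name])

# Second run: only the newly published package is fetched.
index["2022-03-03"] = "etag-c"
second_run = log.packages_to_fetch(index)
```

Because the log holds only ETags, it stays tiny and could live outside the cache, so a cache purge would not trigger a full re-download.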

Storing DK packages in data instead of cache was Android dev team's proposal for a short term solution.


@mlenkeit There are some small details in Matthias' reply that are different from my understanding, but I consider his thoughts as a whole as valid arguments against shifting DK packages into data. This would mean, for short term we would end up with

Short term:

  • The user could be informed that this is happening, e.g. write a file into cache, check if deleted, count up, at y=x, show the user a message with a link to an FAQ or something.

Edit:

  • If the null-rating and using all data is the biggest issue, the above edge-case detection mechanism could be used to further restrict downloads to WiFi only.

With regard to user notifications/FAQ, some suggestions have already been made in this comment: https://github.com/corona-warn-app/cwa-app-android/issues/4753#issuecomment-1048684723

I think that could also be a way to go, although it takes more effort to implement than just changing the storing path.
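
A minimal sketch of the sentinel-file detection Matthias describes, assuming a plain directory stands in for the app cache. `CachePurgeDetector` and its threshold are hypothetical; in a real app the counter and the "sentinel written" flag would have to be persisted outside the cache, otherwise they would be purged too.

```python
import tempfile
from pathlib import Path


class CachePurgeDetector:
    """Counts how often a sentinel file vanished from the cache (hypothetical helper)."""

    def __init__(self, cache_dir: Path, threshold: int = 3):
        self.sentinel = cache_dir / "purge_sentinel"
        self.threshold = threshold
        self.purge_count = 0           # assumption: persisted outside the cache
        self.sentinel_written = False  # assumption: persisted outside the cache

    def check(self) -> bool:
        """Record a purge if the sentinel is gone; return True once the threshold is reached."""
        if self.sentinel_written and not self.sentinel.exists():
            self.purge_count += 1
        self.sentinel.touch()  # (re)create the sentinel for the next check
        self.sentinel_written = True
        return self.purge_count >= self.threshold


with tempfile.TemporaryDirectory() as tmp:
    detector = CachePurgeDetector(Path(tmp), threshold=2)
    results = [detector.check()]      # first run only writes the sentinel
    detector.sentinel.unlink()        # simulate a cache purge by the OS
    results.append(detector.check())  # one purge seen, below the threshold
    detector.sentinel.unlink()        # simulate a second purge
    results.append(detector.check())  # threshold reached: inform the user
```

The same return value could be used to gate downloads to WiFi only, as suggested in the "Edit" bullet.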

Long term options:

  • Evaluate whether all DK pkgs need to be kept.

  • Evaluate storage on external devices (sdcards)

Here it looks like Matthias understood that the size of the DK packages stored by CWA matters most, while the size of the DK packages stored by ENS is actually the bigger problem. And as the storage in ENS is directly tied to the way CWA provides DKs to ENS (all of them, every time), I'm still convinced that the long-term solution is to change that processing as described many times above (every package only once).

mtwalli commented 2 years ago

I try to keep it short:

I think:

I agree with @d4rken that moving to filesDir won't fix the root cause and would just save the downloaded keys from being deleted by Android OS; however, I wouldn't exclude this option entirely for now.

Neither directory is meant to be used for storing large files, and it is highly recommended to check the available free space before storing to them, which should be done by DeviceStorage.
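
As a rough illustration of such a pre-download check (this is not CWA's `DeviceStorage` class), the following Python sketch compares the free space of a volume with the expected download size plus a safety reserve. The function name and the `reserve_bytes` parameter are assumptions for this example.

```python
import shutil
import tempfile


def has_enough_space(directory: str, required_bytes: int, reserve_bytes: int = 0) -> bool:
    """Return True if `directory`'s volume has room for the download plus a reserve."""
    free = shutil.disk_usage(directory).free
    return free >= required_bytes + reserve_bytes


target = tempfile.gettempdir()
fits_nothing = has_enough_space(target, required_bytes=0)       # trivially satisfied
fits_absurd = has_enough_space(target, required_bytes=10**21)   # about 1 ZB, should never fit
```

The reserve matters because the expected size is only an estimate: if the real packages are much larger than assumed, a check without headroom passes and the volume still fills up.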

Few things I found while looking into this

Any solution that requires evaluating the keys needs feedback from Google, which does not seem likely in the near future.

Anyway we started looking into this issue, we will keep you updated.

vaubaehn commented 2 years ago

Hi @mtwalli , thanks for your detailed feedback! I think we still have different understandings, what has most impact on this issue:

So, what had the biggest impact, and what would be the best solution, is changing the way CWA provides Diagnosis Keys files, as described in many comments above. I understand that the biggest problem is that it is not clear whether changing the provision of Diagnosis Keys to ENS still yields the same result in the age/amount of ExposureWindows with risk contacts submitted from ENS to CWA. Yes, Google could clarify this, but I believe it is not necessary to wait for a response: according to the documentation, all other functions (GetDailySummary() and provideDiagnosisKeys()) clearly state that exposures/risk contacts are accumulated for 14 days.

In the worst case, you could do an experiment: you would need to craft a TEK/RPI manually and inject the RPI as a risk contact into ENS (via Bluetooth or another method), and then add the TEK to a DK package that is provided to ENS and matched with RPIs. That sounds hard, but @kbobrowski (who also solved a big problem with Google that was preventing ENS from working on Android 6 phones in June/July 2020) has done this. Maybe he can assist here? In this way it could be evaluated whether providing every single DK package only once is sufficient for valid results in ExposureWindows for the past 10/14 days.

Solving this would drastically reduce the amount of storage needed on the volume, hence reducing the risk that Diagnosis Keys are deleted (automatically by Android OS), hence reducing repeated downloads.

Also, do I remember correctly that you're already persisting ExposureWindows in a CWA-internal db? Then the above problem would also be solved: even if ENS submitted only the risk contacts of the last provided batch of Diagnosis Keys, you could persist them in a db and evaluate them for risk calculation until they expire (currently: 10 days). So, to be on the safe side, persisting EWs in CWA could also help, and providing DK packages only once is clearly possible. So, in the end, everything could be solved without Google's support.
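
To illustrate the idea of persisting ExposureWindows and evaluating them until they expire, here is a minimal in-memory sketch in Python. `ExposureWindowStore` is a hypothetical stand-in for the CWA-internal db, and the 10-day retention is taken from the comment above.

```python
from datetime import date, timedelta


class ExposureWindowStore:
    """Keeps exposure windows until they are older than the retention period (sketch)."""

    def __init__(self, retention_days: int = 10):
        self.retention = timedelta(days=retention_days)
        self.windows: dict[str, date] = {}  # window id -> day of exposure

    def add(self, window_id: str, exposure_day: date) -> None:
        """Store a window reported for the latest batch of provided keys."""
        self.windows[window_id] = exposure_day

    def active(self, today: date) -> list[str]:
        """Drop expired windows, then return the ids that still count for risk calculation."""
        self.windows = {
            wid: day for wid, day in self.windows.items()
            if today - day <= self.retention
        }
        return sorted(self.windows)


store = ExposureWindowStore(retention_days=10)
store.add("w1", date(2022, 3, 1))
store.add("w2", date(2022, 3, 8))
still_both = store.active(date(2022, 3, 10))  # w1 is 9 days old, w2 is 2 days old
only_recent = store.active(date(2022, 3, 12))  # w1 is now 11 days old and expires
```

With such a store, the risk calculation would no longer depend on ENS returning windows for key packages that were provided on earlier days.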

Querying the free space on the volume to assess whether there is enough space to save Diagnosis Keys files for CWA is unreliable, depending on the threshold: if you set 700MB as a threshold, that wouldn't be enough, because that is the state of free space (on my phone) directly after Android cleared all caches. Even after downloading and storing DK files there is still more than 500MB free, but just some user activity with other apps is enough to fill up those caches again (e.g., internet browsing with Chrome, opening Google Maps, receiving a video in WhatsApp), and to run out of free space again soon after. Also, due to the storage regime of ENS for DK files, the DK file size affects free volume space exponentially. On a "fresh" system, you would lose >2GB within 3 days, or more if DK file sizes increase further in the next weeks. So, the threshold needed to let CWA/ENS work reliably without repeated downloads is difficult to determine and depends on incidence rates. To be on the safe side, I'd currently propose at least 4GB to 5GB. And that is so much that it would need to be communicated to the user carefully.

Anyway, user should be notified about the storage problems that can lead to repeated downloads.

I agree with @d4rken that moving to filesDir won't fix the root cause and just save the downloaded keys from being deleted by Android OS , however I wouldn't exclude this option entirely now.

The worst downside Matthias mentioned is this: if anything goes wrong and a CWA update is needed after a non-recoverable problem, Android/Google Play Store would prevent the app update when there is less than 500/600MB free, unless you free enough storage manually. Otherwise the user is stuck.

  • We always assume that the key size is ~512KB, which will lead to wrong allocations if the actual size is way bigger than that (at least that is what I understood from the analysis above).

The size of the packages was already 120MB for 10 days and can increase even more in the future. But relying on a storage allocation check with a threshold of less than 4-5GB is unreliable, given the problems that I tried to describe above.

Anyway we started looking into this issue, we will keep you updated.

@mlenkeit @mtwalli Could you give information about the time horizon for any kind of solution? The incidence rates are already rising again, so in the next weeks we will see increased DK package sizes and the problem will get worse again.

mtwalli commented 2 years ago

@vaubaehn Thanks for your feedback. What I proposed is obviously not a FIX; it is just meant to prevent unnecessary work and be transparent about it (quick solution), but I agree a proper fix would require:

* Reduce the keys size

* Change the way how CWA provides them to the ENF framework (if actually feasible)

I think you would agree that this would be a MAJOR change that needs care, thorough testing, and more time to deliver.

At the moment we have not settled on a specific solution. I will update the issue once I have an update.

vaubaehn commented 2 years ago

@vaubaehn Thanks for your feedback. What I proposed is obviously not a FIX; it is just meant to prevent unnecessary work and be transparent about it (quick solution), but I agree a proper fix would require:

* Reduce the keys size

* Change the way how CWA provides them to the ENF framework (if actually feasible)

I think you would agree that this would be a MAJOR change that needs care, thorough testing, and more time to deliver.

At the moment we have not settled on a specific solution. I will update the issue once I have an update.

Hi @mtwalli , thanks for your reply as well. Yes, I fully agree, and I see that we are now aligned on the quick fix and the proper fix. I'm curious how you will decide about the storage path (changing it or not).

It's clear now that any fix on the app side will come with 2.20 at the earliest, as 2.19 is already closed for testing.

vaubaehn commented 2 years ago

@dsarkar @GisoSchroederSAP

We're now in the situation that RKI reported a record high in new infections. It looks like the size of Diagnosis Keys packages will soon exceed even the sizes that we saw in January/February. 14 days ago I reported on a study that predicted the situation that we're now running into: https://github.com/corona-warn-app/cwa-app-android/issues/4753#issuecomment-1049865622.

From what I understood from previous discussions/comments with @mtwalli (and @d4rken ), there seems to be a consensus that users should be informed about the possibility of repeated downloads.

15 days ago, I made some proposals for the FAQ: https://github.com/corona-warn-app/cwa-app-android/issues/4753#issuecomment-1048684723

Even though it is obviously an exceptional risk that people fall outside the zero rating and are charged for the repeated downloads by CWA (which has quite an impact in those cases), maybe you agree that it befits the app and the situation to provide users with comprehensive information, perhaps with a statement that goes beyond the FAQ.

What are your plans regarding this? Or should the community get active asap?

Thanks in advance for your reply!

vaubaehn commented 2 years ago

@Ein-Tim I saw that you're in touch with Nico E., who is associated with a widely read online medium. Did you talk with him about this issue before? Could he be a good addressee for a prepared statement coming from the community? Or should a prepared statement go to the editorial office's inbox directly? You may point him to https://github.com/corona-warn-app/cwa-app-android/issues/4753#issuecomment-1051308965, but some details need to be changed as the devs are already looking into it. But it's unlikely that we'll see any kind of fix before the 6th infection wave is over.

Ein-Tim commented 2 years ago

@vaubaehn

Yes, I'm in contact with @NicoErnst and will talk to him because of this issue.

I'll let you know what he says.

vaubaehn commented 2 years ago

Dear community, this is a text from https://github.com/corona-warn-app/cwa-app-android/issues/4753#issuecomment-1051308965 adapted to the current situation that could be good for a statement. Feel free to share your thoughts.


Corona-Warn-App: Extremely high data usage on Android devices with little free internal storage

At the end of January, individual users reported to the developers of the Corona-Warn-App on GitHub that the Corona-Warn-App showed unusually high data usage from downloads within one month. In one case, downloads of 2.25GB over a mobile data connection were reported, and in another case more than 13GB over a WiFi connection. There were also individual reports that the high download volume over mobile data exceeded the users' contractual data allowance, so that in one case the user had to pay for a data top-up, and in another case the available data rate was throttled. In their FAQ (https://www.coronawarn.app/de/faq/?search=zero#mobile_data_costs), the developers point out that mobile network operators do not count any data connections of the Corona-Warn-App against the data plan. It is therefore currently unclear why affected users experienced these restrictions. SAP is working to clarify this. In the relevant issue on GitHub (https://github.com/corona-warn-app/cwa-app-android/issues/4753), affected users analyzed the reason for the high data usage: to detect risk contacts, the Corona-Warn-App downloads from a server the so-called diagnosis keys of infected users who want to warn other users. These diagnosis keys are stored in a cache of the app. However, if the affected device has less than 500MB of free internal storage, the Android operating system automatically clears all caches to free up space. As a result, the Corona-Warn-App downloads all diagnosis keys of the past 10 days again and again, once a day (over a mobile data connection) or several times a day (over WiFi), instead of a smaller package for the last few hours.

The size of the data packages depends on the number of infected users in Germany and the connected European countries. At the peak of the Omicron wave in February, an affected device downloaded up to 600MB of data per day over WiFi. Members of the Corona-Warn-App community provide instructions for reproducing the problem on GitHub and also propose possible improvements to the app that would massively reduce the download volume. According to the developers involved, a solution to the problem is being worked on. However, it is currently not foreseeable when an improved version of the Corona-Warn-App can be made available. Infection rates in Germany are currently rising sharply again, so the download volume of affected Android devices is likely to increase significantly in the near future if the cached diagnosis keys are automatically deleted by the Android operating system due to insufficient storage. Even though the mobile network operators have committed not to charge for the data volume, as a precaution we recommend making sure that at least 3GB of internal storage are free before installing the Corona-Warn-App on your Android device, and that considerably more than 500MB are constantly available while using the app, to avoid unpleasant surprises.


Edit: removed the text related to CCTG: the problem may occur there similarly, although less likely, as microG doesn't store Diagnosis Keys packages the way ENS does. But if a device is low on storage (<=500MB), the repeated downloads will affect the device as well.

NicoErnst commented 2 years ago

Thanks for tagging me, I've been following this issue for a while.

Although Tim, Dan, myself et al. are putting some effort into wider adoption of CWA and offer direct user support, I still have to follow journalistic principles.

In this case it means: I'd be very happy to receive a statement on this problem ASAP or in advance under NDA (this is a completely normal process), but I can only suggest filing a story on this to the editors-in-chief. They know what I'm doing here and on Twitter, it's all transparent.

However, it's their decision whether they run the story then. I'm very hopeful they will, but there has to be some real news value, i.e. a solution to the problem.

In parallel, I think it's a good idea to file a normal press release through the usual channels of RKI, BMG etc.

Let me add a little detail: many users of (older) Android devices struggle with low internal memory anyway. If CWA makes this worse, it's a very bad signal for wider adoption and/or use of the app.

vaubaehn commented 2 years ago

@thomasaugsten Do network carriers also apply zero rating for data tariffs (with limited quota) that are bound to home use, like Congstar Homespot 200 or Giga Cube?

Are landline DSL tariffs with quota included in zero rating, too, like O2-DSL-fairuse?

mtwalli commented 2 years ago

@vaubaehn , I wanted to share the good news with you and the community. In short: we won't change the directory where the key files are currently stored, but we will provide only the delta of the diagnosis keys files to ENF. In other words, after providing the files to ENF successfully, we will mark them for deletion and they will be excluded from being downloaded again.
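
Read as a flow, the announced behaviour could look roughly like the Python sketch below. It is not the actual implementation from the related PRs: `provide_delta`, its parameters, and the `provide` callback are hypothetical and only illustrate "provide the delta, then mark for deletion on success".

```python
def provide_delta(downloaded, already_provided, provide):
    """Hand only not-yet-provided packages to `provide` and record them on success.

    downloaded: dict mapping package name -> local file path
    already_provided: set of package names provided earlier (updated in place)
    provide: callable taking a list of paths and returning True on success
             (a stand-in for the hand-over to ENF)
    Returns the names whose files may now be deleted from the cache.
    """
    delta = {name: path for name, path in downloaded.items()
             if name not in already_provided}
    if not delta:
        return []
    if not provide(list(delta.values())):
        return []  # provision failed: keep the files and retry later
    already_provided.update(delta)
    return sorted(delta)


provided = {"day-1"}
files = {"day-1": "/cache/day-1.zip", "day-2": "/cache/day-2.zip"}

failed = provide_delta(files, provided, provide=lambda paths: False)
deletable = provide_delta(files, provided, provide=lambda paths: True)
repeat = provide_delta(files, provided, provide=lambda paths: True)
```

Marking a package only after a successful hand-over keeps a failed provision from silently dropping keys, while the record of provided names is what excludes them from later downloads.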

vaubaehn commented 2 years ago

@mtwalli That's actually great news!

but we will provide the diagnosis keys files delta to ENF

I'm convinced that this will solve many other timeout related problems, too!

In other words after providing the files to ENF successfully ...

Without any timeout, there should be a reliable return value from ENF, finally

we will mark them for deletion and they are excluded from the download again.

No more repeated downloads 🎉

What milestone are you addressing?

mtwalli commented 2 years ago

Most probably, it will be published as a hotfix for 2.19

vaubaehn commented 2 years ago

Most probably, it will be published as a hotfix for 2.19

@mtwalli If you can manage this, that would be awesome.

@NicoErnst @Ein-Tim Under these circumstances, would it be justified to wait until next week to see how things develop?

Ein-Tim commented 2 years ago

Under these circumstances, would it be justified to wait until next week to see how things develop?

@vaubaehn Definitely!

NicoErnst commented 2 years ago

What Tim says - thank you! As all of this is public we'll share the good news in #CWAweekly , of course, without setting hopes for next week too high. Happy implementing ;)

mlenkeit commented 2 years ago

Here's one of the related PRs - https://github.com/corona-warn-app/cwa-app-android/pull/4927

vaubaehn commented 2 years ago

The feature PR collecting all single PRs is #4929

Ein-Tim commented 2 years ago

I can see that the PR #4929 is included in https://github.com/corona-warn-app/cwa-app-android/releases/tag/v2.19.1-rc.0.

vaubaehn commented 2 years ago

Hi @mlenkeit , @thomasaugsten , @chiljamgossow , @mtwalli , @dsarkar , @Ein-Tim & @MikeMcC399 ! The hotfix was released just a few minutes ago. 🎉 As you can imagine, I'm very happy! Dear Chilja, thanks for taking over that challenging refactoring!

To all others, FYI: I'm convinced that changing the downloading and providing to ENF from a "full package" to a delta will bring improvements in several places. The internal storage used by Google Play Services will decrease drastically within 3-4 days after installation of 2.19.1. Repeated downloads of DK packages should then be a thing of the past. I also expect that "stopped risk calculation" and similar (timeout-dependent) issues will occur less often.

However, we need to prepare for another change caused by the fix, which may lead many users to raise questions: I expect that the exposure check history accessible via Google's UI will reflect the decrease in Diagnosis Keys provided to ENF by displaying a significantly lower number of matched diagnosis keys. More advanced users who like to monitor their CWA might stumble upon that change and ask whether their CWA is still working. The answer is: "Yes, better than before".

Dear Dipankar, please keep an eye on the Google Play Store reviews for such questions. If they come up, it could be good to prepare the support team (and technical hotline) with adequate answers. Hey Tim and Mike, you now also know why the numbers in Google's UI may change, so you can respond to users with these questions accordingly. Thanks to everyone ❤️