Further on this point: retrieval of Diagnosis Keys seems to be triggered every 24 hours, which further worsens the worst-case performance of the system in a situation where:
There seem to be no privacy or technical limitations preventing the contact person from learning about their exposure status already on the morning of 01-07 (two days earlier).
The change from checking for new Diagnosis Keys every 2 hours to every 24 hours was introduced recently in this pull request.
The solution architecture document states a one-hour interval in both "Data Format, Data Transfer and Data Processing" and "Bandwidth Estimations".
There seems to be one limitation that would prevent running RPI matching every hour: calls to provideDiagnosisKeys() are limited to 20 per day (for privacy protection, to prevent an app from querying the API about contact with single users, and obviously there's also the battery consumption aspect...). But that is still far from running it only once per day.
@mh- thanks for the additional info; it seems the original design already took this into account by capping calls to provideDiagnosisKeys() at 12 per day. So to recap, in the original design we had:
And now we seem to have:
This starts to look more like an Android implementation issue, since iOS updates diagnosis keys every 2 hours and queries both the days endpoint and the hours endpoint for the current day (function definition here, called from here). This would mean that the delay in exposure notification for iOS users is up to 3 hours, while for Android users it is up to 48 hours.
We will provide further details in the coming days; right now only some preliminary information with a big disclaimer: it will probably become more concrete, or even be corrected, in the next days once we have final confirmation from the respective colleagues. It's Sunday and many colleagues take their well-deserved day off.
To my knowledge, both the Android and iOS app behave consistently when it comes to updating the OS-internal Exposure Notification Framework with diagnosis keys - this is done once per day on all platforms. The code you saw on iOS also triggers the risk calculation, which can be done more often per day (but always with the same diagnosis keys from the current day). You know, the epidemiological parameters for the risk calculation might change, so two different calculations with the same diagnosis keys might actually yield different results...
The reason for that once-per-day frequency is mainly API rate limiting, as already outlined by @mh-. On Android you have 20 calls per day, on iOS even only 15. This rate limiting doesn't only apply to the calls per day, but also to the number of data files which you can send to the OS-internal framework per day.
As we always need to present the complete, epidemiologically relevant data of all diagnosis keys to the framework, we already have 14 signed data files if we retrieve the diagnosis keys once per day for the relevant period of 2 weeks. So there is no chance of updating the diagnosis keys more often than that.
Of course, the backend could provide larger chunks of data (e.g. always the complete data for the respective last 2 weeks), so that the number of files for the relevant 2-week period is smaller, which would then also make it possible to call the framework more often. But that would certainly increase the overall load on the backend infrastructure, as more data would need to be served to clients overall. It probably also has other drawbacks...
Thus, the current approach is, from my perspective, a balance between the rate limiting of the API, efficient handling of the server infrastructure, and actions that make sense from an epidemiological point of view. Even the delay mentioned by @kbobrowski might still be OK when the usual incubation period is taken into account.
But once again: Take this statement with a grain of salt. We will see further updates in the next days and of course keep this issue open for further discussions.
Until then: Enjoy your Sunday!
Mit freundlichen GrĂĽĂźen/Best regards, SW Corona Warn-App Open Source Team
@SebastianWolf-SAP thanks for quick response (on Sunday!)
The thing I'm missing is why the background task (whatever it is doing, even if just updating epidemiological parameters) is triggered every 2 hours on iOS while on Android it is triggered every 24 hours, but perhaps this will be explained / corrected later.
I fully understand that rate limiting on iOS and Android is the driving factor for the refresh rate. On Android it allows a refresh every 1.2 hours, on iOS every 1.6 hours. Why choose a refresh rate of every 24 hours, though? This seems like quite an arbitrary number (obviously related to the human circadian rhythm, but that seems to have no relevance here). The initial choice of refreshing every 2 hours seemed reasonable.
As we always need to present the complete, epidemiologically relevant data of all diagnosis keys to the framework, we already have 14 signed data files if we retrieve the diagnosis keys once per day for the relevant period of 2 weeks. So there is no chance of updating the diagnosis keys more often than that.
I'm missing something here; let me use the following user story to illustrate it: I meet an infected person on 01-07 and become infected that day. That person receives a positive test result on 07-07 and uploads their Diagnosis Keys the same day (by then I'm already contagious). The full 14 days' worth of Diagnosis Keys of the person who infected me will become available at the /date/2020-07-07 endpoint. These Diagnosis Keys also become available at /date/2020-07-07/hour/12. Now the problem is that I cannot yet fetch /date/2020-07-07 (it only becomes available on the next day), and I keep infecting my family and friends. I only learn on 08-07 that I was exposed.
Why can't I fetch the fresh /date/2020-07-07/hour/12 package with the 14 Diagnosis Keys of the person who infected me, and subsequently isolate myself? This seems like a design decision that may result in loss of health or life, as in the worst case I could be spreading the virus for about 45 hours longer than if a 2-hour refresh with hourly key fetching were implemented.
To summarize, I think this issue is in fact two separate but closely related issues:
frequency of fetching data: the 24-hour interval increases the worst-case delay by 22 hours (assuming the maximum feasible frequency is fetching every 2 hours). The rate at which the server bundles new Diagnosis Keys is irrelevant here - whether it bundles them every minute or every week, this decision alone still adds a worst-case 22 hours to the exposure notification delay
rate at which new bundles of Diagnosis Keys become available: if my understanding is correct, each bundle contains all 14 Diagnosis Keys of every person who contributed to it, so it seems to make sense to publish bundles as frequently as possible. Every 1 hour seemed like a good decision, but now it seems to be every 24 hours
Each of these issues independently introduces a 24-hour worst-case delay in receiving an exposure notification (so together they may result in a 48-hour delay), compared to the original design, which had a worst-case delay of 3 hours.
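Spelling out the arithmetic (a back-of-the-envelope worst case, assuming a key upload lands just after a bundle is cut and the bundle is published just after the daily fetch):

```latex
d_{\text{worst}} = d_{\text{bundling}} + d_{\text{fetching}}
% current design: daily bundles, daily fetch
d_{\text{worst}} \approx 24\,\mathrm{h} + 24\,\mathrm{h} = 48\,\mathrm{h}
% original design: hourly bundles, 2-hourly fetch
d_{\text{worst}} \approx 1\,\mathrm{h} + 2\,\mathrm{h} = 3\,\mathrm{h}
```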
But let's leave it for working days, enjoy your Sunday as well!
@kbobrowski If I understood @SebastianWolf-SAP correctly, he has the assumption that you cannot just feed the EN API on the device with a set of DKs that were uploaded in the last 2 hours, but that you must feed it all DKs uploaded in the last 14 days, and that each such DK file transaction counts towards the rate limit. You also cannot combine the DK files yourself on the device, because they have to be signed in the backend. If that is correct, and you do not want the server to deliver full 14-day packages every 2 hours, all you can do is collect daily files and feed the last 14 of them to the framework once per day.
However, this feels like a strange restriction, and it's not really obvious from the API doc; it seems like you can provide a list of key files with each call that counts towards the rate limit:
/**
* Provides a list of diagnosis key files for exposure checking. The files are to
* be synced from the server. Old diagnosis keys (for example older than 14 days)
* will be ignored.
*
* Diagnosis keys will be stored and matching will be performed in the near future,
* after which you’ll receive a broadcast with the
* {@link #ACTION_EXPOSURE_STATE_UPDATED} action.
*
* The diagnosis key files must be signed appropriately. Exposure configuration
* options can be provided to tune the matching algorithm. A unique token for this
* batch can also be provided, which will be used to associate the matches with
* this request as part of {@link #getExposureSummary} and
* {@link #getExposureInformation}. Alternatively, the same token can be passed in
* multiple times to concatenate results.
*
* After the result Task has returned, keyFiles can be deleted.
*
* Results for a given token remain for 14 days.
*/
Task<Void> provideDiagnosisKeys(
    List<File> keyFiles, ExposureConfiguration configuration, String token);
he has the assumption that you cannot just feed the EN API on the device with a set of DKs that were uploaded in the last 2 hours, but that you must feed it all DKs uploaded in the last 14 days
@mh- I'm not sure if @SebastianWolf-SAP meant this, but I don't believe that this is how the server bundles data. The bundle from 12-06-2020 contains 78 Diagnosis Keys which were live between 2020-05-30 and 2020-06-11, and the bundle from 13-06-2020 contains 84 DKs from the period between 2020-05-31 and 2020-06-12, but there is not a single DK shared between these two bundles.
Hourly DKs are also signed, so there is no need to combine and re-sign them, and I also believe that the API works as you described:
it seems like you can provide a list of key files with each call that counts towards the rate limit
The bundle from 12-06-2020 contains 78 Diagnosis Keys which were live between 2020-05-30 and 2020-06-11, and the bundle from 13-06-2020 contains 84 DKs from the period between 2020-05-31 and 2020-06-12, but there is not a single DK shared between these two bundles.
Yes, I think we all agree on this, and that's why you need to deliver all of them to the API in one "session" (using one token), before you can ask it for a meaningful risk scoring. But since you are apparently able to do it with just one call, I also don't see a reason why this can't be done 15 times per day.
Ok, so I think I'm starting to understand what's going on here. Information about the paths of downloaded DK bundles is stored in this KeyCacheDao. We have RetrieveDiagnosisKeysTransaction, which fetches keys from the server using the asyncFetchFiles function. This function simply checks whether files for the available dates have already been downloaded, blocks until any missing days are downloaded, and returns the list of files corresponding to all existing .zip files with DKs. These files are then fed by the mentioned transaction to the executeAPISubmission function, which in turn feeds them one by one to provideDiagnosisKeys. There is an implicit assumption here that provideDiagnosisKeys will be called fewer than 20 times a day, which is ensured by fetching only daily keys, only every 24 hours, and deleting outdated files. As a result, executeAPISubmission is called once a day with a list of at most 14 files, and provideDiagnosisKeys is then called at most 14 times a day (in one short burst, by iterating over the provided list).
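To illustrate, a condensed sketch of that flow (the function names come from the CWA code linked above; the bodies are my simplification, not the actual implementation):

```kotlin
import com.google.android.gms.nearby.exposurenotification.ExposureConfiguration
import com.google.android.gms.nearby.exposurenotification.ExposureNotificationClient
import kotlinx.coroutines.tasks.await
import java.io.File

suspend fun retrieveAndSubmitOncePerDay(
    client: ExposureNotificationClient,
    configuration: ExposureConfiguration,
    token: String,
    // stand-in for asyncFetchFiles(): downloads missing daily bundles,
    // returns all cached .zip files (at most 14)
    asyncFetchFiles: suspend () -> List<File>
) {
    val keyFiles = asyncFetchFiles()
    // executeAPISubmission with batch size 1: one provideDiagnosisKeys call per file,
    // so 14 cached daily bundles consume 14 of the 20 allowed calls in a single burst.
    for (file in keyFiles) {
        client.provideDiagnosisKeys(listOf(file), configuration, token).await()
    }
}
```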
This seems like a strange design; I thought it was possible to simply feed the Exposure Notification framework new DKs, which would land in the internal LevelDB database, and the EN framework would simply recalculate the risk score. Or to just pass, in one call, a list of all files (daily and hourly bundles). There is this statement in the docs of executeAPISubmission:
We currently use Batch Size 1 and thus submit multiple times to the API. This means that instead of directly submitting all files at once, we have to split up our file list as this equals a different batch for Google every time.
Why feed only single-element batches? It seems that we can call this function 20 times a day, each time with multiple files, as stated in Google's documentation:
Calls to this method are limited to 20 per day
On the other hand, the Apple documentation linked by @SebastianWolf-SAP states that:
Apps can’t submit more than 15 key data files every 24 hours
There is a big functional difference between limiting "calls to a function" and limiting "data files passed" - is there a typo in one of these documents, or do the Apple and Google frameworks really diverge this much in functionality?
In any case, if this discrepancy is real, I don't think it justifies downgrading the functionality of one app to match the lower functionality level of the other. It is in the best interest of iPhone users that all Android users are notified as fast as possible about their exposure status, so that they have less chance of infecting the people around them (including iPhone users).
@SebastianWolf-SAP wrote:
Even the delay mentioned by @kbobrowski might still be OK when the usual incubation period is taken into account.
I think @kbobrowski is right to point out unnecessarily long delays. One of the main values this app provides is a faster time-to-notification after a positive test. But noticing symptoms (or receiving an exposure notification) and getting a test result takes 2-3 days even in good times. During that time, contacts are already infectious themselves. If another 2-3 day delay is introduced, the time advantage is essentially lost and the infection has hopped to the next level of contacts. (This is where second-degree notification corona-warn-app/cwa-wishlist#24 would come in handy.)
I hope this delay issue can be clarified in the coming days.
Wow, there have been quite a few additional comments in the last hours. :) Let me just confirm quickly that we will definitely clarify this issue in the upcoming days with all the required details.
Mit freundlichen GrĂĽĂźen/Best regards, SW Corona Warn-App Open Source Team
@SebastianWolf-SAP thanks for keeping us updated :) Just one additional comment: Google's reference Android implementation provides files with Diagnosis Keys in batches and removes these files after confirming that they were submitted to the Exposure Notification framework. I can also see, by using apps from different countries, that Diagnosis Keys are stored inside the EN framework in the app_en_diagnosis_keys directory (simply as export.bin files). This led me to believe that the intended way of interacting with EN is to simply feed it new Diagnosis Keys (in batches, up to 20 times a day) and just let it do its job, notifying the app in case contact with an infected person was detected.
Some final thoughts: what still confuses me is this part of @SebastianWolf-SAP's response:
As we always need to present the complete, epidemiologically relevant data of all diagnosis keys to the framework, we already have 14 signed data files if we retrieve the diagnosis keys once per day for the relevant period of 2 weeks. So there is no chance of updating the diagnosis keys more often than that.
This is reflected in the current implementation, where the last 14 daily bundles of DKs are stored in local storage and provided every day to the provideDiagnosisKeys function. What is really confusing is why we need to repeatedly provide the same daily bundle to the framework 14 times over the course of 14 days. The only reason that comes to my mind is that the epidemiological configuration may change and DKs need to be re-evaluated, but this looks like solving the right problem in the wrong place; let me explain:
Let's consider an alternative approach:
The latter solution seems to provide the following advantages over the existing one:
I guess one aspect in this is also: will the transmission_risk_level that is associated with each Diagnosis Key change with a change of the ExposureConfiguration? If yes, all the Diagnosis Keys would need to be updated when the ExposureConfiguration changes; but if not (i.e. if there's a fixed, robust model for determining the transmission_risk_level of a Diagnosis Key), then that's not required.
@mh- I don't believe that retrospective updating of transmission_risk_level is in scope, because the current design effectively prevents it - apps store daily bundles for 14 days and there seems to be no mechanism to trigger re-downloading of an old bundle. The apps also feed all of the bundles they have stored to provideDiagnosisKeys every day. Even if some DKs were assigned new transmission_risk_levels and published in a new daily bundle, the same DKs with the old transmission_risk_level would still be provided to EN alongside the updated ones.
In "alternative approach" described in my previous comment there would be no problem to implement this feature though should it be needed, DKs with updated transmission risk would just be pushed to new hourly bundle (there would be no previously downloaded DKs with same TEK but different transmission risk level passed alongside to provideDiagnosisKeys)
Both frameworks currently limit the number of files we can submit. As we want to reduce the amount of data downloaded by each client to a minimum, we need to cache the downloaded diagnosis keys - which also need to be signed by the server - so no repackaging on the client side is possible.
The risk calculated for a certain exposure incident might also change over time: not because the data changes, but because of how many days it lies in the past (e.g. an exposure event from yesterday is weighted differently than one from 10 days ago). To take this into account in the risk calculation (and to properly include the new attenuation buckets), we need to feed all keys into the API every day... And this is how we already hit the API rate limit by doing the matching just once per day.
If you have a direct line to the Apple and Google Exposure Notification teams, it might be worth highlighting this to them. I can't imagine that calling the API at most once a day was what they had in mind with their rate limits.
@tklingbeil thank you for the response, I have some further questions:
at the very minimum (without making any changes to the current design), it would be possible to decrease the delay in exposure notification by fetching the new daily bundle of DKs in the early morning hours of each day, e.g. at a random time between 01:00 and 04:00, and, if the phone had no internet access during these hours, at the earliest possible time afterwards. This would already reduce the current worst-case delay by almost half (approx. 20-23 hours). Would you consider including such a change?
it seems that Google/Apple indeed did not have in mind the specific way of using their APIs that is implemented by CWA; have you reached out to Google/Apple to attempt to get it patched on their side?
I can see the importance of including old bundles in the exposure calculation, to combine multiple exposures and infer a more accurate risk. I also understand that the risk associated with each exposure changes as time passes. But it seems that a similar result could be obtained by storing the individual ExposureSummary instances returned from EN (for the past 2 weeks) and inferring the exposure status to display from those. With this approach there is no need to store past bundles of Diagnosis Keys, and a score calculated this way can easily be updated on a 2-hourly basis (12 calls to EN per day, with a single file each). Would you consider this a viable approach? (See the sketch at the end of this comment.) It would reduce the current exposure notification delay by a factor of 12.
Thank you in advance for your patience. I understand that you are very busy this week, but it seems that the current worst-case 48-hour delay is bound to have serious consequences for someone's life or health, and this seems like something that can be at least partially avoided.
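P.S. A rough sketch of the ExposureSummary idea from the third point above (names like StoredSummary are hypothetical; the v1 getExposureSummary(token) call is used as documented):

```kotlin
import com.google.android.gms.nearby.exposurenotification.ExposureConfiguration
import com.google.android.gms.nearby.exposurenotification.ExposureNotificationClient
import kotlinx.coroutines.tasks.await
import java.io.File
import java.time.Duration
import java.time.Instant
import java.util.UUID

// Hypothetical local record: only the summary survives, the key files can be deleted.
data class StoredSummary(val fetchedAt: Instant, val maximumRiskScore: Int)

suspend fun processNewBundles(
    client: ExposureNotificationClient,
    newFiles: List<File>, // only the bundles published since the last run
    configuration: ExposureConfiguration,
    store: MutableList<StoredSummary>
) {
    val token = UUID.randomUUID().toString()
    client.provideDiagnosisKeys(newFiles, configuration, token).await()
    val summary = client.getExposureSummary(token).await()
    store += StoredSummary(Instant.now(), summary.maximumRiskScore)
    // keep only the epidemiologically relevant 2-week window
    store.removeAll { Duration.between(it.fetchedAt, Instant.now()) > Duration.ofDays(14) }
}

// The displayed status is then derived from the stored summaries of the past two weeks.
fun currentRiskScore(store: List<StoredSummary>): Int =
    store.maxOfOrNull { it.maximumRiskScore } ?: 0
```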
Indeed, we did reach out to them already, and they are looking into the issue to see what they can do. They also noted that when they make changes, it will take some time until these are shipped to all phones - so we can't expect any adjustments within the next 2 weeks or so.
I've asked Google about it at their exposure-notifications-feedback mailing list:
Hi,
I'd just like to quickly clarify the limits imposed by the Exposure Notifications API on calls to the function provideDiagnosisKeys(). The documentation states that "Calls to this method are limited to 20 per day"; could you confirm whether the limit is imposed on the execution of the function call (such that it would be possible to call it at most 20 times a day, each time passing multiple files with Diagnosis Keys), or whether the limit is imposed on the arguments (the total number of passed files cannot exceed 20)?
Best regards, Kamil
And got response (also got permission to publish it here):
Hi Kamil,
The limit is imposed on the number of method calls, so providing multiple files per call still results in a single quota being taken.
Thanks, Jake
This study: https://www.csh.ac.at/wp-content/uploads/2020/06/CSH-Studie-%C3%84rztefunktdienst.pdf concludes that starting isolation 6-12 hours earlier can reduce the infection rate by up to 58%. Against this background, I would be in favor of updating the keys several times a day. Up to 48 hours of additional delay is simply unacceptable here, because then, in the worst case, 4-5 days pass between test -> notification.
"Unsere Studie zeigt, dass Maßnahmen, welche die effektive Infektiositätsdauer auch nur um wenige Stunden für einen großen Teil der Erkrankten reduzieren, eine deutliche Wirkung entfalten"
Proposal:
- One signed file per hour. That makes 336 files for 14 days.
- Build a range GET into the API, so the app queries e.g.
/version/v1/diagnosis-keys/country/DE/datesince/2020-06-14_1
to retrieve all key files from 2020-06-14, hour 1, up to today.
To determine which files are needed, the app simply iterates over the files from -14 days up to now, and as soon as it hits the first missing file it stops and issues the request.
A batch request with batch size 336 is sent to the Google API (correspondingly smaller if there are no keys for a given hour), every hour during the day and every 4 hours at night (so as not to exceed the 20-requests-per-day limit).

@ChristopherSchmitz I guess you can answer this best.
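A minimal sketch of the iteration logic in the proposal above (note: the /datesince/ range endpoint is part of the proposal, not an existing CWA API):

```kotlin
import java.time.LocalDateTime
import java.time.temporal.ChronoUnit

// Find the first hour for which no file is cached; the proposed range request starts there.
fun firstMissingHour(cachedHours: Set<LocalDateTime>, now: LocalDateTime): LocalDateTime {
    var t = now.minusDays(14).truncatedTo(ChronoUnit.HOURS)
    while (t <= now && t in cachedHours) {
        t = t.plusHours(1)
    }
    return t // e.g. 2020-06-14T01:00 -> GET .../datesince/2020-06-14_1
}
```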
I'm surprised this issue has not gotten more attention; it seems extremely important. In general, I think it would be good to talk more to Google/Apple about this and raise questions publicly on their repos. Furthermore, we should reach out to other countries that are using GAEN, like Italy, Switzerland and Latvia (maybe ask a question as an issue on their GitHub repos), and ask whether they know anything and how they designed it.
I can't imagine that there isn't a better way of doing this than the current implementation here.
All the comments are good; there's only one I find questionable:
Of course, the backend could provide larger chunks of data (e.g. always the complete data for the respective last 2 weeks), so that the number of files for the relevant 2-week period is smaller, which would then also make it possible to call the framework more often. But that would certainly increase the overall load on the backend infrastructure, as more data would need to be served to clients overall. It probably also has other drawbacks...
Thus, the current approach is, from my perspective, a balance between the rate limiting of the API, efficient handling of the server infrastructure, and actions that make sense from an epidemiological point of view. Even the delay mentioned by @kbobrowski might still be OK when the usual incubation period is taken into account.
Is backend load really an issue for this project? Such claims should be quantified (does a change cost 1k, 10k or 100k € extra per day?) and weighed against the potential benefit: earlier notification, earlier testing, reduced transmission, somehow quantified (I don't feel I know enough for now), rather than decided by gut feeling.
How can a delay of 24 hours, or even 12 hours, be OK? I think getting the RKI in on this would be helpful too - @BenzlerJ, maybe? Some assumptions need to be checked here.
I don't want to blame past design decisions; you had to get the app out quickly and you managed to do that. But that shouldn't stop us from rectifying potentially suboptimal decisions when doing so can save lives.
Given the life/death importance of this issue, it would be great to have another update from maintainers on their current views after 3 days: @SebastianWolf-SAP @tkowark @tklingbeil @cfritzsche @christian-kirschnick I come in peace.
The currently released app version 1.0.2 makes exactly 3 attempts (BackgroundConstants.WORKER_RETRY_COUNT_THRESHOLD) within less than 1 minute, once per day (DIAGNOSIS_KEY_RETRIEVAL_TRIES_PER_DAY), to download Diagnosis Keys. If the network connection is broken during that time window, it waits for another day. I think this should be improved as well.
(@corneliusroemer thanks for the information about 1.0.2)
@mh- Small correction: at least for Android, 1.0.2 was pushed through the Play Store yesterday/today. Plus: maybe this issue should get relabelled - I think it's about more than a "question" and "documentation"; "enhancement" and "bug" would not suit it badly, I think.
Update: @mh- raises a point about keys only being attempted to be downloaded once a day, in 3 tries. The fact that no further attempts are made when the initial attempt fails may be a cause behind all sorts of issues reporting a lack of exposure key updates: e.g.
@mh- @kbobrowski Do you think it would make sense to open a new issue suggesting that more key retrieval attempts be made throughout the day? It's related to the issue here but can be fixed independently.
@corneliusroemer the maintainers are already being pointed from the issues you cited to this issue; perhaps one issue focusing on this specific problem will be created. I won't do it for now, since I have not gathered enough information myself. It's important to keep the signal-to-noise ratio high, so until I have enough info I tend not to open new issues. If you or @mh- feel you are in a position to raise a new, specific issue, then it's up to you of course :)
@mh- that's an interesting finding, also quite interesting is this part of the code:
/**
* Get maximum calls count to Google API
*
* @return Long
*
* @see BackgroundConstants.DIAGNOSIS_KEY_RETRIEVAL_TRIES_PER_DAY
* @see BackgroundConstants.GOOGLE_API_MAX_CALLS_PER_DAY
*/
fun getDiagnosisKeyRetrievalMaximumCalls() =
BackgroundConstants.DIAGNOSIS_KEY_RETRIEVAL_TRIES_PER_DAY
.coerceAtMost(BackgroundConstants.GOOGLE_API_MAX_CALLS_PER_DAY)
it indicates that the initial intention was indeed to call provideDiagnosisKeys() multiple times a day, each time with a list of Diagnosis Keys. Right now, if DIAGNOSIS_KEY_RETRIEVAL_TRIES_PER_DAY were set to anything higher than 1, let's say 2:
so it may not be easily fixable without some deeper changes. The crucial thing regarding this issue is, once again, to understand the limits on provideDiagnosisKeys and why we need to feed single-element batches.
The value of DIAGNOSIS_KEY_RETRIEVAL_TRIES_PER_DAY was changed from 12 (as we would expect for fetching every 2 hours) to 1 in this pull request.
Ideally, the "cwa serial interval" - i.e. the delay between receiving an alert, getting tested, receiving the test result, uploading your keys, and finally the next "generation" of alerts being received - should be less than the COVID-19 serial interval, which appears to be 4-5 days. I have some doubts about the feasibility of this, since laboratories and public authorities might not be fast enough for this to work in all cases. If the purely technical delay between uploading keys and alerts popping up is something like 24 hours on average, 48 hours in a bad case, and even more in the worst case (backend or network issues during the 3-minute daily "attempt to update" interval), then it's even more unlikely.
The distribution API URL linked in the issue report seems to be serving 0 keys as of now, more than 72 hours after the app launch. Hopefully this is just caused by some initial issues in laboratories or authorities. Otherwise, positive test results which were communicated to the tested person on Tuesday (test probably done on Sunday/Monday?) would have their alerts received no earlier than Saturday, at least one week after meeting the infected person. By then, the person who receives an alert has probably passed the "most contagious" phase already, and if it takes just a few days longer, they might not even be contagious any more, reducing the app's efficiency to 0%.
Edit: Thank you, Christian, for pointing that out to me (in the next comment). It's a good approach, and I rushed to an incorrect conclusion there. Looks like that part of my concern is invalid and we should see some keys published soon, as uploads include more and more keys. Sorry for the noise.
I still feel it's quite important to reduce the not-extremely-uncommon bad case of a 48-hour delay as much as possible, even if that means the backend needs to be 14x as beefy. The "3 attempts in a 3-minute time window once a day" issue is also very important, but it looks like it is already being worked on.
The distribution API URL linked in the issue report seems to be serving 0 keys as of now, more than 72 hours after the app launch. Hopefully this is just caused by some initial issues in laboratories or authorities.
The reason is not that there aren't any diagnosis keys - it's that we have not yet reached the minimum number of keys required for a package to be published.
The value is defined here and the issue/discussion for that topic can be found here.
# The minimum number of diagnosis keys per bundle.
shifting-policy-threshold: 140
So 140 keys are required for a bundle to be created as of today. Since CWA has only been running for about 4 days, people are only able to upload one to three keys (depending on when they activated CWA on their phone). Once CWA has been running for two weeks, only ~10 people will be required for a bundle.
Personally, I would also favor having that threshold value be 0. But I also realize the DPP concerns, and DPP is one of the major drivers of how CWA is designed. The initial threshold value was even far higher when we started off, and was reduced in order to balance exactly those concerns while still preserving anonymity.
Proposal:
- One signed file per hour. That makes 336 files for 14 days.
- Build a range GET into the API, so the app queries e.g.
/version/v1/diagnosis-keys/country/DE/datesince/2020-06-14_1
to retrieve all key files from 2020-06-14, hour 1, up to today. To determine which files are needed, the app simply iterates over the files from -14 days up to now, and as soon as it hits the first missing file it stops and issues the request. A batch request with batch size 336 is sent to the Google API (correspondingly smaller if there are no keys for a given hour), every hour during the day and every 4 hours at night (so as not to exceed the 20-requests-per-day limit).
With a limit of 20 requests per day, each call would have to use one file containing the keys of the last 14 days, plus the hour that has just been added. This means that the app would have to download all keys from the server every hour. Caching on the mobile device is therefore not possible.
With a limit of 20 requests per day, each call would have to use one file containing the keys of the last 14 days, plus the hour that has just been added. This means that the app would have to download all keys from the server every hour. Caching on the mobile device is therefore not possible.
No, not really, as you can provide multiple key files per call to provideDiagnosisKeys(...);
The calls to provideDiagnosisKeys(...) are limited to 20 per day, not the number of files submitted! The app can download any number of files it wants via HTTP and cache them locally before submitting them to the Exposure API. So we only need to download the hour files not already submitted to the Exposure API.
This could be done by just storing a single timestamp recording up to which hour we have already downloaded the files; if this value is empty, initialize it with today - 14 days. It's sufficient to only submit new hour files to the Exposure API, and if we missed downloading some (due to no network), it's again totally legal to submit multiple files in one API call.
You don't have to provide one file at a time to provideDiagnosisKeys(...); you can submit multiple key files at once!! One call to the function counts only once towards your daily limit of 20 calls:
Task<Void> provideDiagnosisKeys(List<File> keyFiles, ExposureConfiguration configuration, String token);
edit: sorry for the German in the original comment, I somehow missed that completely...
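To make this concrete, a sketch combining both points, hour-granular download tracking plus a single multi-file call (helper names are illustrative; it assumes the framework accepts multiple independently signed files in one call, which is exactly the open question in this thread):

```kotlin
import com.google.android.gms.nearby.exposurenotification.ExposureConfiguration
import com.google.android.gms.nearby.exposurenotification.ExposureNotificationClient
import kotlinx.coroutines.tasks.await
import java.io.File
import java.time.LocalDateTime
import java.time.temporal.ChronoUnit

suspend fun submitNewHourFiles(
    client: ExposureNotificationClient,
    configuration: ExposureConfiguration,
    token: String,
    lastSubmittedHour: LocalDateTime?, // persisted between runs; null on first run
    // assumed helper: downloads the hour file, or returns null if none was published
    downloadHourFile: (LocalDateTime) -> File?
): LocalDateTime {
    val now = LocalDateTime.now().truncatedTo(ChronoUnit.HOURS)
    var hour = (lastSubmittedHour ?: now.minusDays(14)).plusHours(1)

    // HTTP downloads are not rate limited, so fetch every hour file we missed.
    val newFiles = mutableListOf<File>()
    while (hour <= now) {
        downloadHourFile(hour)?.let { newFiles += it }
        hour = hour.plusHours(1)
    }

    // One provideDiagnosisKeys call for all of them: a single unit of the 20-per-day quota.
    if (newFiles.isNotEmpty()) {
        client.provideDiagnosisKeys(newFiles, configuration, token).await()
    }
    return now
}
```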
Personally, I would also favor having that threshold value be 0. But I also realize the DPP concerns, and DPP is one of the major drivers of how CWA is designed. The initial threshold value was even far higher when we started off, and was reduced in order to balance exactly those concerns while still preserving anonymity.
Did you consider generating some dummy random keys to fill this up?
That was my suggestion, but there were concerns that the dummy data could be distinguished statistically from real data, so this was not implemented.
I think this is not a good situation, as we may now have a few keys waiting to be distributed, but they are not being distributed; plus, once there are enough of them, we still wait up to 24 hours due to the problem described in this issue (plus the concern @w-flo mentioned above: if there is no internet connection at that point in time, fetching is deferred by a day).
In my view, everything needs to be done to get the keys out within a few hours of submission at the latest. Speed is the big advantage of the app, so let's not lose it.
With a limit of 20 requests per day, each call would have to use one file containing the keys of the last 14 days, plus the hour that has just been added. This means that the app would have to download all keys from the server every hour. Caching on the mobile device is therefore not possible.
@christian-kirschnick what do you mean by "the app would have to download all keys from the server every hour"? Which keys exactly do you have in mind? Right now the app does not fetch all 14 daily bundles from the server every day; it just fetches the missing ones, maintaining a 14-day history of daily bundles in local storage. The same should be possible for hourly bundles, right? You would just check which hourly bundles are missing, download them to local storage, and feed all the keys to provideDiagnosisKeys. Or am I missing something?
Did you consider generating some dummy random keys to fill this up?
Yes we did, check out the issue I linked previously: Aggregate File Creation: Skip packages if payload too small corona-warn-app/cwa-documentation#108 .
You can submit multiple key files at once!! One call to the function counts only once towards your daily limit of 20 calls
It depends on the platform: Android and iOS behave differently here - one counts files, the other counts API calls. However, I guess somebody from the mobile teams can explain this better.
@kbobrowski We simply can't use the hourly packages, because we are rate limited by the GAEN. It's quite evident that we went for hourly files from the beginning (check the server implementation - even now we are still generating the hourly packages), because we wanted to warn users as early as possible. We only learned about the rate limiting from GA very late in the development process, so changing the aggregation interval at that point in time was just too risky. GA implementing the rate limiting differently on the two platforms certainly did not help here.
We are still fighting to get the rate limiting sorted out with GA, especially since federation/roaming is next on the roadmap - which will only worsen this problem for us.
It depends on the platform: Android and iOS behave differently here - one counts files, the other counts API calls. However, I guess somebody from the mobile teams can explain this better.
Thanks for this information; this gives us some further clarification, since so far the statement was that both frameworks count files. I think we can conclude that Google counts API calls and Apple counts files, as this is consistent with the separate documentation from Google and Apple, and with my correspondence with Google.
@christian-kirschnick since Google does not seem to limit files, only API calls, why is using hourly packages not possible on Android? From the git history it's quite clear that fetching hourly packages every 2 hours was already implemented in the Android app, but it was removed shortly before launch. Is there a requirement that both apps update the user about their exposure status with exactly the same frequency? I understand that you worked on the server side, but I guess you know the right people from the mobile teams who could clarify this for us.
The currently released app version 1.0.2 makes exactly 3 attempts (BackgroundConstants.WORKER_RETRY_COUNT_THRESHOLD) within less than 1 minute, once per day (DIAGNOSIS_KEY_RETRIEVAL_TRIES_PER_DAY), to download Diagnosis Keys. If the network connection is broken during that time window, it waits for another day. I think this should be improved as well.
(@corneliusroemer thanks for the information about 1.0.2)
If I'm not totally mistaken, a commit that is part of v1.0.3 tries to fix the issue @mh- mentioned above, at least partially, by fetching new keys every time the app is launched if they haven't been downloaded that day already.
It would be good if @jakobmoellersap, as the author of the commit, could comment; @kbobrowski you may also want to have a look at whether the code does what I think it does.
@corneliusroemer yes, it seems that every time onResume of RiskDetailsFragment or MainFragment is called, Diagnosis Keys will be fetched (if they have not yet been fetched on the current day) and fed to provideDiagnosisKeys to do the matching. That's a very good development, since it allows the user to trigger the retrieval manually, reducing the worst-case delay by half (from 48 hours to 24 hours) - provided the user displays one of these Fragments and knows that it has to be done just after midnight, once the Diagnosis Keys for the previous day have become available on the server (related comment).
To eliminate the remaining 24 hours, we still need to re-introduce fetching of hourly Diagnosis Keys.
If I'm not totally mistaken, a commit that is part of v1.0.3 tries to fix the issue @mh- mentioned above, at least partially, by fetching new keys every time the app is launched if they haven't been downloaded that day already.
corona-warn-app/cwa-app-android@03679cf
It would be good if @jakobmoellersap, as the author of the commit, could comment
It's about time that v1.0.3 was published, then, because the original "once per day" worker seems to be extremely unreliable. Even with a good internet connection, if the app is in the wrong state at the wrong time, it will simply skip reloading. (While there's no content to download at the moment anyway, I think this will change soon.)
@christian-kirschnick @pithumke @michael-burwig
Did you consider generating some dummy random keys to fill this up?
Yes we did, check out the issue I linked previously: Aggregate File Creation: Skip packages if payload too small corona-warn-app/cwa-documentation#108 .
Thank you for pointing this out. I didn't find another place in the documentation where the exact requirements and the reasoning are explained, but I think this
However, the purpose of this whole "threshold" feature is to preserve people's anonymity and to not allow anyone to create movement profiles. In order to achieve that, it must be impossible to match a specific temporary exposure key (TEK) to any particular person. In the "shifting" approach, this is achieved by ensuring that a single person's TEKs are always mashed together with enough other people's real TEKs to get lost in the shuffle.
gives the following DPP-related reasons:
Ad 1. User anonymity is automatically preserved by uploading only randomly created data that does not contain any information about the user and cannot by itself be linked to a user. (Of course, if someone is warned by the app and remembers that they only met a single person on that day for that much time, they can link the warning to that person - but IMHO this is an acceptable trade-off and usually the intention of the person who wants to warn anyway.)
Ad 2. Movement profiles. It is well known that if an infected user A distributes a TEK in order to warn others, this allows users B who have received RPIs created from that TEK to link all those RPIs to the same TEK. Without this distribution of the TEK, users B will usually not be able to link RPIs for more than 10-20 minutes, because the RPIs change after that period. Note that this still preserves the same anonymity as in 1., because the TEK is a random value that does not contain any information about the user. The Google / Apple Exposure Notifications system does, however, roll the TEKs once per day (at 00:00 UTC) to prevent linking RPIs to the same TEK across multiple days. (Quoting from the Cryptography Specification: "When reporting Diagnosis Keys, the correlation of Rolling Proximity Identifiers by others is limited to 24 hour periods due to the use of Temporary Exposure Keys that change daily.") It is exactly this correlation of received RPIs by users B for longer than 24h that the "shifting policy" intends to prevent.
My opinion on this:
I) The risk that attackers (users B) could build a movement profile of a specific person is very low. They would need to receive RPIs in many places AND link them to a specific person via other channels. I think this is acceptable within the trade-off that user A makes when deciding to upload TEKs.
II) Preventing such correlation is only possible if many users transmit RPIs in the same small geographic region and upload their keys. A shifting-policy-threshold of 140 keys for all of Germany cannot achieve this.
Based on I), I would propose to stop adding more "snake oil" and start distributing the uploaded keys now, as I believe that was the intention of the users who uploaded them.
Pit worked on a draft yesterday night, which you can check out here: Add random key padding #609. Feedback / change requests are very welcome. We'll have this thoroughly checked with all stakeholders, to ensure solidity from a technical and privacy-preserving standpoint, before we can push it to production.
@christian-kirschnick @pithumke thanks. I think the server still packages data on an hourly basis, so once the padding approach is merged we'll have maximum efficiency on the server side. If I understood the padding correctly, it would guarantee that when a person uploads Diagnosis Keys, they will be packaged in the next hourly package (and not wait until a certain number of DKs has accumulated)?
@christian-kirschnick That's great news. I strongly agree that getting keys out as fast as possible is crucial for preventing infections, and as @mh- pointed out, the benefits of publishing new keys every hour outweigh the small privacy improvement provided by using a shifting-policy-threshold of 140 instead of a padding approach.
Regarding the delay in fetching new keys from the server, I wonder if it would be worth considering fetching data more frequently on Android devices as a first measure, until things eventually get sorted out by Apple. While maintaining consistency across platforms certainly has its upsides, using a different behavior on Android would benefit the majority of users.
In order to allow for more frequent updates on iOS, it might also be worth considering some sort of delta compression. This would involve bundling all hourly packages into one (or a few) large package(s) and extending the official specification for Exposure Notification Servers by allowing clients to download diffs of the most recent export.bin and export.sig files. When fetching new data, the client would then specify the latest package version available on the client and only download the delta-compressed diff data. The most recent package can then be reconstructed locally from the downloaded data and the old package available on the client. Given the structure of the TemporaryExposureKeyExport protocol buffer, delta compression should be very efficient.

Alternatively, rather than allowing clients to download diffed data, the server could bundle the hourly packages and allow clients to download the package (including signature) without the actual key data. The client can then generate the package from the hourly packages by populating the keys field locally.
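A sketch of that second variant, assuming generated protobuf classes for the published TemporaryExposureKeyExport format (the "skeleton" download, i.e. a signed package with an empty keys field, is part of the proposal, not an existing API):

```kotlin
// Rebuild the daily package locally from cached hourly packages plus a server-provided
// skeleton (metadata and signature, but no keys). For the server's signature to verify,
// the reconstruction must be byte-identical to what the server signed (same key order
// and encoding), so the server would have to define a canonical ordering.
fun rebuildDailyExport(
    skeleton: TemporaryExposureKeyExport,
    hourlyExports: List<TemporaryExposureKeyExport>
): TemporaryExposureKeyExport =
    skeleton.toBuilder()
        .addAllKeys(hourlyExports.flatMap { it.keysList })
        .build()
```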
Pit worked on a draft yesterday night, which you can check out here Add random key padding #609. Feedback / change requests are very welcome.
@christian-kirschnick @pithumke I think your code will work as you intended: it will store 10 keys for each uploaded key, 1 with the original key bytes and 9 with random key bytes, with all other properties the same. So nobody will be able to figure out which of the 10 keys came from the original submission. If this is your intention, then fine, go ahead and push it to production. I'm certain it will not hurt in terms of data privacy; if you cross the 140-key threshold by doing this, no problem - a threshold of 140 vs. 14 for a whole country doesn't make a difference, as I explained above.
If there are more users uploading keys in the future, you might want to reduce the randomKeyPaddingMultiplier factor again, to save bandwidth.
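For reference, the padding idea boils down to something like the following (illustrative Kotlin; the actual cwa-server draft in #609 is Java and may differ in detail):

```kotlin
import kotlin.random.Random

// Simplified stand-in for the server-side entity.
data class DiagnosisKey(
    val keyData: ByteArray,
    val rollingStartIntervalNumber: Int,
    val transmissionRiskLevel: Int
)

fun padKeys(submitted: List<DiagnosisKey>, multiplier: Int = 10): List<DiagnosisKey> =
    submitted.flatMap { original ->
        // keep the original plus (multiplier - 1) copies with random 16-byte key material
        listOf(original) + List(multiplier - 1) {
            original.copy(keyData = Random.nextBytes(16))
        }
    }.shuffled() // shuffle so position does not reveal which key is real
```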
@noname202 these are good ideas; they bring in some complexity but could solve the problem for iOS. I also agree that the Android implementation should have the maximum performance in terms of notification time that the Google framework allows; Android devices have something like a 70-80% market share in Germany, so it seems worth giving up 100% consistency in order to significantly improve the notification time for the majority of the population.
I still wonder why we even need to feed old packages. The only thing I can think of right now is that if someone had several "semi-relevant" encounters in an epidemiological sense, it would be understandable that we need to look at previous packages to estimate the overall risk score. But isn't this more like looking for a needle in a haystack? I guess in the vast majority of cases a user of the app would have only one encounter with an infected person.
The way of interacting with EN that is implemented right now seems to go directly against the way Google (and, I guess, Apple) intended the framework to be used - from the documentation of provideDiagnosisKeys:
After the result Task has returned, keyFiles can be deleted.
So Google intended these keys to be ephemeral: evaluated by the framework and then discarded, not kept as a history of keys in an attempt to provide a more accurate risk score that includes exposures from past packages (how many users would have multiple exposures which only in combination yield a "high risk" status?). This potentially more accurate risk score feels like over-optimization: it could provide some benefit, but right now it seems to be heavily degrading the overall performance of the system. (An alternative is simply to lower the threshold for a "contact event", which would have the side effect of more notifications - but that just makes the system "too safe", not less effective.) But perhaps this is not the best place to discuss it; I guess these were instructions from the epidemiologists.
Happy to see padding on the server side, and keeping my fingers crossed that it would be merged without problems :)
@kbobrowski I guess the Google framework internally flags your tokens that match a key submitted by the app to the API, so it can recalculate the scores if the parameters are changed.
The current situation is really bad IMHO: if it continues like this, received tokens might be > 14 days old and removed from the internal database before the keys are even published :(
I have tried calling provideDiagnosisKeys (disclaimer: this is just hooking into the running process and using DKs which were live before the app launch; no vulnerability is exploited here). With one file there is no problem:
provideDiagnosisKeys(
keys=[/data/local/tmp/cwa-dk-01.zip]
conf=ExposureConfiguration<minimumRiskScore: 4, attenuationScores: [4, 4, 4, 4, 4, 4, 4, 4], attenuationWeight: 50, daysSinceLastExposureScores: [4, 4, 4, 4, 4, 4, 4, 4], daysSinceLastExposureWeight: 50, durationScores: [4, 4, 4, 4, 4, 4, 4, 4], durationWeight: 50, transmissionRiskScores: [4, 4, 4, 4, 4, 4, 4, 4], transmissionRiskWeight: 50, durationAtAttenuationThresholds: [50, 74]>
token=rrandomtoken1337
)
on success listener: null
but when calling it with multiple files there is a signature error:
provideDiagnosisKeys(
keys=[/data/local/tmp/cwa-dk-01.zip, /data/local/tmp/cwa-dk-02.zip, /data/local/tmp/cwa-dk-03.zip, /data/local/tmp/cwa-dk-04.zip]
conf=ExposureConfiguration<minimumRiskScore: 4, attenuationScores: [4, 4, 4, 4, 4, 4, 4, 4], attenuationWeight: 50, daysSinceLastExposureScores: [4, 4, 4, 4, 4, 4, 4, 4], daysSinceLastExposureWeight: 50, durationScores: [4, 4, 4, 4, 4, 4, 4, 4], durationWeight: 50, transmissionRiskScores: [4, 4, 4, 4, 4, 4, 4, 4], transmissionRiskWeight: 50, durationAtAttenuationThresholds: [50, 74]>
token=rrandomtoken1337
)
on failure listener: com.google.android.gms.common.api.ApiException: 10: Unable to validate key file signature: signature batch size (1 does not match actual batch size (4)
All 4 files are unpacked and stored by the EN framework:
root@titan_umtsds:/data/data/com.google.android.gms/app_en_diagnosis_keys #
ls 1592778570978
export0.bin
export1.bin
export2.bin
export3.bin
This is also reflected in the documentation - each file needs to have a proper position in the batch and the batch size assigned:
// For example, file 2 in batch size of 10. Ordinal, 1-based numbering.
// Note: Not yet supported on iOS.
optional int32 batch_num = 4;
optional int32 batch_size = 5;
It seems that this further complicates the approach of caching DKs and feeding all of them in one batch to provideDiagnosisKeys - each time, they would need to be modified and have new signatures assigned to reflect their position in the current batch. This further strengthens the point that Google did not have this use case in mind; batching seems to exist just for the case where a larger chunk of data, split across several files, needs to be downloaded, fed to the framework in one go, and then forgotten.
Android app version 1.0.4 is now available from the Google Play Store, and it did try to download keys on the first start. Alas, the server still doesn't distribute any Diagnosis Keys yet.
% curl https://svc90.main.px.t-online.de/version/v1/diagnosis-keys/country/DE/date
[]
I wonder why there are no Diagnosis Keys published... Let's say 500 people a day catch COVID; with a population of roughly 80 million, each individual person then has a daily probability of p = 0.00000624 of catching it. But with 10,000,000 users, the probability that at least one of those users catches it on a given day is practically 100%. Even with only 100,000 users it would be around 46% per day.
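As a sanity check of those numbers (assuming roughly 500 new cases per day in a population of about 80 million, i.e. p ≈ 6.24e-6 per user per day):

```latex
P(\text{at least one upload per day}) = 1 - (1 - p)^{N}
N = 10{,}000{,}000: \quad 1 - e^{-62.4} \approx 100\%
N = 100{,}000: \quad 1 - e^{-0.624} \approx 46\%
```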
I've reached out to Google once again regarding the issue with passing multiple files (with batch_size=1) to provideDiagnosisKeys, and they have just corrected the official docs from:
The quota applies per call, not per file uploaded. That is, a multi-file upload counts as only one call.
to:
The quota applies per call, not per file uploaded. That is, a multi-file batch upload counts as only one call. Only one batch of files can be passed in at a time.
They also let me know that there will be an update shortly which will allow passing multiple batches in a single call; this would easily enable a 2-hourly update without changing the current architecture. Hopefully this update arrives soon.
The current API allows querying the days on which Diagnosis Keys became available:
Apps use this endpoint to check whether these dates are already in the cache and, if not, to download Diagnosis Keys for the missing days using the endpoint:
Let's assume a scenario where someone had contact with an infected person. The infected person uploads Diagnosis Keys on 2020-06-12, but these Diagnosis Keys are bundled into a package that only becomes available for download on 2020-06-13. This introduces a day of delay, during which the person who had contact with the diagnosed person is unaware of this contact and unable to self-isolate, potentially endangering the people around them.
Interestingly, it seems to be possible to get Diagnosis Keys uploaded on the current day, but only via the hour API endpoint:
This endpoint seems to be used only in a function which is marked to be dropped before release. Functionality to fetch hours for the current day existed, but it was removed in this pull request.
Is there a reason not to continuously download Diagnosis Keys as they are made available during the current day, instead of waiting until the package for the complete day is ready?
As far as I understand, this would not introduce any privacy issues, since those are already handled by the upload mechanism documented here, and it would just let people know faster that they were exposed. The privacy of this solution can also be confirmed by looking at the latest Diagnosis Key in each bundle; e.g. in a bundle from the current day (2020-06-13, hour 12), the latest Diagnosis Key has interval number 2653200, meaning it was generating RPIs yesterday:
Internal Tracking ID: EXPOSUREAPP-1567