Closed corneliusroemer closed 3 years ago
An interesting metric could probably be how many phones have downloaded diagnosis keys from the CDN each day. This should more or less reflect the number of active apps.
Does anyone know how the community could implement the "diagnosed keys" feature itself? Has anyone studied the server/app protocol and knows this? It may just be querying a known URL periodically - or it may be much harder and require spoofing an app, keys etc. I don't know, but I'm curious to know.
Maintainers, can you help? @SebastianWolf-SAP @tkowark @jakobmoellersap @mynchau
Yes, I have.
I will probably set up a small script that pulls the info hourly from the API and inserts it into Elasticsearch, so we can have a graph later on. The API is quite simple:
Days with keys:
https://svc90.main.px.t-online.de/version/v1/diagnosis-keys/country/DE/date
Get keys for a day with keys:
https://svc90.main.px.t-online.de/version/v1/diagnosis-keys/country/DE/date/2020-06-18
as of now, there are no keys :)
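A minimal sketch of such a polling helper, in case someone wants to try it. The two URLs are the ones listed above; everything else is an assumption on my side, in particular that the date index answers with a JSON array of date strings. The names `day_url`, `parse_days` and `fetch_days` are made up for this example.

```python
import json
from typing import List
from urllib.request import urlopen

# Index endpoint quoted in this thread.
BASE_URL = "https://svc90.main.px.t-online.de/version/v1/diagnosis-keys/country/DE/date"


def day_url(day: str) -> str:
    """Build the URL of the key package for one day, e.g. '2020-06-18'."""
    return f"{BASE_URL}/{day}"


def parse_days(body: bytes) -> List[str]:
    """Parse the index response.

    Assumption: the body is a JSON array of date strings; an empty body
    is treated as "no days with keys yet".
    """
    days = json.loads(body or b"[]")
    return sorted(str(d) for d in days)


def fetch_days(timeout: float = 10.0) -> List[str]:
    """Download the index once; call this from cron or a loop to poll hourly."""
    with urlopen(BASE_URL, timeout=timeout) as resp:
        return parse_days(resp.read())
```

Calling `fetch_days()` once per hour and storing the length of the result would already be enough for a first graph; the per-day packages themselves are binary and need a parser such as the one linked further down in this thread.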
@Spacefish You're a star! Just one small question, where did you find these endpoints? Can we be sure that this is where the keys would come from? Still nothing is returned - so I'm a little skeptical. In 3 days since release no one in all of Germany has reported anything yet?
@SebastianWolf-SAP Can you confirm that the endpoint given above by @Spacefish is correct? It'd just be really really nice if the community could verify that the system is up and running and the first people are getting warned. That would just be such good news for everyone, and surely get picked up by the media and give the app another download push 😄
I have decompiled the .apk; it contains the same links as those linked by @nilsalex. They are not publishing keys in the API yet because a minimum number of keys is required to create a new package, and that number has not been reached yet. This is due to privacy concerns.
The exact reason why Diagnosis Keys are not available yet is explained here.
I'm watching the Android app using MITM, and it is still using the same links :)
Also, when Diagnosis Keys do become available for download, and you want to look at them, this could help: https://github.com/mh-/diagnosis-keys
OP: I've found the cumulative number of app downloads (android + iOS) is currently published here: https://www.rki.de/DE/Content/InfAZ/N/Neuartiges_Coronavirus/WarnApp/Warn_App.html (probably manually updated every day)
Statista is plotting this data over time: https://de.statista.com/statistik/daten/studie/1125951/umfrage/downloads-der-corona-warn-app/
I can confirm the URI https://svc90.main.px.t-online.de/version/v1/diagnosis-keys/country/DE/date
is correct.
Wow, that's really a joke that Telekom is earning 3.5 million € / month for hosting this. Hosting that on any cloud or server would probably set you back 10k € / month at the absolute max.
Please stay on topic @Spacefish. Since this topic comes up so often and people like to rage about it, I'd like to comment on it: it's obvious that this number is not only about hosting/infrastructure but also covers other aspects, like hotline, service desk, marketing and so on. You might also have noticed that, even though CWA was released on Tuesday, we did not drop everything and go back to our regular projects. The same people are still working on CWA: creating PRs, fixing bugs, keeping the servers running and adding new features.
As a side note, it looks like the French Corona app has not been too successful so far (cf. https://techcrunch.com/2020/06/23/french-contact-tracing-app-stopcovid-has-been-activated-1-8-million-times-but-only-sent-14-notifications). Since it uses a centralized approach, the authorities running it can say exactly how many notifications have been sent (something that won't be possible with CWA).
My personal statistics so far:
Day / Hour / Number of reporters / keys per reporter as count*keys (the key distribution may be wrong)
2020-06-23 8 ~20 / 173 net incl. partial padding (backlog, different profiles, hard to analyze)
2020-06-23 13 4 / 1*1, 2*4, 1*5 (1 old Android app(s))
2020-06-23 17 4 / 1*1, 1*3, 1*6, 1*9 (2 old Android app(s))
2020-06-24 6 3 / 2*6, 1*8 (1 old Android app(s))
2020-06-24 8 2 / 2*8
2020-06-24 18 3 / 1*6, 1*7, 1*8
2020-06-24 20 3 / 1*3, 1*5, 1*7 (1 old Android app(s))
2020-06-25 7 7 / 3*1, 1*4, 2*9 (1 old Android app(s)) [3 keys not parsed]
2020-06-25 9 5 / 2*1, 2*3, 1*9 [2 keys not parsed]
2020-06-25 10 2 / 2*9
2020-06-25 11 2 / 2*9
2020-06-25 12 2 / 1*8, 1*9
2020-06-25 15 3 / 1*1, 1*6, 1*9
2020-06-26 7 4 / 1*8, 1*9, 2*10
2020-06-26 10 4 / 2*3, 1*5, 1*7
2020-06-26 14 5 / 1*1, 1*4, 3*10
2020-06-26 15 2 / 2*10
2020-06-26 17 3 / 2*1, 1*4
2020-06-26 18 4 / 2*2, 1*10 (1 old Android app(s))
2020-06-27 9 4 / 1*6, 2*11
2020-06-27 11 4 / 2*1, 1*10 (2 old Android app(s))
2020-06-27 13 3 / 1*10, 2*11
2020-06-27 17 7 / 3*1, 1*2, 1*5, 1*7 (4 old Android app(s))
2020-06-27 21 4 / 1*1, 1*2, 1*11 (1 old Android app(s))
2020-06-28 8 4 / 2*2, 1*12 (1 old Android app(s))
2020-06-28 11 3 / 1*1, 2*12
2020-06-28 13 3 / 1*7, 1*9, 1*12 (1 old Android app(s))
2020-06-29 6 2 / 1*1, 1*13
2020-06-29 8 3 / 1*2, 1*11, 1*12
2020-06-29 11 3 / 1*5, 1*11, 1*12
2020-06-29 13 2 / 1*12, 1*13
2020-06-29 14 2 / 1*5, 1*11
2020-06-29 18 3 / 1*1, 1*7, 1*10
2020-06-30 6 2 / 1*9, 1*12
2020-06-30 7 2 / 1*1, 1*13
2020-06-30 8 3 / 3*13
2020-06-30 9 3 / 1*3, 1*11, 1*13
2020-06-30 10 3 / 1*8, 2*11
2020-06-30 13 2 / 1*1, 1*13
2020-06-30 14 4 / 1*4, 1*9, 1*12, 1*13
2020-06-30 16 3 / 1*4, 1*12, 1*13
2020-06-30 19 4 / 1*1, 3*13
2020-07-01 8 2 / 1*2, 1*13
2020-07-01 10 4 / 4*13
2020-07-01 11 1 / 1*9
2020-07-01 12 2 / 1*4, 1*13
2020-07-01 16 2 / 2*13
2020-07-01 17 4 / 2*1, 1*3, 1*13 (2 old Android app(s))
2020-07-01 19 3 / 1*4, 2*13
2020-07-02 6 3 / 3*13
2020-07-02 7 2 / 2*13
2020-07-02 9 5 / 1*1, 1*4, 3*13 <- padding multiplier now 5
2020-07-02 11 6 / 2*1, 2*4, 1*8, 1*13
2020-07-02 14 7 / 1*1, 1*7, 1*9, 4*13
2020-07-02 15 3 / 1*8, 2*13
2020-07-02 16 3 / 3*13
2020-07-02 19 3 / 1*5, 1*7, 1*13
2020-07-03 8 3 / 1*1, 2*13
Starting from the 2nd line, the entries are generated by https://github.com/mh-/diagnosis-keys/blob/master/lib/count_users.py
A note on this: The Python script simply "follows" the TRL of keys along the profile 6, 8, 8, 8, 5, 3, 1, 1, 1, 1, 1, 1, 1 backwards from the last day and counts the number of keys that match this profile. This stops when a user has not submitted a key for a day, which also causes ambiguous situations. This script is simple and only gives a rough estimate. (A more complicated algorithm, like building up multiple possible trees and then selecting the most probable, would probably give better results.)
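To illustrate the idea, here is a deliberately simplified sketch. It is not the code from count_users.py. It assumes that all users in one package uploaded on the same day, that there is no padding, and that no old Android app with reversed TRLs is involved. The function name `count_users_sketch` and the input format (pairs of key day and TRL) are made up for this example.

```python
from collections import Counter
from datetime import date, timedelta
from typing import Dict, Iterable, Tuple

# TRL profile from the comment above: index 0 is the most recent key day.
PROFILE = [6, 8, 8, 8, 5, 3, 1, 1, 1, 1, 1, 1, 1]


def count_users_sketch(keys: Iterable[Tuple[date, int]]) -> Dict[int, int]:
    """Return {keys_per_user: number_of_users} for one package.

    Walks backwards from the last key day and checks how many keys
    carry the TRL the profile expects for that day.
    """
    seen = Counter(keys)  # (day, trl) -> number of keys
    if not seen:
        return {}
    last_day = max(day for day, _ in seen)

    # at_least[i] = users with a matching key on each of the last i + 1 days
    at_least = []
    for offset, trl in enumerate(PROFILE):
        n = seen[(last_day - timedelta(days=offset), trl)]
        if at_least:
            n = min(n, at_least[-1])  # a chain cannot grow going backwards
        if n == 0:
            break  # nobody has a key for this day: every chain ends here
        at_least.append(n)

    # Users with exactly k keys = (at least k) - (at least k + 1)
    result = {}
    for i, n in enumerate(at_least):
        longer = at_least[i + 1] if i + 1 < len(at_least) else 0
        if n > longer:
            result[i + 1] = n - longer
    return result
```

For one user with a single key and one user with three consecutive keys this returns `{1: 1, 3: 1}`, which corresponds to the "1*1, 1*3" notation in the table. A user who skipped a day breaks the chain, which is exactly the ambiguity mentioned above.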
The following article published by Der Spiegel claims that the diagnosis files contain "fake keys" "in order to increase privacy" (their wording, not mine): https://www.spiegel.de/netzwelt/apps/corona-warn-app-sendet-erstmals-warnung-wegen-kontakten-mit-infizierten-a-9754dec6-db99-4c06-af27-c2d72ba07d8b This is contrary to what has been claimed here before, i.e. that it had been decided to refrain from inserting dummy keys. Can someone please comment on CWA's actual practice?
@geos-github "Key padding" has been introduced with cwa-server 1.0.8 on Monday: https://github.com/corona-warn-app/cwa-server/pull/609
Some discussion about this is here: https://github.com/corona-warn-app/cwa-server/issues/108 and here: https://github.com/corona-warn-app/cwa-server/issues/620
@Spacefish did you see https://mobile.twitter.com/malteaero/status/1276458483039105024 ? Maybe that is even more useful for a visualization of the published daily keys?
Unfortunately, the data does not show how many users have submitted their positive test results, or can that number be inferred from the data?
Do health authorities have figures on how many people requested a test based on a risk notification by CWA? And if so, how many of those tests have already been carried out? Of course, the most interesting figure would be what percentage of those tests actually turned out positive. SAP's CEO (if I am not mistaken) was quoted by the press with a 20% false positive rate for detecting encounters, but even if that figure is correct, it only refers to the measurement system being able to correctly sense an (unshielded) encounter within a certain maximum distance and of a certain minimum duration. It by no means indicates that 80% of the encounters deemed risky by CWA actually resulted in an infection (hopefully that number is much smaller). The latter figure (minus those cases which have already been informed through traditional means, e.g. because they are part of the household or regular social sphere of the person who reported a positive test result) is the one of interest for assessing the effectiveness of CWA.
Unfortunately, the data does not show how many users have submitted their positive test results, or can that number be inferred from the data?
As far as I know, the number of users can’t be inferred, that was the point of having daily keys. You can guess the typical number of days the app was active. Later on that will be around 14 keys per user. Right now it’s somewhere between 1 and 11 days. Nearly half of the users downloaded on day one, so maybe 7-8 keys per user should be there currently? The other numbers you requested could be captured by health authorities; however, they are quite busy and so far have not been very forthcoming in sharing detailed data beyond what is absolutely necessary.
As far as I know, the number of users can’t be inferred
Of course the number of users can be inferred. Each key has a date attached, and you just need to count the keys per date (maybe divide by 10 because of the 'padding'), on the next day (when users upload their keys). I have written a script that does that: https://github.com/mh-/diagnosis-keys
And there are also automated setups that run the script regularly, e.g. https://micb25.github.io/dka/ https://ctt.pfstr.de
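The counting idea from the previous comment, as a rough sketch: every user who uploads on a given day contributes exactly one key dated the day before, so counting those keys and dividing by the padding factor gives an estimate. This is not the linked script; the function name `estimate_uploaders` and the default factor of 10 are assumptions for illustration (the factor has changed over time).

```python
from datetime import date, timedelta
from typing import Iterable


def estimate_uploaders(
    key_dates: Iterable[date],
    upload_day: date,
    padding_multiplier: int = 10,
) -> float:
    """Estimate how many users uploaded keys on `upload_day`.

    key_dates: the date of every key in the package published for that day.
    padding_multiplier: total keys published per real key (assumed value).
    """
    yesterday = upload_day - timedelta(days=1)
    newest_keys = sum(1 for d in key_dates if d == yesterday)
    return newest_keys / padding_multiplier
```

This only holds if the package contains nothing but uploads from that one day; keys that arrive later with the same date would inflate the count.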
Hmm, I see. But consider this (maybe hypothetical) example: User A has BLE broadcasting active only on June 20th and uploads keys after a positive test on June 22nd. User B downloaded the app on June 21st, directly got a positive test and uploads his keys.
Now we have two keys, one for 20th and one for 22nd. How will anybody infer the correct number of users?
@cfritzsche When you upload one or more keys (up to 13 are possible), the app will assign a Transmission Risk Level (TRL) to each key. The values are 6, 8, 8, 8, 5, 3, 1, 1, 1, 1, 1, 1, 1: yesterday gets a 6, the day before yesterday an 8, and so on. This is meant to reflect your virus transmission risk, as explained here (cf. Figure 14). A trivial approach would be to search for keys with TRL value 6. However, early releases of the Android app had a problem: they applied incorrect TRLs (the order of the sequence was reversed), so my script is a bit more complicated. It will also not work correctly under all circumstances, but still it gives a good estimate of the number of users.
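For readers who want to see the assignment spelled out, a small sketch of how I read the description above: the TRL depends only on how many days before the upload a key was valid. This is not taken from the app source; `assign_trl` is a made-up name, and the reversed behaviour of early Android releases is not modelled.

```python
from datetime import date
from typing import Dict, Iterable

# Profile from the comment above: position 0 belongs to "yesterday".
PROFILE = [6, 8, 8, 8, 5, 3, 1, 1, 1, 1, 1, 1, 1]


def assign_trl(upload_day: date, key_days: Iterable[date]) -> Dict[date, int]:
    """Map each key day to the TRL expected for its age at upload time."""
    result = {}
    for day in key_days:
        age = (upload_day - day).days  # 1 = yesterday, 2 = the day before, ...
        if 1 <= age <= len(PROFILE):
            result[day] = PROFILE[age - 1]
    return result
```

With an upload on 2020-06-30, keys for June 29th, 28th and 25th would get 6, 8 and 5. Note that a gap (here June 26th and 27th) does not shift the values, which is why counting TRL-6 keys is the trivial estimate.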
It will also not work correctly under all circumstances, but still it gives a good estimate of the number of users.
According to this tweet the ratio between fake keys and real diagnosis keys is currently 9:1. Thus, the estimates are quite good at the moment.
Great stuff @micb25 ! Doesn’t that cover your requirement above @corneliusroemer ?
The only thing I could wish for on top would be the number of devices downloading keys. Unfortunately I don’t know if this can be measured through the CDN or if this is not allowed for privacy reasons (some people in the community suggested earlier that IP addresses could be recorded where the downloads are counted).
One can simply divide the number of people uploading their diagnosis keys (taken from https://micb25.github.io/dka/) by the overall number of newly infected people reported for Germany on that day (according to JHU, for example). Of course this assumes that there is no difference in reporting delay. The calculation yields the following percentages (for days of the month in June 2020):
22 0%
23 1.4%
24 2.8%
25 3.8%
26 3.0%
27 4.3%
28 4.3%
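The division behind these percentages, as a tiny sketch. The function name `upload_share` is made up, and the numbers in the usage note are hypothetical, not the real counts for any of the days listed.

```python
def upload_share(uploaders: int, new_cases: int) -> float:
    """Percentage of newly reported cases that also uploaded keys via the app.

    uploaders: estimated number of users who shared keys that day.
    new_cases: newly reported infections for the same day (e.g. from JHU).
    """
    if new_cases <= 0:
        raise ValueError("new_cases must be positive")
    return round(100.0 * uploaders / new_cases, 1)
```

For example, 10 uploaders on a day with 400 reported cases would give `upload_share(10, 400)`, i.e. 2.5%.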
The number of devices downloading the keys, if the CDN is willing and allowed to record and publish that data, would of course only provide some indication of actual app usage. It would not provide any indication of the number of people actually being warned by CWA. The latter figure would be very interesting, but I believe only health authorities could give an indication of it by publishing the number of tests conducted due to a CWA warning notification (and then still some users might choose not to get tested despite a warning notification, or get tested without reference to CWA; do you currently get free testing based solely on a CWA warning notification?).
The calculation yields the following percentages (for days of the month in June 2020):
22 0%
23 1.4%
24 2.8%
25 3.8%
26 3.0%
27 4.3%
28 4.3%
Interesting take, thanks.
do you currently get free testing based solely on a CWA warning notification?
Yes. But as said above, health authorities will probably not share this data.
I have added a daily correlation to my dashboard. It is based on the daily RKI case numbers. I was lucky that I already wrote a small web scraper for the daily RKI case numbers, due to another project of mine for COVID-19 in Thuringia. In comparison to JHU data, however, this leads to slightly different ratios:
22.06. 0.0%
23.06. 2.0%
24.06. 1.9%
25.06. 3.0%
26.06. 4.2%
27.06. 2.6%
28.06. 3.9%
29.06. 5.0%
Thanks. I think JHU or Zeit.de data would be more appropriate (and already looks more stable) for this correlation as the user will probably upload the day he gets the positive test result. That is the same day the local health authority gets the result. On the next day, both CWA backend and local health authority will in most cases report the data together, where JHU or Zeit.de will collect and report them. RKI meanwhile has another delay after this that causes noise in this correlation.
Now there is also public information on number of hotline calls and teleTANs: https://www.rki.de/DE/Content/InfAZ/N/Neuartiges_Coronavirus/WarnApp/Kennzahlen.pdf?__blob=publicationFile
Still missing: Active apps downloading from CDN.
This was integrated into the app as part of CWA release 1.11. A statistic is shown for the number of shared results in the last 7 days. We will close this issue now. Please create a new issue for other statistics-related feature requests.
Corona-Warn-App Open Source Team
It would be fantastic to have a metric publicly available that shows how successful the app is.
One very relevant metric you must have access to is how many keys of positively tested individuals have been uploaded and shared. This may allow you to approximately infer how many people have reported results through the app. This would very surely interest very many people and be easy to do. It would show that the app is a success and inspire more to download it.
Additional candidate metrics:
Internal Tracking ID: EXPOSUREAPP-2059