corona-warn-app / cwa-wishlist

Central repository to collect community feature requests and improvements. The CWA development ends on May 31, 2023. You still can warn other users until April 30, 2023. More information:
https://coronawarn.app/en/faq/#ramp_down
Apache License 2.0

Publish number of "positive transmitter" key days on website as metric to show app success #125

Closed corneliusroemer closed 3 years ago

corneliusroemer commented 4 years ago

It would be fantastic to have a metric publicly available that shows how successful the app is.

One very relevant metric you must have access to is how many keys of positively tested individuals have been uploaded and shared. This may allow you to approximately infer how many people have reported results through the app. This would very surely interest very many people and be easy to do. It would show that the app is a success and inspire more to download it.

Additional candidate metrics:


Internal Tracking ID: EXPOSUREAPP-2059

nilsalex commented 4 years ago

An interesting metric could probably be how many phones have downloaded diagnosis keys from the CDN each day. This should more or less reflect the number of active apps.

mh- commented 4 years ago

See also https://github.com/corona-warn-app/cwa-documentation/issues/226

corneliusroemer commented 4 years ago

Does anyone know how the community could implement the "diagnosed keys" feature itself? Has anyone studied the server/app protocol and knows this? It may just be querying a known URL periodically - or it may be much harder and require spoofing an app, keys etc. I don't know, but I'm curious to know.

Maintainers, can you help? @SebastianWolf-SAP @tkowark @jakobmoellersap @mynchau

Spacefish commented 4 years ago

Yes, I have.

I will probably set up a small script that pulls the info hourly from the API and inserts it into Elasticsearch, so we can have a graph later on. The API is quite simple:

Days with keys: https://svc90.main.px.t-online.de/version/v1/diagnosis-keys/country/DE/date

Get keys for a day with keys: https://svc90.main.px.t-online.de/version/v1/diagnosis-keys/country/DE/date/2020-06-18

as of now, there are no keys :)
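
For anyone who wants to poll the two URLs above, here is a minimal sketch. The URLs are the ones quoted in this comment; everything else is an assumption made for illustration: the helper names (`day_url`, `parse_index`, `fetch_days`) are hypothetical, and the index response is assumed to be a JSON array of ISO date strings, which nobody in this thread has confirmed. The network call may simply fail or return something else.

```python
import json
from urllib.request import urlopen

# Index URL as quoted in the comment above.
BASE = "https://svc90.main.px.t-online.de/version/v1/diagnosis-keys/country/DE/date"


def day_url(day: str) -> str:
    """Build the per-day URL, e.g. for '2020-06-18'."""
    return f"{BASE}/{day}"


def parse_index(body: str) -> list:
    """Parse the index response.

    ASSUMPTION: the body is a JSON array of ISO date strings.
    An empty array means no days with keys yet.
    """
    return sorted(json.loads(body))


def fetch_days(timeout: float = 10.0) -> list:
    """Download and parse the index (not called automatically)."""
    with urlopen(BASE, timeout=timeout) as resp:
        return parse_index(resp.read().decode("utf-8"))
```

A scheduled job could call `fetch_days()` once an hour and store the length of the result; `parse_index("[]")` returning an empty list would correspond to the "no keys yet" situation described above.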

corneliusroemer commented 4 years ago

@Spacefish You're a star! Just one small question, where did you find these endpoints? Can we be sure that this is where the keys would come from? Still nothing is returned - so I'm a little skeptical. In 3 days since release no one in all of Germany has reported anything yet?

@SebastianWolf-SAP Can you confirm that the endpoint given above by @Spacefish is correct? It'd just be really really nice if the community could verify that the system is up and running and the first people are getting warned. That would just be such good news for everyone, and surely get picked up by the media and give the app another download push 😄

nilsalex commented 4 years ago

@corneliusroemer https://github.com/corona-warn-app/cwa-app-android/blob/master/gradle.properties

Spacefish commented 4 years ago

@SebastianWolf-SAP Can you confirm that the endpoint given above by @Spacefish is correct? It'd just be really really nice if the community could verify that the system is up and running and the first people are getting warned. That would just be such good news for everyone, and surely get picked up by the media and give the app another download push 😄

I have decompiled the .apk, and it contains the same links as those linked by @nilsalex. They are not publishing keys in the API yet because a minimum number of keys is required to create a new package, and that number has not been reached yet. This is due to privacy concerns.

mh- commented 4 years ago

The exact reason why Diagnosis Keys are not available yet is explained here.

I'm watching the Android app using MITM, and it is still using the same links :)

Also, when Diagnosis Keys do become available for download, and you want to look at them, this could help: https://github.com/mh-/diagnosis-keys

daimpi commented 4 years ago

OP: I've found the cumulative number of app downloads (android + iOS) is currently published here: https://www.rki.de/DE/Content/InfAZ/N/Neuartiges_Coronavirus/WarnApp/Warn_App.html (probably manually updated every day)

Statista is plotting this data over time: https://de.statista.com/statistik/daten/studie/1125951/umfrage/downloads-der-corona-warn-app/

christian-kirschnick commented 4 years ago

I can confirm the URI https://svc90.main.px.t-online.de/version/v1/diagnosis-keys/country/DE/date is correct.

Spacefish commented 4 years ago

Wow, it's really a joke that Telekom is earning 3.5 million € per month for hosting this. Hosting that on any cloud or server would probably set you back 10k € per month, absolute max.

christian-kirschnick commented 4 years ago

Please stay on topic, @Spacefish. Since this topic comes up so often and people like to rage about it, I'd like to comment on this: It's obvious that this number is not only about hosting/infrastructure, but also covers other aspects, like hotline, service desk, marketing and so on. You might also have noticed that, even though CWA was released on Tuesday, we did not drop everything and go back to our regular projects. The same people are still working on CWA: creating PRs, fixing bugs, keeping the servers running and adding new features.

geos-github commented 4 years ago

As a side note, it looks like the French Corona app has not been too successful so far (cf. https://techcrunch.com/2020/06/23/french-contact-tracing-app-stopcovid-has-been-activated-1-8-million-times-but-only-sent-14-notifications). Since it uses a centralized approach, the authorities running it can say exactly how many notifications have been sent (something that won't be possible with CWA).

mh- commented 4 years ago

My personal statistics so far:

Day       Hour  Number of reporters / keys (keys distribution can be wrong)
2020-06-23   8  ~20 / 173 net incl. partial padding (backlog, different profiles, hard to analyze)
2020-06-23  13    4 / 1*1, 2*4, 1*5 (1 old Android app(s))
2020-06-23  17    4 / 1*1, 1*3, 1*6, 1*9 (2 old Android app(s))
2020-06-24   6    3 / 2*6, 1*8 (1 old Android app(s))
2020-06-24   8    2 / 2*8
2020-06-24  18    3 / 1*6, 1*7, 1*8
2020-06-24  20    3 / 1*3, 1*5, 1*7 (1 old Android app(s))
2020-06-25   7    7 / 3*1, 1*4, 2*9 (1 old Android app(s)) [3 keys not parsed]
2020-06-25   9    5 / 2*1, 2*3, 1*9 [2 keys not parsed]
2020-06-25  10    2 / 2*9
2020-06-25  11    2 / 2*9
2020-06-25  12    2 / 1*8, 1*9
2020-06-25  15    3 / 1*1, 1*6, 1*9
2020-06-26   7    4 / 1*8, 1*9, 2*10
2020-06-26  10    4 / 2*3, 1*5, 1*7
2020-06-26  14    5 / 1*1, 1*4, 3*10
2020-06-26  15    2 / 2*10
2020-06-26  17    3 / 2*1, 1*4
2020-06-26  18    4 / 2*2, 1*10 (1 old Android app(s))
2020-06-27   9    4 / 1*6, 2*11
2020-06-27  11    4 / 2*1, 1*10 (2 old Android app(s))
2020-06-27  13    3 / 1*10, 2*11
2020-06-27  17    7 / 3*1, 1*2, 1*5, 1*7 (4 old Android app(s))
2020-06-27  21    4 / 1*1, 1*2, 1*11 (1 old Android app(s))
2020-06-28   8    4 / 2*2, 1*12 (1 old Android app(s))
2020-06-28  11    3 / 1*1, 2*12
2020-06-28  13    3 / 1*7, 1*9, 1*12 (1 old Android app(s))
2020-06-29   6    2 / 1*1, 1*13
2020-06-29   8    3 / 1*2, 1*11, 1*12
2020-06-29  11    3 / 1*5, 1*11, 1*12
2020-06-29  13    2 / 1*12, 1*13
2020-06-29  14    2 / 1*5, 1*11
2020-06-29  18    3 / 1*1, 1*7, 1*10
2020-06-30   6    2 / 1*9, 1*12
2020-06-30   7    2 / 1*1, 1*13
2020-06-30   8    3 / 3*13
2020-06-30   9    3 / 1*3, 1*11, 1*13
2020-06-30  10    3 / 1*8, 2*11
2020-06-30  13    2 / 1*1, 1*13
2020-06-30  14    4 / 1*4, 1*9, 1*12, 1*13
2020-06-30  16    3 / 1*4, 1*12, 1*13
2020-06-30  19    4 / 1*1, 3*13
2020-07-01   8    2 / 1*2, 1*13
2020-07-01  10    4 / 4*13
2020-07-01  11    1 / 1*9
2020-07-01  12    2 / 1*4, 1*13
2020-07-01  16    2 / 2*13
2020-07-01  17    4 / 2*1, 1*3, 1*13 (2 old Android app(s))
2020-07-01  19    3 / 1*4, 2*13
2020-07-02   6    3 / 3*13
2020-07-02   7    2 / 2*13
2020-07-02   9    5 / 1*1, 1*4, 3*13 <- padding multiplier now 5
2020-07-02  11    6 / 2*1, 2*4, 1*8, 1*13
2020-07-02  14    7 / 1*1, 1*7, 1*9, 4*13
2020-07-02  15    3 / 1*8, 2*13
2020-07-02  16    3 / 3*13
2020-07-02  19    3 / 1*5, 1*7, 1*13
2020-07-03   8    3 / 1*1, 2*13

Starting from the 2nd line, they are generated by https://github.com/mh-/diagnosis-keys/blob/master/lib/count_users.py

A note on this: The Python script simply "follows" the TRL of keys along the profile 6, 8, 8, 8, 5, 3, 1, 1, 1, 1, 1, 1, 1 backwards from the last day and counts the number of keys that match this profile. This stops when a user has not submitted a key for a day, which also causes ambiguous situations. This script is simple and only gives a rough estimate. (A more complicated algorithm, like building up multiple possible trees and then selecting the most probable, would probably give better results.)
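
To illustrate the "follow the profile backwards" idea from the note above, here is a minimal sketch. It is not the actual count_users.py: the function name `count_reporters` and the `(day, trl)` tuple input are assumptions made for this example, days are plain integers, and the sketch ignores padding as well as the reversed TRL order of old Android apps.

```python
from collections import Counter

# TRL profile as quoted above, newest key first.
PROFILE = [6, 8, 8, 8, 5, 3, 1, 1, 1, 1, 1, 1, 1]


def count_reporters(keys):
    """Greedy estimate of reporters in one batch of keys.

    keys: list of (day, trl) tuples, day being an integer day number
    (hypothetical input format). Returns one entry per estimated
    reporter: the number of keys attributed to that reporter.
    """
    if not keys:
        return []
    pool = Counter(keys)  # how many unassigned keys per (day, trl)
    last_day = max(day for day, _ in pool)
    chains = []
    # Every key on the last day with the first profile value starts a chain.
    while pool[(last_day, PROFILE[0])] > 0:
        length = 0
        for offset, trl in enumerate(PROFILE):
            slot = (last_day - offset, trl)
            if pool[slot] == 0:
                break  # no key for this day: the chain ends here
            pool[slot] -= 1
            length += 1
        chains.append(length)
    return chains
```

For example, two keys on the last day with TRL 6, two on the day before with TRL 8 and one more TRL 8 key a day earlier give two reporters, one with 3 keys and one with 2. Because the walk is greedy, the split of keys between reporters can be wrong in ambiguous cases, which matches the caveat in the table header.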

geos-github commented 4 years ago

The following article published by Der Spiegel claims that the diagnosis files contain "fake keys" "in order to increase privacy" (their wording, not mine): https://www.spiegel.de/netzwelt/apps/corona-warn-app-sendet-erstmals-warnung-wegen-kontakten-mit-infizierten-a-9754dec6-db99-4c06-af27-c2d72ba07d8b This is contrary to what has been claimed here before, i.e. that it had been decided to refrain from inserting dummy keys. Can someone please comment on CWA's actual practice?

nilsalex commented 4 years ago

@geos-github "Key padding" has been introduced with cwa-server 1.0.8 on Monday: https://github.com/corona-warn-app/cwa-server/pull/609

mh- commented 4 years ago

And some discussions about this are here: https://github.com/corona-warn-app/cwa-server/issues/108 and here: https://github.com/corona-warn-app/cwa-server/issues/620

cfritzsche commented 4 years ago

@Spacefish did you see https://mobile.twitter.com/malteaero/status/1276458483039105024 ? Maybe that is even more useful for a visualization of the published daily keys?

geos-github commented 4 years ago

Unfortunately the data does not show how many users have submitted their positive test results, or can that number be inferred from the data?

geos-github commented 4 years ago

Do health authorities have figures on how many people requested a test based on a risk notification by CWA? And if so, how many of those tests have already been carried out? Of course the most interesting figure would be what percentage of those tests actually turned out positive. SAP's CEO (if I am not mistaken) was quoted by the press with a 20% false positive rate for detecting encounters, but even if that figure is correct, it only refers to the measurement system being able to correctly sense an (unshielded) encounter of a certain maximum distance and a certain minimum duration. It does by no means indicate that 80% of the encounters deemed risky by CWA actually resulted in an infection (hopefully that number is much smaller). The latter is the figure (minus those cases which have already been informed through traditional means since e.g. they are part of the household or regular social sphere of the person that reported his or her positive test result) which is of interest for assessing the effectiveness of CWA.

cfritzsche commented 4 years ago

Unfortunately the data does not show how many users have submitted their positive test results, or can that number be inferred from the data?

As far as I know, the number of users can’t be inferred, that was the point of having daily keys. You can guess the typical number of days the app was active. Later on that will be around 14 keys per user. Right now it’s somewhere between 11 and 1 day. Nearly half of the users downloaded on day one so maybe 7-8 keys per user should be there currently? The other numbers you requested could be captured by health authorities, however they are quite busy and have not been very forthcoming to share detailed data above what is absolutely necessary so far.

mh- commented 4 years ago

As far as I know, the number of users can’t be inferred

Of course the number of users can be inferred. Each key has a date attached, and you just need to count the keys per date (maybe divide by 10 because of the 'padding'), on the next day (when users upload their keys). I have written a script that does that: https://github.com/mh-/diagnosis-keys

And there are also automated setups that run the script regularly, e.g. https://micb25.github.io/dka/ https://ctt.pfstr.de
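
The counting idea described above (count the keys for the newest date, then divide by the padding factor) can be sketched as follows. This is a simplified illustration, not the linked script: the function name `estimate_reporters` and the input format (one ISO date string per key in a newly published batch) are assumptions, and the default padding factor of 10 is just the "maybe divide by 10" guess from this comment.

```python
from collections import Counter


def estimate_reporters(key_dates, padding_multiplier=10):
    """Rough number of reporters in one batch of keys.

    key_dates: one ISO date string per key (hypothetical input format).
    padding_multiplier: ASSUMED ratio of published keys to real keys.
    """
    counts = Counter(key_dates)
    if not counts:
        return 0
    newest = max(counts)  # ISO dates sort correctly as strings
    # Each reporter is assumed to contribute one real key for the newest date.
    return round(counts[newest] / padding_multiplier)
```

With 30 keys dated 2020-06-24 and a padding factor of 10, the estimate is 3 reporters; the same batch with a factor of 5 would give 6, so the result depends heavily on knowing the factor that was in effect.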

cfritzsche commented 4 years ago

Hmm I see. But consider this (maybe hypothetical) example: User A has broadcasting BLE active only June 20th, uploads keys after positive test on June 22nd. User B downloaded the app on June 21st and directly got a positive test and uploads his keys.

Now we have two keys, one for 20th and one for 22nd. How will anybody infer the correct number of users?

mh- commented 4 years ago

@cfritzsche When you upload one or more keys (up to 13 are possible), the app will assign a Transmission Risk Level (TRL) to each key. The values are 6, 8, 8, 8, 5, 3, 1, 1, 1, 1, 1, 1, 1 - yesterday gets a 6, the day before yesterday an 8, and so on. This is meant to reflect your virus Transmission Risk, as explained here - cf. Figure 14. A trivial approach would be to search for keys with TRL value 6. However, there has been a problem with early releases of the Android app, they applied incorrect TRLs (the order of the sequence was reversed), so my script is a bit more complicated. It will also not work correctly under all circumstances, but still it gives a good estimate of the number of users.
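
As a small sketch of the TRL assignment and the "trivial approach" described above: the profile values are the ones quoted in this comment, while the function names `assign_trl` and `count_trl6` are hypothetical and the code does not handle the reversed order applied by early Android releases.

```python
# TRL profile as quoted above: index 0 is yesterday's key.
PROFILE = (6, 8, 8, 8, 5, 3, 1, 1, 1, 1, 1, 1, 1)


def assign_trl(num_keys):
    """TRLs for an upload of num_keys keys, ordered newest first."""
    if not 0 <= num_keys <= len(PROFILE):
        raise ValueError("between 0 and 13 keys can be uploaded")
    return list(PROFILE[:num_keys])


def count_trl6(trls):
    """Trivial estimate: one reporter per key with TRL 6."""
    return sum(1 for trl in trls if trl == 6)
```

For example, `assign_trl(3)` yields `[6, 8, 8]`, and a batch with the TRLs `[6, 8, 8, 6, 1]` would be counted as two reporters by the trivial approach.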

micb25 commented 4 years ago

It will also not work correctly under all circumstances, but still it gives a good estimate of the number of users.

According to this tweet the ratio between fake keys and real diagnosis keys is currently 9:1. Thus, the estimates are quite good at the moment.

cfritzsche commented 4 years ago

Great stuff @micb25 ! Doesn’t that cover your requirement above @corneliusroemer ?

The only thing that I could wish for on top would be the number of devices downloading keys. Unfortunately I don't know if this can be measured through the CDN or if this is not allowed for privacy reasons (some people in the community suggested earlier that IP addresses could be recorded where the downloads are counted).

geos-github commented 4 years ago

one can simply divide the number of people uploading their diagnosis keys (taken from https://micb25.github.io/dka/) by the overall number of newly infected people reported for Germany on that day (according to JHU, for example). Of course this assumes that there is no difference in reporting delay. The calculation yields the following percentages (for days of the month in June 2020):

22  0%
23  1.4%
24  2.8%
25  3.8%
26  3.0%
27  4.3%
28  4.3%

The number of devices downloading the keys, if the CDN is willing and allowed to record and publish that data, would of course only provide some indication of actual app usage. It would not provide any indication of the number of people actually being warned by CWA. The latter figure would be very interesting, but I believe only health authorities could provide an indication of it by publishing the number of tests conducted due to a CWA warning notification (and then still some users might choose not to get tested despite a warning notification or get tested without reference to CWA; do you currently get free testing based solely on a CWA warning notification?).
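
The division described at the start of this comment can be written down as a one-line helper. The function name `share_percent` is hypothetical, and the numbers in the usage note are made up for illustration; they are not taken from the dashboard or from JHU.

```python
def share_percent(reporters, new_cases):
    """Share of newly reported cases that uploaded keys, in percent."""
    if new_cases <= 0:
        raise ValueError("new_cases must be positive")
    return round(100.0 * reporters / new_cases, 1)
```

With a hypothetical 10 reporters on a day with 500 newly reported cases, `share_percent(10, 500)` gives 2.0. As noted above, this only makes sense if both numbers refer to the same day without a difference in reporting delay.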

cfritzsche commented 4 years ago

The calculation yields the following percentages (for days of the month in June 2020):

22 0%

23 1.4%

24 2.8%

25 3.8%

26 3.0%

27 4.3%

28 4.3%

Interesting take, thanks.

do you currently get free testing based solely on a CWA warning notification?

Yes. But as said above, health authorities will probably not share this data.

micb25 commented 4 years ago

I have added a daily correlation to my dashboard. It is based on the daily RKI case numbers. I was lucky that I already wrote a small web scraper for the daily RKI case numbers, due to another project of mine for COVID-19 in Thuringia. In comparison to JHU data, however, this leads to slightly different ratios:

22.06.  0.0%
23.06.  2.0%
24.06.  1.9%
25.06.  3.0%
26.06.  4.2%
27.06.  2.6%
28.06.  3.9%
29.06.  5.0%

cfritzsche commented 4 years ago

Thanks. I think JHU or Zeit.de data would be more appropriate (and already looks more stable) for this correlation as the user will probably upload the day he gets the positive test result. That is the same day the local health authority gets the result. On the next day, both CWA backend and local health authority will in most cases report the data together, where JHU or Zeit.de will collect and report them. RKI meanwhile has another delay after this that causes noise in this correlation.

cfritzsche commented 4 years ago

Now there is also public information on number of hotline calls and teleTANs: https://www.rki.de/DE/Content/InfAZ/N/Neuartiges_Coronavirus/WarnApp/Kennzahlen.pdf?__blob=publicationFile

Still missing: Active apps downloading from CDN.

dsarkar commented 3 years ago

FYI - PR https://github.com/corona-warn-app/cwa-app-android/pull/2043

heinezen commented 3 years ago

This was integrated into the app as part of CWA release 1.11. A statistic is shown for the number of shared results in the last 7 days. We will close this issue now. Please create a new issue for other statistics-related feature requests.


Corona-Warn-App Open Source Team