Science-Blog-5: wrong number of warned individuals

Mtagxx commented 2 years ago

Where to find the issue

Describe the issue

Science-Blog-5 states: "... it can be estimated that for each individual providing a warning, 19 individuals receive a red warning." The actual number of people who are warned must be smaller.

As explained in "2.2 Purpose 2 – Warning others", the data is taken from the CWA data donation. Here, however, one device often seems to report a "red risk" more than once. For example, the CWA Dashboard reports 29,663,951 "red risk" warnings for June/July 2022 (obtained by summing ppa_risk_red_daily from "Warnings received by data donors"). This exceeds the number of CWA users, even before taking into account that only about half of the users participate in data donation.

Suggested change

The number of warned individuals in Science-Blog-5 should be reevaluated and reconsidered.

Maybe the CWA data donation can be fixed, so that each "red warning" is only reported once.

Internal Tracking ID: EXPOSUREAPP-14099

mlenkeit commented 2 years ago

Hi @Mtagxx, thanks for reporting this potential inconsistency! Please give us some time to double-check the 19 individuals figure.

Regarding the 29 million "red risk" warnings in June/July 2022: please note that this figure does not represent unique users/devices. If I'm not mistaken, the reported number for a given day represents how many devices have received a new "red risk" warning on that day, meaning that on the previous day¹, the warning was not "red risk". So the sum of these numbers may well exceed the number of CWA users. I would only be concerned if the reported number of a single day exceeded the number of CWA users.

Anyway, we'll also double-check this along with the 19 individuals and get back to you!

¹ technically, it's not "previous day" but "previous time the device reported PPA data". Given the large number of devices that report PPA data and due to several retry-mechanisms on the same day if reporting PPA data failed the first time on a given day, we approximate it to "previous day".

Mtagxx commented 2 years ago

Hi @mlenkeit,

Assuming that there are about 30 million CWA users and half of them participate in data donation, about 15 million devices report data. In March 2022, data donors reported 26 million red warnings, which would mean that every app showed a red warning 1.7 times on average during March. In April 2022, 17.5 million red warnings were reported, so on average every app would have shown a red warning again that month. And so on ... That doesn't look right to me. So thanks for looking into that!
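The estimate above can be reproduced with a few lines of Python. This is only a sketch: the user count and the donor share are the rough assumptions stated in this comment, not official figures, and the variable names are made up for illustration.

```python
# Rough assumptions from the comment above, not official figures.
cwa_users = 30_000_000   # assumed number of CWA users
donor_share = 0.5        # assumed share of users participating in data donation
donors = cwa_users * donor_share

# Monthly "red risk" totals as quoted above (read off the CWA Dashboard).
red_warnings = {"2022-03": 26_000_000, "2022-04": 17_500_000}

# Average number of reported red warnings per donating device and month.
warnings_per_donor = {
    month: round(total / donors, 2) for month, total in red_warnings.items()
}
print(warnings_per_donor)  # → {'2022-03': 1.73, '2022-04': 1.17}
```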

mlenkeit commented 2 years ago

Hi @Mtagxx, the "19 individuals" figure is still being checked but we can already provide an update on how exactly the "red risk" warnings are calculated:

Semantically, it represents devices that have received a new "red risk" warning that day. That's devices where the risk level is reported as "red risk" (and displayed as a red card) and the date that's displayed on the card (i.e. the most recent date) is newer compared to the previous time the device sent data (usually the previous day, see previous post from me for details). This is reported by the device as most recent date changed (yes/no).

Please find an example and some more technical details below.

In March 2022, the 7-day incidence reached an all-time high in Germany, with reported figures close to 2,000 (cf. https://www.rki.de/pandemieradar). Accordingly, we saw spikes in CWA in the "Sharing behavior" figure and in the "Warnings received by data donors" figure, as you can see in the CWA Dashboard, as well as in the number of Diagnosis Keys that were being shared and published to warn other users. In April 2022, the 7-day incidence was declining but still at a rather high level.

I hope this helps to explain the numbers that you see in the CWA Dashboard.

If not, could you please elaborate why the ~26 million "red risk" warnings for March and ~17.5 million for April don't "look right" to you?

Example

Let's take a look at an example and assume we look at a timeframe of Day 1 to Day 5.

Device A
- Day 1 - CWA shows "green risk" card without encounter (i.e. no warning)
  
  The device reports risk_level=green and most_recent_date_changed=no.
- Day 2 - CWA shows a "red risk" card (i.e. warning) with the most recent encounter on Day 1.
  
  The device reports risk_level=red and most_recent_date_changed=yes. ➡ counted
- Day 3 to 5 - CWA continues to show the "red risk" card from Day 2.
  
  The device reports risk_level=red and most_recent_date_changed=no on each of these days.

This device would be counted towards "Warnings received by data donors" with "red risk" only on Day 2.

Device B
- Day 1 - CWA shows "green risk" card without encounter (i.e. no warning)
  
  The device reports risk_level=green and most_recent_date_changed=no.
- Day 2 - CWA shows a "red risk" card (i.e. warning) with the most recent encounter on Day 1.
  
  The device reports risk_level=red and most_recent_date_changed=yes. ➡ counted
- Day 3 - CWA continues to show the "red risk" card from Day 2.
  
  The device reports risk_level=red and most_recent_date_changed=no.
- Day 4 - CWA shows a "red risk" card (i.e. warning) with the most recent encounter on Day 3.
  
  The device reports risk_level=red and most_recent_date_changed=yes. ➡ counted
- Day 5 - CWA continues to show the "red risk" card from Day 4.
  
  Behind the scenes, it received another warning about an encounter on Day 2. This is displayed in the Contact Journal, but not on the home screen, because although it's a new warning, it refers to a date before the most recent "high risk" encounter (i.e. "red").
  
  The device reports risk_level=red and most_recent_date_changed=no.

This device would be counted towards "Warnings received by data donors" with "red risk" on Day 2 and on Day 4.

In total, the statistics for "Warnings received by data donors" would show a "red risk" count of 2 for Day 2 and of 1 for Day 4. On the other days, it would be zero.
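The counting rule from the example can be sketched in Python. This is a simplified model, not the app's actual code: it assumes that most_recent_date_changed is "yes" exactly when the date shown on the card is newer than at the previous report, and the data layout and function names are invented for illustration.

```python
from collections import Counter

# What each example device shows per day: (risk_level, most_recent_encounter_day).
# None means the card shows no encounter. The layout is illustrative only.
cards = {
    "A": {1: ("green", None), 2: ("red", 1), 3: ("red", 1),
          4: ("red", 1), 5: ("red", 1)},
    "B": {1: ("green", None), 2: ("red", 1), 3: ("red", 1),
          4: ("red", 3), 5: ("red", 3)},  # Day 5: new warning for Day 2 only
}

def date_changed(previous, current):
    """True if the card's most recent date is newer than at the previous report."""
    return current is not None and (previous is None or current > previous)

def count_new_red_warnings(cards):
    """Count, per day, the devices reporting red risk with a changed date."""
    counts = Counter()
    for per_day in cards.values():
        previous_date = None
        for day in sorted(per_day):
            risk_level, current_date = per_day[day]
            if risk_level == "red" and date_changed(previous_date, current_date):
                counts[day] += 1
            previous_date = current_date
    return counts

daily = count_new_red_warnings(cards)
print([daily[day] for day in range(1, 6)])  # → [0, 2, 0, 1, 0]
```

Two devices produce three counted warnings here, so the sum over several days is not a count of unique devices.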

Technical Details

In case you're interested, the database query for calculating the "red risk" warnings is the following SQL statement:

    SELECT submitted_at, COUNT(*) AS "red_risk_count"
    FROM exposure_risk_metadata
    WHERE risk_level = 3 AND "most_recent_date_changed" = 't'
    GROUP BY submitted_at
    ORDER BY submitted_at;
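To see what this query returns for the Device A / Device B example, it can be run against an in-memory SQLite table. This is a sketch under assumptions: the table is reduced to the three columns the query uses, the dates are made up, booleans are stored as the text 't'/'f' to match the literal in the query, and 1 is used as a placeholder value for "green" (only 3 = red is taken from the query).

```python
import sqlite3

GREEN, RED = 1, 3  # 3 = red comes from the query; 1 = green is an assumption

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE exposure_risk_metadata ("
    " submitted_at TEXT, risk_level INTEGER, most_recent_date_changed TEXT)"
)

# One row per device and day; Day 1..5 mapped to hypothetical dates.
device_a = [(GREEN, "f"), (RED, "t"), (RED, "f"), (RED, "f"), (RED, "f")]
device_b = [(GREEN, "f"), (RED, "t"), (RED, "f"), (RED, "t"), (RED, "f")]
rows = [
    (f"2022-03-{day:02d}", level, changed)
    for device in (device_a, device_b)
    for day, (level, changed) in enumerate(device, start=1)
]
conn.executemany("INSERT INTO exposure_risk_metadata VALUES (?, ?, ?)", rows)

query = """
    SELECT submitted_at, count(*) AS "red_risk_count"
    FROM exposure_risk_metadata
    WHERE risk_level = 3 AND "most_recent_date_changed" = 't'
    GROUP BY submitted_at
    ORDER BY submitted_at;
"""
result = conn.execute(query).fetchall()
print(result)  # → [('2022-03-02', 2), ('2022-03-04', 1)]
```

Days without a new "red risk" report do not appear in the result at all, which corresponds to a count of zero in the example.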

Mtagxx commented 2 years ago

Hi @mlenkeit, thank you for your detailed answers, which address all of my questions! Your "Device B" example shows that the number of "red warnings" can be higher than the number of warned individuals and that there is no obvious relation between the two numbers. That's what I wanted to point out regarding Science-Blog-5.

mlenkeit commented 2 years ago

@Mtagxx I’m glad to hear that the examples helped to explain the numbers. With this information, do you still have reason to assume that the number of 19 individuals who receive a red warning is too high and “must be smaller”? If yes, could you elaborate why? If not, you can close this issue with the Close issue button below.

Mtagxx commented 1 year ago

@mlenkeit, well, I guess I don't quite understand your question. The assumption in Science-Blog-5 is that the number of red warnings equals the number of individuals with a red warning. In my opinion, this is not quite correct. In your example, you showed how 2 devices/individuals produce a total of 3 red warnings, which is a factor of 1.5. I suspected that in reality this factor is something like 3 or 4. As long as the factor is small in reality, let's say below 2, it is not worth spending much more time discussing this topic, and you may just close this issue.

MikeMcC399 commented 1 year ago

@Mtagxx

You might also be interested in "Was sagt eine rote Warnung bei hohen Inzidenzen noch aus?" ("What does a red warning still tell us when incidences are high?"), published today, Nov 30, 2022.

Mtagxx commented 1 year ago

Hi @MikeMcC399, very interesting! Thanks for the hint!

MikeMcC399 commented 1 year ago

@Mtagxx

> Thanks for the hint!

You are welcome!

Would you like to consider closing this issue? It seems the discussion is now finished. Or do you think that the article "Science Blog 5" needs to be changed?

larswmh commented 1 year ago

Closing this issue now as the questions have been answered and there has been no further activity. Thanks to everyone for participating.
