corona-warn-app / cwa-documentation

Project overview, general documentation, and white papers. The CWA development ends on May 31, 2023. You still can warn other users until April 30, 2023. More information:
https://coronawarn.app/en/faq/#ramp_down
Apache License 2.0

Diagnosis Keys Upload: Reference day for TRL offset computation is upload date and not onset date #406

Closed crossow closed 3 years ago

crossow commented 4 years ago

The "Epidemiological Motivation of the Transmission Risk Level" document describes the transmission risks for infected persons relative to the onset of symptoms. It served as a baseline to define the TRLs. However, the app does not attempt to infer the onset date, and instead, the risk exposure computations use the day of upload as reference day. Any difference between onset date and day of upload thus leads to wrong TRLs being picked for the uploaded keys.

Example: A user shows symptoms on Friday, undergoes a test on Monday, and receives and uploads their positive result on Tuesday. This results in an offset of 4 days. Contacts the user had 6 days before submission (i.e., on Wednesday, two days before symptoms) would receive a low TRL of 3, even though the user was highly contagious at that time.
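The day counts in this example can be checked with a few lines of Python. This is only a sketch: the calendar dates are assumptions chosen so that the weekdays match the example, and the variable names are hypothetical rather than taken from the app.

```python
from datetime import date

# Hypothetical calendar dates; only the weekdays come from the example.
onset = date(2020, 6, 12)    # Friday: first symptoms
upload = date(2020, 6, 16)   # Tuesday: positive result received and uploaded
contact = date(2020, 6, 10)  # Wednesday: day of the contact

# Delay between symptom onset and upload (the "offset" in the example).
upload_offset = (upload - onset).days

# Reference used by the app: days between the contact and the upload.
days_before_upload = (upload - contact).days

# Reference used in the epidemiological document: days relative to onset.
days_since_onset = (contact - onset).days

print(upload_offset, days_before_upload, days_since_onset)  # 4 6 -2
```

The same contact is "6 days before upload" but only "2 days before onset", which is why a profile indexed by the upload day can assign it a low TRL.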

To solve this issue, the DP-3T consortium recommends inferring the onset date:

If the test result is positive, it is important to establish when the patient's contagious period began. [...] The health official notifies the patient of the result (Step 4, Figure CP). [...] We assume that this message includes the test result (positive or negative) and if positive, other supplementary information such as a request to contact the health official to discuss their probable onset date, or advice on how the patient could determine this themselves.

This issue came up during a security analysis of improving privacy against traffic analysis. Kudos to Timo Renner (SAP), Maik Mueller (SAP) and Cas Cremers (CISPA).

-- Prof. Dr. Christian Rossow | Faculty CISPA Helmholtz Center for Information Security Stuhlsatzenhaus 5, Saarland Informatics Campus 66123 Saarbrücken, Germany Mail: lastname [at] cispa [dot] saarland | Web: https://cispa.saarland/group/rossow/


Internal Tracking ID: EXPOSUREAPP-2192

daimpi commented 4 years ago

This issue seems to affect both Android and iOS, so it probably should be moved to the documentation repo :)

Somewhat related: in addition to the problem mentioned here (but somewhat less severe) the current implementation is also applying TRL profiles based on list order, not actual TEK validity: https://github.com/corona-warn-app/cwa-documentation/issues/343

ghost commented 4 years ago

Hello @crossow and @daimpi,

I have informed our development team to take over this issue.

Thanks, LMM

Corona-Warn-App Open Source Team

mhoehle commented 4 years ago

I have difficulties understanding the issue. The mentioned "Epidemiological Motivation of the Transmission Risk Level" document sketches a sequence of events: exposure, symptom onset (not always observed), test, information about the test result, and upload. The TRLs are computed based on the upload date, as clearly shown in Fig. 14 of the document.

However, the exact transmission score based on the upload day is motivated by the epidemiological understanding (as of early June 2020) of the time delay from exposure to upload. As part of this analysis, one averages over four different scenarios of possible knowledge about symptom onset, in order to match the current situation of the CWA app, in which NO information about symptom onset is available, only the upload date.

The particular scenario involving symptom onset date could be interesting in its own right, because this could be a situation which might happen when local health authorities do contact tracing.

Would it be possible to clarify the issue somewhat?

crossow commented 4 years ago

Thanks for your comments.

@hoehleatsu The issue I see is that (i) the app does not ask users to enter their onset date (if any), and (ii) the app allows uploads to happen quite late (even days after the result). These operational delays therefore have to be estimated, as sketched in Section 3.2 of the document, and the assumed distributions may not reflect the situation in practice. The document also acknowledges this and suggests that the distributions "have to be adjusted based on real data once the system is running". I presume this is still to be done.

So my main question: Why doesn't the CWA app follow the suggestion of DP-3T and aim to infer the onset date? This would allow for a far more accurate exposure risk computation.

mhoehle commented 4 years ago

Thanks for the clarification, that was helpful.

> So my main question: Why doesn't the CWA app follow the suggestion of DP-3T and aim to infer the onset date? This would allow for a far more accurate exposure risk computation.

The structure with the four cases shows that this has already been thought of as part of the development process, and I would guess that the use of the onset date is on the wishlist, so it is likely resources and priority that decide. The product owners can probably say more about this.

From a scientific viewpoint: One thing which might be helpful is to perform simulations in the spirit of Ferretti et al. (2020) that show how much can be gained by having extra precision about the day of onset. Maybe this could be a project that the scientific community could contribute, which could help set the priority accordingly. Further points are: 1) What does "onset" for COVID-19 really mean? Here it would be important to have a clear and user-communicable definition which captures that onset is subject to variability. 2) How robust would the scoring be to mis-specification of the DSO, given that it appears to vary between cases and could be interpreted by the user in the wrong way? Other apps simply use a TRL constant over a 14-day window in order to "avoid" any assumptions and because sensitivity is prioritised over specificity. 3) Are there any data protection issues when asking the user for the onset date? Do any of the DSO statements need to be extended accordingly? (It appears that you are an expert in this area.)

mh- commented 4 years ago

> So my main question: Why doesn't the CWA app follow the suggestion of DP-3T and aim to infer the onset date?

@crossow Why are you using the term "infer"? As far as I understand, the "Epidemiological Motivation of the Transmission Risk Level" was already trying to infer the onset date (from the upload date, using assumed statistical distributions).

An improvement would be to ask for the onset date, right? Or from which other information could the app infer the onset date?

Anyway, this part definitely should be clarified, at the latest when migrating to API v1.5. The new ExposureWindow mode seems, at least initially, not to make use of days_since_onset_of_symptoms.

crossow commented 4 years ago

@mh- Yes, asking is what I meant. However, as the answer might not always be trivial (e.g., patients don't know which symptoms count, patients lack symptoms, etc.), I chose another word. Maybe "determine" would have been more accurate.

In my eyes, ideally, to get rid of modeling inaccuracies, the app should ask (right before uploading the TEKs) whether users have symptoms, and since when. This question could be made optional so that it does not deter users from uploading.
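A minimal sketch of what such an optional question could feed into, assuming the app falls back to the upload day whenever no onset date is given. The function names `reference_day` and `key_offset` are hypothetical and do not come from the CWA code base.

```python
from datetime import date
from typing import Optional


def reference_day(upload_day: date, onset_day: Optional[date] = None) -> date:
    """Return the reported onset day if the user gave one, else the upload day."""
    return onset_day if onset_day is not None else upload_day


def key_offset(key_day: date, upload_day: date,
               onset_day: Optional[date] = None) -> int:
    """Signed number of days from the reference day to the day of a key."""
    return (key_day - reference_day(upload_day, onset_day)).days


# Hypothetical dates: key from a Wednesday, upload on the following Tuesday.
key_day = date(2020, 6, 10)
upload_day = date(2020, 6, 16)

no_answer = key_offset(key_day, upload_day)                       # question skipped
with_onset = key_offset(key_day, upload_day, date(2020, 6, 12))   # onset on Friday
print(no_answer, with_onset)  # -6 -2
```

Skipping the question reproduces the upload-based offset, so making it optional would not change anything for users who do not answer.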

daimpi commented 3 years ago

@crossow Did the recent implementation of "Symptom Recording", which gives people who tested positive the option to enter a date for symptom onset, address your concerns? (blog entry)

If so feel free to close this issue 🙂.

crossow commented 3 years ago

Thanks for the heads-up. Yes, this solves the problem, in particular the computation of the risk level based on the onset date.

Good job everyone! Closing.