DP-3T / documents

Decentralized Privacy-Preserving Proximity Tracing -- Documents

what data is needed for digital proximity tracing? #225

Open peterkwells opened 4 years ago

peterkwells commented 4 years ago

Hello

I was reading the paper on what data is needed for digital proximity tracing and saw that question 2 "what data could such a system collect that would help epidemiologists understand...." was not explored beyond the basic function of the system.

A group in Australia have published a paper on evaluating contact tracing in a pandemic which states an epidemiological want/need for demographic information about people that an infected person has been in contact with to help understand effects of the disease and risk factors.

It may be worthwhile to reference the Australian work, consider whether that want/need is fair, and explore how it could be met while aligning with DP-3T's other design principles. I can imagine ways of doing it both within any app and in the broader contact tracing system that the app sits within.

lbarman commented 4 years ago

Thanks @peterkwells for the useful reference, forwarding internally...

peterboncz commented 4 years ago

In certain countries, the health authorities might want to build an infection graph (who infected whom). The manual contact-tracing process certainly allows them to do this, so they will be used to having it. DP^3T is designed not to allow that, and it should not.

A privacy-preserving statistic that could be revealed anonymously to the DP^3T backend (not to any health-authority system that covers testing and patient data) when registering your SKs is how many warnings you received before testing positive. Note that you could split warnings into forward and backward tracing (see #242), although that would require an extension of the current protocol. You would also want to provide the dates of the warning(s) -- this would make it possible to monitor time spent in the testing pipeline, and thus the efficiency of the contact-tracing app and the national testing infrastructure. The app could also ask for any symptoms and the date of the first symptom. Providing such statistics should of course be opt-in only -- even though (I think) such info is not privacy sensitive, because it cannot be related to a person.
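To make the idea concrete, here is a minimal sketch of what such an opt-in report might look like on the app side. Everything in it is an assumption for illustration: the class `OptInReport`, the function `build_report`, and all field names are hypothetical and not part of the DP-3T protocol or any existing app.

```python
import json
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class OptInReport:
    """Hypothetical statistics sent alongside key registration (illustrative only)."""
    warning_dates: List[date] = field(default_factory=list)  # dates of warnings received
    first_symptom_date: Optional[date] = None                # self-reported, may be absent
    symptoms: List[str] = field(default_factory=list)        # self-reported, may be empty

    def to_json(self) -> str:
        # The payload carries no identifiers, only counts, dates and symptoms.
        payload = {
            "warnings_before_positive": len(self.warning_dates),
            "warning_dates": [d.isoformat() for d in self.warning_dates],
            "first_symptom_date": (
                self.first_symptom_date.isoformat() if self.first_symptom_date else None
            ),
            "symptoms": list(self.symptoms),
        }
        return json.dumps(payload, sort_keys=True)


def build_report(opted_in: bool, report: OptInReport) -> Optional[str]:
    """Return the JSON payload only if the user opted in; otherwise send nothing."""
    return report.to_json() if opted_in else None
```

A user who declined would produce no payload at all, e.g. `build_report(False, OptInReport())` returns `None`.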

Similarly, tuning the risk scores could be powered by anonymous reporting of features from contact logs to the backend. You would report a table of features (distance, RSSI, etc.) without EphIDs, with just a target variable (true/false) that is true for the logged events that matched a warning. Additional features could be your phone model and possibly its calibrated Tx power. The backend could then use machine learning to learn a good risk function from these features and the target variable.
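As a rough sketch of the reporting step, the app could turn its local contact log into rows that keep the features and the target variable but drop the EphIDs. The log layout, the key names (`ephid`, `rssi_dbm`, `duration_s`) and the function `to_feature_rows` are assumptions made up for this example.

```python
from typing import Any, Dict, Iterable, List, Optional, Set


def to_feature_rows(
    contact_log: Iterable[Dict[str, Any]],
    matched_ephids: Set[str],
    phone_model: str,
    tx_power_dbm: Optional[float] = None,
) -> List[Dict[str, Any]]:
    """Build an anonymous feature table from a (hypothetical) local contact log.

    Each log entry is assumed to hold 'ephid', 'rssi_dbm' and 'duration_s'.
    The EphID is used only locally to compute the target variable and is
    never copied into the output rows.
    """
    rows = []
    for event in contact_log:
        rows.append({
            "rssi_dbm": event["rssi_dbm"],
            "duration_s": event["duration_s"],
            "phone_model": phone_model,
            "tx_power_dbm": tx_power_dbm,
            # True for logged events that matched a warning.
            "matched_warning": event["ephid"] in matched_ephids,
        })
    return rows
```

The backend would then receive only these rows and could fit whatever model it likes on `matched_warning` as the label.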

Computing the risk score (see #235) with a simple expression evaluator in the app would allow the function to be changed without an app update. One would start with a simple linear model, which only needs variables for an additive offset and a multiplicative coefficient per feature. When syncing with the backend, the apps can then also fetch new risk-score parameters and any red-flag danger threshold.
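A minimal sketch of such a linear model, assuming the parameters arrive as a plain dictionary: each feature has an additive offset and a multiplicative coefficient, and the score is compared against a threshold. The parameter layout, the feature names and the numbers are hypothetical and not taken from #235.

```python
from typing import Dict, Mapping


def risk_score(features: Mapping[str, float], params: Mapping) -> float:
    """Linear score: bias + sum over features of coef * (value + offset)."""
    score = float(params.get("bias", 0.0))
    for name, p in params["features"].items():
        score += p["coef"] * (features[name] + p["offset"])
    return score


def is_red_flag(features: Mapping[str, float], params: Mapping) -> bool:
    """True if the score reaches the backend-provided danger threshold."""
    return risk_score(features, params) >= params["threshold"]


# Illustrative parameters, as they might be fetched on sync (values are made up).
EXAMPLE_PARAMS: Dict = {
    "bias": 0.0,
    "threshold": 10.0,
    "features": {
        "duration_min": {"offset": 0.0, "coef": 1.0},
        "rssi_dbm": {"offset": 90.0, "coef": 0.5},
    },
}
```

With these example parameters, a 5-minute contact at -70 dBm scores 1.0 * 5 + 0.5 * (-70 + 90) = 15.0, which is above the threshold of 10.0. Changing the model then only means shipping a new dictionary.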

In general, tuning parameters (such as time windows for retaining data, sync frequency with the backend, the risk-function parameters and threshold described above, and probably more) should preferably be re-read from the backend on each sync. This makes it possible to orchestrate and adapt how the apps behave without requiring an app update.
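One way this could look in the app, sketched under assumptions: the app ships with built-in defaults and, on each sync, overlays whatever known, well-typed parameters the backend returned. The key names and default values below are placeholders, and `apply_remote_config` is a hypothetical helper rather than an existing API.

```python
from typing import Any, Dict, Mapping

# Placeholder defaults compiled into the app (values are illustrative only).
DEFAULTS: Dict[str, float] = {
    "retention_days": 14,
    "sync_interval_hours": 6,
    "risk_threshold": 10.0,
}


def apply_remote_config(current: Mapping[str, Any], remote: Mapping[str, Any]) -> Dict[str, Any]:
    """Return a new config with valid remote values overlaid on the current one.

    Unknown keys and non-numeric values are ignored, so a malformed
    backend response cannot break the app's settings.
    """
    updated = dict(current)
    for key, value in remote.items():
        is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
        if key in DEFAULTS and is_number:
            updated[key] = value
    return updated
```

Ignoring unknown or malformed entries is a deliberate choice here: older app versions keep working when the backend starts sending parameters they do not understand.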