cph-cachet / carp.core-kotlin

Infrastructure-agnostic framework for distributed data collection.
https://carp.cachet.dk/core/
MIT License

Personal identifiable information (PII) in the deployment subsystem #464

Open Whathecode opened 9 months ago

Whathecode commented 9 months ago

Overall, the deployments subsystem does a good job of not including PII. The link to an actual account, which allows for re-identification, is made in the studies subsystem. A potential aim could be to be able to claim that the deployments subsystem does not store PII, so that hosting can be outsourced to a third party with fewer legally binding requirements.

However, there is data stored in the deployment subsystem which can carry PII. Concretely, this concerns ParticipantData and DeviceRegistration.

The latter issue (DeviceRegistration) may not be a big problem given how many people would need to be covered during re-identification. It may be easy to get the MAC address of a specific individual you are targeting (simple BLE scanning), but it's not trivial to scan every person who may have data in the deployments subsystem. More risky could be when DeviceRegistration would be used to connect to third-party services, such as Google Fit, to retrieve sensor data.

This may make the idea of hosting the studies and deployments subsystems by separate organizations without a legally binding contract to address these issues hard to achieve, unless some further design or infrastructure work is done. E.g., encrypting DeviceRegistration and ParticipantData.

Whathecode commented 1 month ago

> More risky could be when DeviceRegistration would be used to connect to third-party services, such as Google Fit, to retrieve sensor data.

From this, I currently conclude that the device registration for such a device shouldn't contain the username of the account, and should instead rely on a UUID (so probably just DefaultDeviceRegistration), with the linking of that id to a setup enabling authentication handled in the application/infrastructure layer (outside of core). Alternatively, it could store a token.
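A minimal Kotlin sketch of that split, to make the idea concrete. Everything here is hypothetical and not part of CARP core: `OpaqueDeviceRegistration` merely stands in for a registration that carries only a UUID, and `AccountLinkStore` stands in for whatever the application/infrastructure layer would use to map that id to an account or token.

```kotlin
import java.util.UUID

// Hypothetical stand-in for a registration that holds no account details:
// the only thing the deployments subsystem would see is an opaque, random id.
data class OpaqueDeviceRegistration(
    val deviceId: String = UUID.randomUUID().toString()
)

// Hypothetical application-layer store, kept outside of core.
// It is the only place where a device id is tied to something identifying.
class AccountLinkStore {
    private val links = mutableMapOf<String, String>()

    // Associate the opaque device id with an account reference (e.g., a token).
    fun link(registration: OpaqueDeviceRegistration, accountReference: String) {
        links[registration.deviceId] = accountReference
    }

    // Look up the account reference; returns null for unknown device ids.
    fun resolve(deviceId: String): String? = links[deviceId]
}
```

With this arrangement, anything stored on the deployments side only contains `deviceId`; re-identification requires access to the separately hosted link store.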

Either way, the general point is that care needs to be taken when designing new DeviceRegistration types to adhere to the data minimization principle, and to reduce to a minimum the risk of direct identification of individuals from any data stored in the deployment subsystem.

But, @bardram, I believe the following conclusion is still spot on:

> This may make the idea of hosting the studies and deployments subsystems by separate organizations without a legally binding contract to address these issues hard to achieve, unless some further design or infrastructure work is done. E.g., encrypting DeviceRegistration and ParticipantData.

It seems likely enough that some data stored in the deployment subsystem would be classified as PII, so regulations like GDPR kick in. Without having the application layer fully handle encryption of said data, that would make the deployments subsystem in CARP core a data processor.
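As a rough illustration of what "the application layer fully handles encryption" could look like, here is a sketch using the standard JVM `javax.crypto` API with AES-GCM. The function names, the choice of AES-GCM, and the idea of encrypting a serialized registration string are all assumptions made for this example; key management is left out entirely and would be the hard part in practice.

```kotlin
import java.security.SecureRandom
import javax.crypto.Cipher
import javax.crypto.KeyGenerator
import javax.crypto.SecretKey
import javax.crypto.spec.GCMParameterSpec

const val IV_BYTES = 12   // recommended nonce size for GCM
const val TAG_BITS = 128  // authentication tag length

// Generate a fresh 256-bit AES key; it would stay with the application layer.
fun newKey(): SecretKey =
    KeyGenerator.getInstance("AES").apply { init(256) }.generateKey()

// Encrypt serialized data before handing it to the deployments subsystem.
// The random IV is prepended so decryption can recover it.
fun encryptForStorage(plaintext: String, key: SecretKey): ByteArray {
    val iv = ByteArray(IV_BYTES).also { SecureRandom().nextBytes(it) }
    val cipher = Cipher.getInstance("AES/GCM/NoPadding")
    cipher.init(Cipher.ENCRYPT_MODE, key, GCMParameterSpec(TAG_BITS, iv))
    return iv + cipher.doFinal(plaintext.toByteArray(Charsets.UTF_8))
}

// Reverse of encryptForStorage; throws if the data or key does not match.
fun decryptFromStorage(stored: ByteArray, key: SecretKey): String {
    val iv = stored.copyOfRange(0, IV_BYTES)
    val cipher = Cipher.getInstance("AES/GCM/NoPadding")
    cipher.init(Cipher.DECRYPT_MODE, key, GCMParameterSpec(TAG_BITS, iv))
    val plaintext = cipher.doFinal(stored, IV_BYTES, stored.size - IV_BYTES)
    return String(plaintext, Charsets.UTF_8)
}
```

In this setup the deployments subsystem would only ever store the opaque byte array, while the key never leaves the application layer.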

But none of this is a concern for the current release, and this subsystem isn't deployed separately yet either way (other subsystems, like studies, will obviously always have PII), so we can consider this a theoretical exercise until it becomes an actual requirement. Therefore, I'll remove resolving this from the next milestone.