Collect strategies to protect privacy when displaying intermediate results

alan-turing-institute / teadt-community-survey

This is a repository for the TRIC-DT Innovation and impact hub community survey, forming part of the scoping research for the TEA-DT project.

Collect strategies to protect privacy when displaying intermediate results #7

Closed: aranas closed this issue 6 months ago

aranas commented 7 months ago

How can we avoid re-identification of participants through the dynamic display of results?

- k-anonymity (only show results in sufficiently large groups, so that no data point appears isolated)
  - if 3-anonymity is not met, we don't display the plot and show a privacy warning instead
- differential privacy (adding noise to the displayed data)
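A minimal Python sketch of the suppression rule in the first bullet, assuming each plot shows counts per answer option of a single question. It only checks the size of each displayed group, which is a simplification: k-anonymity proper is defined over combinations of quasi-identifiers. The function name `counts_if_k_anonymous` and the `K_THRESHOLD` constant are hypothetical, not taken from the survey code.

```python
from collections import Counter
from typing import Optional

# Hypothetical minimum group size; the issue proposes 3-anonymity.
K_THRESHOLD = 3


def counts_if_k_anonymous(answers: list[str], k: int = K_THRESHOLD) -> Optional[dict[str, int]]:
    """Return per-option counts, or None if the plot should be hidden.

    None means: too few respondents overall, or at least one displayed
    option has fewer than k respondents, so the caller should show a
    privacy warning instead of the plot.
    """
    if len(answers) < k:
        return None
    counts = Counter(answers)
    if any(c < k for c in counts.values()):
        return None
    return dict(counts)
```

A caller would render the plot when a dict comes back and show the privacy warning when it gets `None`.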

How easy is the latter (differential privacy) to implement? Can we give this a try for the initial pie plots and Sankey charts?
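To give a sense of how much work the differential privacy option involves, here is a hypothetical sketch of the Laplace mechanism applied to the counts behind a pie chart. It assumes a single-choice question, so one participant changes exactly one count by 1 (sensitivity 1); multi-select questions or Sankey flows, where one participant can contribute to several cells, would need a larger noise scale. The names `laplace_noise` and `noisy_counts` and any choice of epsilon are illustrative assumptions; a production version would more likely rely on a vetted library than on hand-rolled sampling.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two i.i.d. exponentials.

    expovariate takes a rate, so rate = 1 / scale gives mean `scale`.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_counts(counts: dict[str, int], epsilon: float, rng: random.Random) -> dict[str, int]:
    """Laplace mechanism for a histogram of answer counts (e.g. a pie chart).

    Assumes a single-choice question: adding or removing one participant
    changes exactly one count by 1, so the L1 sensitivity is 1 and the
    noise scale is 1 / epsilon. `counts` should list every answer option,
    including zeros; otherwise the set of keys itself leaks information.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    scale = 1.0 / epsilon
    # Rounding and clamping at zero are post-processing, so they do not
    # weaken the differential privacy guarantee.
    return {opt: max(0, round(c + laplace_noise(scale, rng))) for opt, c in counts.items()}
```

One design point matters for a public, unauthenticated page: every fresh noise draw spends additional privacy budget, and averaging many reloads would wash the noise out, so a sketch like this would need to compute the noisy counts once per data snapshot and cache them rather than resample on every view.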

Step 1: Scoping possibilities

kallewesterling commented 7 months ago

I don't know much about the topic, but it seems a lot easier to stick with k-anonymity here than to try adding noise to the displayed data? Especially since we're on a short timeline, my view would be to keep it simple :)

aranas commented 6 months ago

In the end, the visualisations / feedback we give are at the level of averages and will not allow re-identification or reveal any sensitive information about survey participants, so I will close this issue as there is no further need for it.

cptanalatriste commented 6 months ago

@aranas, I would encourage having this checked by a privacy specialist. I'm aware that you can leak information by publishing raw averages, considering that users can access the plots many times (as there's no authentication).
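To make the concern about raw averages concrete: if a plot shows the mean of n answers and, after one more submission, the mean of n + 1 answers, anyone who sees both views and knows (or can infer) n recovers the new participant's exact answer from the difference of the totals. Below is a small Python illustration with made-up Likert-scale values; `recover_new_answer` is a hypothetical helper, not part of the survey code.

```python
from statistics import mean


def recover_new_answer(mean_before: float, n_before: int, mean_after: float) -> float:
    """Differencing attack on published averages.

    total_after - total_before = (n + 1) * mean_after - n * mean_before,
    which is exactly the value contributed by the newest submission.
    """
    return (n_before + 1) * mean_after - n_before * mean_before


# Made-up answers on a 1-5 scale already in the database.
before = [3, 4, 5]
# One new participant submits an answer of 1.
after = before + [1]

print(recover_new_answer(mean(before), len(before), mean(after)))  # prints 1.0
```

Rounding the displayed averages or adding noise blunts this subtraction; a minimum group size on its own does not, because the attack only needs two successive views.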

aranas commented 6 months ago

Good point, Carlos! I believe that since we don't collect any sensitive data in the survey, it is less critical, but I'm happy to discuss this further. We can consult Kit, for example. Let's discuss strategies on Monday!

aranas commented 6 months ago

I am pushing a new version with the plots to Azure right now and will let you know when it is up. You will see that the visualisations only cover a very limited subset of the data, which was a deliberate choice to help with this issue. I think we would need to map out exactly what risk of leakage still exists.

cptanalatriste commented 6 months ago

One more thing: are we comfortable with the plots influencing the participants' answers? I'm imagining a participant who selects the "I don't know" option, sees in the plot that most people "know", and goes back and changes their answer.

aranas commented 6 months ago

One easy way to circumvent this is to show people the results only at the very end, after they submit; this is definitely an option! The very first display comes early and tells them how many similar 'profiles' have filled in this survey, so it will not influence their answers. The following two could have an influence, although I don't really see the incentive, since the db is anonymized anyway.

aranas commented 6 months ago

Option: display a callout to notify users of the risk of some data being displayed.

aranas commented 6 months ago

After re-evaluating the actual plots shown, the team decided there is no need for additional privacy measures.