I don't know much about the topic, but it seems a lot easier to stick with k-anonymity here rather than trying to add noise to the displayed data. Especially since we're on a short timeline -- keep it simple would be my point of view :)
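For concreteness, here is a minimal sketch of what a k-anonymity-style suppression rule could look like before anything is plotted. This assumes the responses sit in a pandas DataFrame; the column name, threshold, and function are illustrative, not from our codebase:

```python
import pandas as pd

K_MIN = 5  # hypothetical threshold: only show a group if at least K_MIN participants fall in it

def safe_counts(df: pd.DataFrame, group_col: str) -> pd.Series:
    """Return per-group counts, suppressing groups smaller than K_MIN."""
    counts = df[group_col].value_counts()
    return counts[counts >= K_MIN]

# Only the surviving groups would be passed on to the pie / Sankey plotting code.
```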
In the end, the visualisations / feedback we give are at the level of averages and will not allow re-identification or reveal any sensitive information about survey participants, so I will close this issue as there is no further need for it.
@aranas, I would encourage having this checked by a privacy specialist. I'm aware that you can leak information even by publishing raw averages, considering users can access the plots many times (as there's no authentication).
Good point, Carlos! I believe that since we don't collect any sensitive data in the survey, this is less critical, but I'm happy to discuss it further. We could consult Kit, for example. Let's discuss strategies on Monday!
I am pushing a new version with the plots to Azure right now and will notify you when it is up. You will see that the visualisations only cover a very limited subset of the data, which was a deliberate choice to help with this issue. I think we still need to map out exactly what danger of leakage remains.
One more thing: are we comfortable with the plots influencing participants' answers? I'm imagining a participant who selects the "I don't know" option, sees in the plot that most people "know", and goes back and changes their answer.
One easy way to circumvent this is to give people the results only at the very end, after submitting -- this is definitely an option! The very first display comes early and only tells them how many similar 'profiles' have filled in this survey, so it will not influence their answers. The following two displays could have an influence, although I don't really see the incentive, since the db is anonymized anyway.
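A rough sketch of that "only after submitting" gate, assuming a Flask-style backend; the endpoints, session flag, and `load_aggregates` helper are invented for illustration and are not the actual app:

```python
from flask import Flask, abort, jsonify, session

app = Flask(__name__)
app.secret_key = "replace-with-a-real-secret"  # required for session cookies

def load_aggregates():
    # Placeholder for whatever produces the plot data.
    return {"profiles_matching": 42}

@app.route("/submit", methods=["POST"])
def submit():
    # ... store the submission, then mark this session as done.
    session["survey_submitted"] = True
    return jsonify(status="ok")

@app.route("/results")
def results():
    # Serve the aggregate plots only after submission, so they cannot
    # influence answers while the survey is still in progress.
    if not session.get("survey_submitted"):
        abort(403)
    return jsonify(load_aggregates())
```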
Option: display a callout to notify the user of the risk of some data being displayed.
After reevaluating the actual plots shown, the team decided that no additional privacy measures are needed.
How can we avoid re-identification of participants through the dynamic display of results?
How easy is the latter to implement? Can we give this a try for the initial pie plots and Sankey charts?
Step 1: Scoping possibilities