janagombitova commented 4 years ago

Context

As partners increasingly think about good data practices and try to limit the amount of personal data they collect (due to GDPR), we see an increase in questions like "How can I know which data columns (questions) hold personal data?". Also, many Terms of Reference for proposals include requirements such as identification of personal data and extraction of personal data from the dataset.

Today there is no way to know which column holds personal data besides looking at the question text (column header) and the data itself. But having to look at the data at all is often something partners want to avoid, especially once data has been exported out of Flumen (as Flumen handles data access well through roles and permissions).

Why do we have this issue? What are we trying to solve?

A lot could be done to make it easier to work with, or avoid working with, personal data. But our main goal is to let users keep working with Flumen without changing too much. We want to make it possible to identify personal data once the data has been taken out of Flow via the API, so that users can hide, mask, or remove it based on their needs.

Why not more?

Because we do not know what exactly the reasons are for hiding/masking such columns in the dataset. Will they analyse the data without these columns? Why collect them in the first place? Who should not see this data and why should this person see the rest of the dataset?

Goal

The goal of this change is to implement the minimum, so we can learn more and then build on what we learn.

How will this benefit the users?

Make it easy to identify personal data
Make it easy to mask personal data in the external system once it has been taken out of Flow via the API

How will this benefit Akvo?

Show we take personal data protection seriously
Market advantage, as such features are not yet widespread across tools
Easy to apply to partner calls for funding, as we can fulfil the data protection requirements
Our TC team will be able to build custom data solutions where personal data is masked from the view

Current status quo

Currently, there is no way to mark a question as holding personal data.

Opportunity

There are many opportunities underlying this issue:

define which question holds personal data
automagically detect personal data
allow exporting data without these columns
allow a user to have access to a data set but not to the specific columns
give an overview of all personal data (volume)
etc...

But we will ONLY go for the minimum for now to learn.

The Idea = The minimum

Allow the user to specify that a question holds personal data
Expose this flag in the survey form XML and in the API so those working with the data outside of Flow can make use of it
The changes should NOT affect how the Flow app works
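
To illustrate how someone working with the data outside of Flow might use such a flag, here is a minimal Python sketch. It is not the actual Flow API response format: the form structure, the `personalData` field name, and the helper functions are all assumptions made up for this example.

```python
from typing import Any

MASK = "***"  # placeholder shown instead of a personal value


def personal_question_ids(form: dict[str, Any]) -> set[str]:
    """Collect the ids of questions flagged as personal data.

    Assumes a hypothetical form layout: groups under "questionGroups",
    questions under "questions", and a boolean "personalData" flag.
    """
    return {
        question["id"]
        for group in form.get("questionGroups", [])
        for question in group.get("questions", [])
        if question.get("personalData", False)
    }


def mask_responses(responses: dict[str, Any], personal_ids: set[str]) -> dict[str, Any]:
    """Return a copy of the responses with flagged answers masked."""
    return {
        qid: (MASK if qid in personal_ids else value)
        for qid, value in responses.items()
    }


def drop_responses(responses: dict[str, Any], personal_ids: set[str]) -> dict[str, Any]:
    """Return a copy of the responses without the flagged answers."""
    return {qid: value for qid, value in responses.items() if qid not in personal_ids}


# Hypothetical form definition and one submission, for illustration only.
form = {
    "questionGroups": [
        {
            "questions": [
                {"id": "q1", "name": "Respondent name", "personalData": True},
                {"id": "q2", "name": "Water point type", "personalData": False},
            ]
        }
    ]
}
answers = {"q1": "Jane Doe", "q2": "Borehole"}

flagged = personal_question_ids(form)
masked = mask_responses(answers, flagged)
dropped = drop_responses(answers, flagged)
```

Whether to mask or drop the flagged columns is left to the external system, in line with keeping the change in Flow itself to the minimum.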

Next steps  

muloem commented 4 years ago

@janagombitova Do you want to write up some tooltip helper text to show?

janagombitova commented 4 years ago

@muloem if it means that we also fix the bug around the tooltips, then yes :)

muloem commented 4 years ago

ok. put together the text and I will bring up that branch that I had fixing the tooltip 👍

tangrammer commented 4 years ago

Expose this flag in the API https://github.com/akvo/akvo-flow-api/pull/223

janagombitova commented 4 years ago

@muloem how about this Tooltip: Define that answers to this question are personal data (personally identifiable information - any data that can be used to identify a specific individual). When exporting the data via the Flow API, you can filter out the values based on this flag.

janagombitova commented 4 years ago

@marvinkome and @tangrammer is there more that needs to get done here? I am moving the tooltip to a separate issue (https://github.com/akvo/akvo-flow/issues/3593) that can be handled later. If besides that, all is done, let's release the change.

marvinkome commented 4 years ago

Nothing from the UI side

janagombitova commented 4 years ago

Released on June 2

akvo / akvo-flow

Mark a question as personal data #3576
