compdemocracy / polis

:milky_way: Open Source AI for large scale open ended feedback
https://pol.is
GNU Affero General Public License v3.0

Allow deletion/removal of own user data #746

Open patcon opened 3 years ago

patcon commented 3 years ago

Reticketed from the 2021-01-09 call, in response to questions from Angie (@dataHumanist).

Angie was surprised there was no way to remove or delete her own data. She expected her participants to have control over their data: to be able to revoke it and have it deleted from the system.

patcon commented 3 years ago

@colinmegill speaking of which, does this request dovetail with meeting GDPR compliance?

colinmegill commented 3 years ago
  1. My present understanding of GDPR is that whoever hosts Polis is obligated, if operating in a context in which GDPR is relevant law, to respond to data removal requests.
  2. No, deletion is not automated. It's manual and there is a checklist.
    • Core should share the checklist we use on hosted, for others who are hosting their own instances
    • There can be manual data exports. The data may no longer be hosted, but copies of it may still be around.
    • There can be downstream reports, exported as PDFs, on which algorithms were run. The automated report may be gone, but such reports may include the output of algorithms run on all the data, i.e., downstream artifacts may contain aggregated data.
  3. Since Polis is OSS, our recommendation for people who want data control is to host the software on their own servers, and to use the pol.is/home SaaS to pilot whether the tech fits their use cases, on less critical domains and low-stakes topics.
colinmegill commented 3 years ago

@metasoarous can confirm, but I believe an automated delete would really be an automated overwrite of records, because removing records outright could cause indexing errors in the computation steps.
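To illustrate why overwriting can be safer than deleting, here is a minimal, hypothetical sketch. It assumes participants are addressed by their row position in a vote matrix, which is an assumption for illustration and not a description of Polis's actual schema. Removing a row shifts every later participant's index, while zeroing the row keeps all other positions stable.

```python
# Hypothetical data model: each participant is addressed by their row
# position (pid) in a vote matrix, and results computed earlier are keyed
# by that position. This is NOT Polis's actual schema.

def delete_participant(votes, pid):
    """Remove the row outright; every later participant's index shifts down."""
    return [row for i, row in enumerate(votes) if i != pid]


def zero_out_participant(votes, pid):
    """Overwrite the row with empty votes; all other indices stay stable."""
    return [[None] * len(row) if i == pid else list(row)
            for i, row in enumerate(votes)]


votes = [
    [1, -1, 0],   # pid 0
    [1, 1, -1],   # pid 1 (asks to be removed)
    [-1, 0, 1],   # pid 2
]

deleted = delete_participant(votes, 1)
zeroed = zero_out_participant(votes, 1)

# After deletion, index 1 holds what used to be pid 2's votes, so anything
# cached under pid 1 (e.g. a group assignment) now points at the wrong person.
print(deleted[1])             # [-1, 0, 1]
# After zeroing, pid 2 is still at index 2 and pid 1 carries no votes.
print(zeroed[1], zeroed[2])   # [None, None, None] [-1, 0, 1]
```

The trade-off is that a zeroed row still exists as a placeholder, so any code that counts participants has to skip rows with no votes.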

patcon commented 3 years ago

One more thing I forgot: Angie understands that the ability to remove data is required for her to use the tool in graduate research (this is coming up from her Research Advisory Board interactions). Not that this decision depends on one person's needs, but just to add that texture :)

colinmegill commented 3 years ago

Ah! That's very helpful. I should have mentioned above, regarding a checklist, that Polis is designed for anonymous participation. It's not straightforward (or perhaps always possible) for the host to confirm that a participant record belongs to whoever sends a removal email. Care must be taken if someone logged in with Twitter and has the same name or handle in their email, to verify that these are in fact the same individual before a takedown. In that sense, it would be simpler if everyone were logged into the system; it's just not how the system operates.

colinmegill commented 3 years ago

Angie could assign an xid to every participant if they are anonymous; then, if a removal notice came in, the hosting body (us or others) could zero out the corresponding record.
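A minimal sketch of the researcher-side half of this workflow, assuming the researcher keeps the only mapping between real participants and xids. The `XidRegistry` class and the `EXAMPLE_CONVO` conversation ID are hypothetical, and passing the xid as an `xid` query parameter on the conversation URL is an assumption to check against the Polis embedding docs.

```python
import secrets
from urllib.parse import urlencode


class XidRegistry:
    """Hypothetical researcher-side mapping between participants and opaque xids.

    The mapping stays with the researcher, so the Polis host only ever sees
    the random xid, never the participant's identity.
    """

    def __init__(self):
        self._xid_by_participant = {}

    def assign(self, participant_id):
        # One random, unguessable xid per participant; reuse it if already assigned.
        if participant_id not in self._xid_by_participant:
            self._xid_by_participant[participant_id] = secrets.token_hex(8)
        return self._xid_by_participant[participant_id]

    def participation_url(self, base_url, participant_id):
        # Assumption: the conversation accepts the xid as an "xid" query parameter.
        return f"{base_url}?{urlencode({'xid': self.assign(participant_id)})}"

    def removal_request(self, participant_id):
        # Returns the xid to send to the hosting body so it can zero out that
        # record, and forgets the mapping on the researcher's side as well.
        # Returns None if the participant was never assigned an xid.
        return self._xid_by_participant.pop(participant_id, None)


registry = XidRegistry()
url = registry.participation_url("https://pol.is/EXAMPLE_CONVO", "student-042")
xid_to_zero = registry.removal_request("student-042")
```

Because the host only receives `xid_to_zero`, it never has to verify anyone's identity; the researcher, who does know the participant, makes that call.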

colinmegill commented 3 years ago

Oh one more thought :) we DO attach the cookie to the user record if they log in. So, if they haven't cleared cookies they can create an account and their conversations will be attached to that record.

We used to have a "conversations I have participated in" view, but we dropped it for a number of reasons, chiefly a lack of utility. It would obviously be useful here.

ThenWho commented 3 years ago

GDPR also requires orgs to have a data retention policy, i.e. there must always be a time limit.

There is a loophole that allows skipping this if the data is anonymized, but if moderators opt to allow Twitter/Facebook logins, I think there must be a default or selectable time limit too, e.g. 1, 3, or 5 years.
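A selectable retention limit could be checked roughly as follows. This is a hypothetical sketch: the `ParticipantRecord` fields and the `expired_records` helper are made up, the 1/3/5-year options come from the comment above, and exempting records without a social login (mirroring the anonymization loophole) is an assumption, not an existing Polis setting.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

# Selectable retention periods suggested in the comment (hypothetical setting).
ALLOWED_RETENTION_YEARS = (1, 3, 5)


@dataclass
class ParticipantRecord:
    pid: int
    has_social_login: bool  # e.g. logged in via Twitter/Facebook
    created_at: datetime


def add_years(ts, years):
    """Shift a timestamp by whole years, mapping Feb 29 to Feb 28 when needed."""
    try:
        return ts.replace(year=ts.year + years)
    except ValueError:
        return ts.replace(year=ts.year + years, day=28)


def expired_records(records, retention_years, now):
    """Return records with a social login whose retention period has ended."""
    if retention_years not in ALLOWED_RETENTION_YEARS:
        raise ValueError(f"retention_years must be one of {ALLOWED_RETENTION_YEARS}")
    return [r for r in records
            if r.has_social_login
            and add_years(r.created_at, retention_years) <= now]


utc = timezone.utc
now = datetime(2024, 1, 1, tzinfo=utc)
records = [
    ParticipantRecord(1, True, datetime(2020, 6, 1, tzinfo=utc)),
    ParticipantRecord(2, True, datetime(2023, 6, 1, tzinfo=utc)),
    ParticipantRecord(3, False, datetime(2015, 1, 1, tzinfo=utc)),  # anonymous
]
expired = expired_records(records, 3, now)
print([r.pid for r in expired])  # [1]
```

A scheduled job could feed the expired records into the same zero-out step used for manual removal requests.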

Maybe it would be good to think about this in conjunction with #524.

PS: I don't think the recommendation for people to use their own servers will fly, if it comes to that. Since Polis allows people to host conversations, there needs to be a way to delete data. A manual process, after communication with the Polis team, is perfectly OK though.

dataHumanist commented 3 years ago

Thank you @patcon @colinmegill, this helps immensely. I don't imagine people who were anonymous will come back to ask for their data to be removed. But for compliance with the Research Ethics Board approval, knowing that a manual deletion process exists should satisfy this requirement.

patcon commented 3 years ago

Glad it's helpful, Angie!

> Core should share the checklist we use on hosted, for others who are hosting their own instances

@colinmegill perhaps this is a good thing to put in something like the hackScript/ folder of helper scripts that PDIS maintained in their fork: https://github.com/PDIS/polisServer-archived/tree/local-polis/hackScript
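The hosted checklist itself has not been shared in this thread, so the following is only a hypothetical sketch of how a helper script might track one removal request. Its items are drawn from @colinmegill's points earlier in this issue (the database record, manual exports, downstream reports); the class names and the xid value are made up for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ChecklistItem:
    description: str
    done: bool = False


def default_items() -> List[ChecklistItem]:
    # Hypothetical items, based only on the points raised in this issue.
    return [
        ChecklistItem("Zero out the participant's record in the hosted database"),
        ChecklistItem("Find manual data exports that include this participant"),
        ChecklistItem("Flag downstream reports (e.g. PDFs) built on aggregated data"),
    ]


@dataclass
class DeletionRequest:
    xid: str
    # default_factory gives each request its own fresh list of items.
    items: List[ChecklistItem] = field(default_factory=default_items)

    def complete(self, index: int) -> None:
        """Mark one checklist step as done."""
        self.items[index].done = True

    def outstanding(self) -> List[str]:
        """Descriptions of the steps still left for this request."""
        return [item.description for item in self.items if not item.done]


request = DeletionRequest(xid="a1b2c3")
request.complete(0)
remaining = request.outstanding()
```

Keeping the steps as data rather than prose makes it easy for self-hosters to extend the list with anything specific to their own deployment.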

patcon commented 3 years ago

> So, if they haven't cleared cookies they can create an account and their conversations will be attached to that record.

@colinmegill I've always wondered whether that applies to account creation (after participation) in both client-participation and client-admin. Can you clarify? (This is also something I could add an end-to-end test for, now that I think about it.)

patcon commented 3 years ago

@colinmegill mentioned in https://github.com/compdemocracy/polis/issues/746#issuecomment-757533843:

> Core should share the checklist we use on hosted, for others who are hosting their own instances

Could this rough checklist be shared somewhere, perhaps just dropped into docs/? Or just link to a good doc and I can bring it in :) Whatever's simplest.