BOINC / boinc

Open-source software for volunteer computing and grid computing.
https://boinc.berkeley.edu
GNU Lesser General Public License v3.0

EU-GDPR - Right to Erasure #2447

Closed TheAspens closed 6 years ago

TheAspens commented 6 years ago

Another aspect of the GDPR law is the Right to Erasure. I've created a proposed implementation that might meet the requirements of this provision of the law. This continues the work outlined in #2332 and #2413.

The proposal is documented here: https://boinc.berkeley.edu/trac/wiki/RightToErasure

I would appreciate a review of this and feedback on the implementation. In particular, I would like feedback from people who are doing their own compliance work on whether this is likely to be the minimum set of steps necessary to comply with this provision of the law, or whether some lesser action (like scrubbing user fields such as email address, name, IP address, etc.) would be permitted. I would especially appreciate it if @lfield and @brevilo could take a look and provide feedback.

TheAspens commented 6 years ago

For example - I would like the opinion of @brevilo and @lfield if this implementation is sufficient: #2445

SETIguy commented 6 years ago

One major question...

What's the definition of "all of their data"? According to the page links, it appears that "personal data" is what is covered. Which fields in the user, host, thread, post and result tables belong to a user? Is a userid "their data" once any link to an email_address or cpid has been removed? I would tend to say no, yet I would also tend to think that a CPID is "their data" as it directly identifies a user across projects. Similarly, in host I would expect that ip_addr, external_ip_addr and domain_name belong to the users, but nothing else is personal or user-owned information. Most of that information is created by the project for internal use. There may be projects which require access to other information in the host table. host_app_version is also, IMHO, not information that belongs to a user, although it's not much use to the project once a user has left. Posts and threads, I can see that deleting them all is probably required.

Then there's science data. If host.m_cache belongs to the user, doesn't that also mean any science results returned are the property of the users and need to be deleted as well? After all they link back to result.userid.

I think this proposal goes way too far. To delete all of the personal data for a user...

  1. randomize the personal fields (name, email address, cpid, url, etc.) in user, forum_preferences
     1a. Delete any profile images.
  2. delete all threads and posts for the user.
  3. randomize the IP addresses for the user's hosts.

At that point, all the personal information is gone, unless a project app is sniffing and storing personal information. No deleted tag is necessary, dump the randomized strings.
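The three steps above can be sketched as follows. This is only an illustration under stated assumptions: the schema is a simplified stand-in built in an in-memory SQLite database, not the real BOINC schema, and `random_token`, `scrub_user` and `make_demo_db` are hypothetical helper names. Step 1a (profile images) is left out because it concerns files rather than database rows.

```python
import secrets
import sqlite3


def random_token() -> str:
    """Return a random placeholder that carries no personal data."""
    return "deleted_" + secrets.token_hex(8)


def scrub_user(db: sqlite3.Connection, userid: int) -> None:
    """Apply the three scrubbing steps to one user in a single transaction."""
    with db:  # commits on success, rolls back on error
        # Step 1: overwrite the personal fields with random strings.
        db.execute(
            "UPDATE user SET name = ?, email_addr = ?, cpid = ?, url = ? WHERE id = ?",
            (random_token(), random_token(), random_token(), "", userid),
        )
        # Step 2: delete the user's posts and threads.
        db.execute("DELETE FROM post WHERE userid = ?", (userid,))
        db.execute("DELETE FROM thread WHERE owner = ?", (userid,))
        # Step 3: randomize the network identifiers of each of the user's hosts.
        host_ids = [
            row[0]
            for row in db.execute("SELECT id FROM host WHERE userid = ?", (userid,))
        ]
        for host_id in host_ids:
            db.execute(
                "UPDATE host SET ip_addr = ?, external_ip_addr = ?, domain_name = ? "
                "WHERE id = ?",
                (random_token(), random_token(), random_token(), host_id),
            )


def make_demo_db() -> sqlite3.Connection:
    """Build a tiny in-memory database (assumed, simplified schema)."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE user (id INTEGER PRIMARY KEY, name TEXT, email_addr TEXT,
                           cpid TEXT, url TEXT);
        CREATE TABLE thread (id INTEGER PRIMARY KEY, owner INTEGER, title TEXT);
        CREATE TABLE post (id INTEGER PRIMARY KEY, userid INTEGER, content TEXT);
        CREATE TABLE host (id INTEGER PRIMARY KEY, userid INTEGER, ip_addr TEXT,
                           external_ip_addr TEXT, domain_name TEXT, p_ncpus INTEGER);
        INSERT INTO user VALUES (1, 'alice', 'alice@example.org', 'abc123',
                                 'https://example.org');
        INSERT INTO thread VALUES (1, 1, 'hello');
        INSERT INTO post VALUES (1, 1, 'first post');
        INSERT INTO host VALUES (1, 1, '10.0.0.5', '203.0.113.7', 'alice-pc', 8);
    """)
    return db
```

Note that the sketch keeps the user row and its numeric id, so non-personal host fields and any results that reference the id stay linked to it.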

On Tue, Apr 3, 2018 at 3:09 PM, Kevin Reed notifications@github.com wrote:

For example - I would like the opinion of @brevilo and @lfield if this implementation is sufficient: #2445


-- Eric Korpela korpela@ssl.berkeley.edu AST:7731^29u18e3

brevilo commented 6 years ago

@SETIguy your questions are all relevant and I think I overall agree with your assessments. The problem is, we don't really know for sure. The GDPR isn't fully fleshed out (like most legal texts) and certain questions can only be answered after the first court cases have been settled. This is particularly true since the ePrivacy regulation was meant to become effective in parallel to the GDPR but won't until 2019.

Which fields in the user, host, thread, post and result tables belong to a user?

Any data that relates to an identifiable (directly or pseudonymized) data subject (e.g. via userid reference). Break the relation or anonymize the data subject and you might be good.

Also, keep in mind that these data are affected by the data subject's right to "data portability" as well. You need to be prepared to hand those data over on request (within a month), in a "structured, commonly used and machine-readable format".

Is a userid "their data" once any link to an email_address or cpid has been removed?

If the userid can somehow still be associated with the data subject, it would just be pseudonymized data and the GDPR also applies to those. Think of externally available information like search engine caches or BOINC account managers or stats sites for instance. Those, by the way, are yet another can of worms (sign-up/consent, erasure notifications)...

Most of that information is created by the project for internal use. [...] There may be projects which require access to other information in the host table

Most of this boils down to the question of the lawfulness of the data processing you do. This can be established via different means; the two directly applicable ones in our case should be the "data subject's consent" and the "data controller's legitimate interest". The latter can override the former if justified, but it's of course much easier to describe your data deletion/retention policy in your privacy policy and include it in what the data subject consents to. Most importantly: whatever you do, do it transparently and document it in your "records of processing activities" (another mandatory GDPR requirement).

Posts and threads, I can see that deleting them all is probably required.

Yes, but that's harder than it sounds. What do you do with threads opened by the data subject to be deleted? What do you do with quotes of the data subject's comments? Again, there could be "legitimate interest" to retain those as the discussion would lose coherence (i.e. "for archiving purposes in the public interest, scientific or historical research purposes"), but that's all not entirely clear yet (according to our data protection officer).

any science results returned are the property of the users and need to be deleted as well? After all they link back to result.userid.

This is not about property but data subject rights pertaining to the data subject's data. As soon as you anonymize the tasks/results, e.g. by NULLing the userid you might be safe. Related to the idea of data property is that the data controller should be allowed to delete any of the data subject's data without prior consent as the data subject doesn't own it.
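A minimal sketch of "NULLing the userid", assuming a simplified `result` table in an in-memory SQLite database. The column names and the helpers `anonymize_results` and `make_result_db` are hypothetical, and a real schema might need a sentinel value instead of NULL.

```python
import sqlite3


def anonymize_results(db: sqlite3.Connection, userid: int) -> int:
    """Break the result -> user/host relation; return the number of rows changed."""
    with db:  # one transaction: commit on success, roll back on error
        cur = db.execute(
            "UPDATE result SET userid = NULL, hostid = NULL WHERE userid = ?",
            (userid,),
        )
    return cur.rowcount


def make_result_db() -> sqlite3.Connection:
    """Build a tiny demo table (assumed, simplified schema)."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE result (id INTEGER PRIMARY KEY, userid INTEGER,
                             hostid INTEGER, outcome TEXT);
        INSERT INTO result VALUES (1, 1, 10, 'success');
        INSERT INTO result VALUES (2, 1, 11, 'success');
        INSERT INTO result VALUES (3, 2, 20, 'success');
    """)
    return db
```

The science payload (`outcome` here) is untouched; only the link back to the data subject is removed.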

At that point, all the personal information is gone

That might be true but keep in mind external data (see above) that might still allow one to derive the original data subject (e.g. via the userid pseudonym).

If in doubt, delete whatever you can. That's probably what we're going to do anyway - cleans/speeds up the DB as a nice side-effect.

HTH

RichardHaselgrove commented 6 years ago

Couple of quick points. When a user was recently deleted (user request), several of us noticed that their private messages were deleted from our inboxes. In a recent check of dates/ID numbers at SETI, I was surprised to find that BOINC users appear to write roughly the same number of private messages to each other, as public messages on the message boards. Whatever David did in that case (probably related to #2445, rather than the GDPR) needs to be included in this discussion too.

And what effect will the GDPR have on the "Wayback Machine" internet archiving project? I sometimes refer to that to check on the previous history of a BOINC project.

brevilo commented 6 years ago

@RichardHaselgrove we're going to delete private messages.

And what effect will the GDPR have on the "Wayback Machine" internet archiving project?

That's part of the "erasure notifications" (GDPR Art. 17.2) issue as well as the lawful processing for "archiving purposes in the public interest, scientific or historical research purposes" (GDPR Art. 17.3d) I alluded to above. The former might affect projects (not yet clear) but the latter only affects the Internet Archive itself.

brevilo commented 6 years ago

@TheAspens I'm in the process of reviewing the proposal. Whatever gets done: please separate frontend (.php) from backend/library (.inc) code such that any of these features can easily be integrated with the Drupal code.

Thanks

brevilo commented 6 years ago

@TheAspens My comments on the proposal:

Ageless93 commented 6 years ago

I'm wondering though.

Private messages that I received, are as far as I see it mine, no longer owned by the sender. So when the sender wants his account erased, the PMs I got from him have to be left alone, as they're no longer his, but mine. Perhaps if the project has an outbox, that sent PMs in there have to be removed. But most projects just have an inbox and a write PM option.

Compare it to text messages, Whatsapp, snail mail. Once the sender sent it, it's no longer his. When he wants his account deleted at a service provider, they won't delete all the text messages he ever sent from other people's devices. When the person stops Whatsapp, only the local account is deleted, but all sent messages will still be on other people's devices. When you mail a handwritten letter to some other person, it's no longer yours as soon as you drop it in the mailbox.

So why handle private messages differently?

Willy0611 commented 6 years ago

Here are my thoughts regarding both BOINCstats and BAM!:

For BOINCstats (the stats section) it's enough to just remove the user/hosts from the XML export. During the next import users/hosts no longer existing in the XML will be deleted from the stats. Other stats sites may work differently.
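The import behaviour described above (users missing from the export are dropped from the stats) might look like the sketch below. The XML layout (`<users>` containing `<user>` elements with an `<id>` child) and the function names are assumptions for illustration, not the documented export format or BOINCstats code.

```python
import xml.etree.ElementTree as ET


def exported_ids(xml_text: str) -> set[int]:
    """Collect the user ids present in one export file (assumed layout)."""
    root = ET.fromstring(xml_text)
    return {int(user.findtext("id")) for user in root.iter("user")}


def sync_stats(local_stats: dict[int, dict], present: set[int]) -> list[int]:
    """Delete every locally stored user that is absent from the latest export."""
    removed = sorted(uid for uid in local_stats if uid not in present)
    for uid in removed:
        del local_stats[uid]
    return removed
```

With this approach a project only has to stop exporting a deleted user; it does not help consumers that never re-import, which is the concern raised later in the thread.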

BAM! is a little bit more complicated and may also require more to be done on the project side.

When a user deletes his account at a project, should that also delete his BAM! data for that project (please keep in mind that BAM! data is not stats data!)? The project doesn't necessarily know that a BAM! account with data for that project exists. If this data should also be deleted, the project should call a (non-existing) BAM! API to do so.

Then the other way around: When a BAM! user deletes his account, should it also delete all the linked project accounts? I think this should be a choice by the user. If he chooses yes, BAM! must call a project API (RPC) to notify the project to do so. Then the project can do one of two things: A) Trust BAM! and delete the account or B) start the deletion process as outlined here.

And lastly, the big issue: Sometimes I get requests to remove stats data. Most of the time these emails contain a link to one or more pages on BOINCstats with the request to remove them. It's impossible for me to be 100% sure that the person requesting the deletion is the true owner of that data. It can also be someone trying to get some competition out of the way. I refer these people to the project sites to delete/anonymize their account there. This only works when the project is still up and the admins responding. So far I have refused all requests to delete stats data on my side, however, this may lead to some issues with these new rules. I'm not sure how to handle this.

TheAspens commented 6 years ago

@brevilo

Data exports:

  • I think we need to separate exports required for data portability and exports for downstream consumers
  • The former need to be augmented to include all data of the data subject, incl. host details and community content
  • The latter only need the deletion request/tag (nice idea by the way!)

I agree that data portability needs to be separate from the data exports. My proposal does not address the data portability requirement and that will need to be addressed in a separate issue.

brevilo commented 6 years ago

Private messages that I received, are as far as I see it mine

@Ageless93 I doubt that. I don't think the data subject has ownership of any kind of data by default, let alone of data provided by others. The controller provides a service and unless otherwise stated (by the contractual basis, e.g. the terms of use) can legally remove any such data.

Compare it to text messages, Whatsapp, snail mail. Once the sender sent it, it's no longer his

In that case you might have a physical (or cached) copy but even that doesn't constitute ownership. If all messages were server-based, which they are in BOINC, the service provider (controller) can simply choose to shut down the service immediately, without your consent.

Regarding WhatsApp: have you read the terms of use you agreed to? Would be interesting to know what they say on data ownership.

brevilo commented 6 years ago

@willy0611

When a user deletes his account at a project, should that also delete his BAM! data for that project (please keep in mind that BAM! data is not stats data!)? The project doesn't necessarily know that a BAM! account with data for that project exists. If this data should also be deleted, the project should call a (non-existing) BAM! API to do so.

Projects are required to make sure any upstream/downstream services delete any of the published data as well (GDPR Art. 17.2). In case of BAM! the situation is more complicated, though, as the whole matter of opt-in consent to a given project's terms of use (or privacy policy) would shift to BAM! itself, according to our data protection officer. However, since we'd have to distinguish BAM! accounts from locally created accounts to have actual proof of consent, we might effectively be forced to shut down BAM! support until we have a GDPR-compliant end-to-end solution to this. The same might be true for the stats exports...

So far I have refused all requests to delete stats data on my side, however, this may lead to some issues with these new rules. I'm not sure how to handle this.

You have to get consent for that data processing already, even if it's dealing with pseudonyms only. That means you already have a bigger challenge at your hands, not just for data removal requests. We're all in the same situation.

TheAspens commented 6 years ago

@SETIguy - I am not a lawyer, my following statements have no legal weight so take them for 0 value.

As I have tried to understand how to comply with GDPR as it pertains to BOINC* I have come to the following understanding of the intent behind the law.

I believe that GDPR seeks to make information about an individual a fundamental right of that individual and that they get to control where that information is retained. This right supersedes any other agreements that they might have entered into. Specifically, they can grant consent to a site to utilize data that they provide and that the site might collect about them. However, they also have the right to revoke that consent and have the information they provided, or that was collected about them, removed. They also have the right to review what information a system currently retains about them.

This second bit is what makes this law such a new and fundamentally different thing from what existed before. It means that we have to think of user data and associated data that we collect about them as something that is only loaned to us, but is not ours to keep. Systems will have to keep track of personal information and where it flows so that, if consent is withdrawn, it can be removed.

Doesn't that also mean any science results returned are the property of the users and need to be deleted as well?

Since the science results can be separated from any notion of the user (i.e. when the result record is deleted from the database and after the result has been assimilated there is no longer any connection between the result and the user) and because they are part of the legitimate purpose of the system separate from the user, GDPR does not apply to these records. If information about the user is retained (for example the OS it ran on and other such factors that might be needed to determine what happened during the execution of a particular task) then it gets more complicated (I believe you still can, but you need to get into details about the lower levels of the law).

Ageless93 commented 6 years ago

@brevilo

Regarding WhatsApp: have you read the terms of use you agreed to? Would be interesting to know what they say on data ownership.

https://www.whatsapp.com/legal/ "Your messages are yours, and we can’t read them." "Your Rights. WhatsApp does not claim ownership of the information that you submit for your WhatsApp account or through our Services." "If you would like to manage, change, limit, or delete your information, we allow you to do that through the following tools: Deleting Your WhatsApp Account. You may delete your WhatsApp account at any time (including if you want to revoke your consent to our use of your information) using our in-app delete my account feature. When you delete your WhatsApp account, your undelivered messages are deleted from our servers as well as any of your other information we no longer need to operate and provide our Services. Be mindful that if you only delete our Services from your device without using our in-app delete my account feature, your information may be stored with us for a longer period. Please remember that when you delete your account, it does not affect the information other users have relating to you, such as their copy of the messages you sent them."

drshawnkwang commented 6 years ago

@TheAspens - I read through your RightToErasure document as well. Thanks for writing it up.

For the Drupal-BOINC implementation I have already written some code that deletes a user for the Drupal-side of the code. This was pre-GDPR (or before I learned of GDPR).

The user is presented with a 'delete account?' Web page with a description of what will happen when the account is deleted. If confirmed, the account is flagged for deletion. There is no email confirmation. But the account is not deleted until two weeks later (adjustable by the admin). If the user logs in anytime within this two-week period, the delete action is canceled - i.e., the account is un-flagged.

After two weeks, the account is acted upon by a Drupal queue which deletes the Drupal user data, but keeps much of the data in the BOINC project database (tables: user, host, etc.).
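The flag-then-wait flow described above can be sketched as a small state machine. This is not the Drupal-BOINC code: `Account`, `request_deletion`, `on_login` and `due_for_deletion` are hypothetical names, and the clock is passed in explicitly so the logic can be exercised without waiting two weeks.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

GRACE_PERIOD = timedelta(days=14)  # "two weeks", adjustable by the admin


@dataclass
class Account:
    name: str
    delete_requested_at: Optional[datetime] = None  # None means "not flagged"


def request_deletion(account: Account, now: datetime) -> None:
    """Flag the account; nothing is deleted yet."""
    account.delete_requested_at = now


def on_login(account: Account) -> None:
    """Any login during the grace period cancels the pending deletion."""
    account.delete_requested_at = None


def due_for_deletion(account: Account, now: datetime,
                     grace: timedelta = GRACE_PERIOD) -> bool:
    """True once the account has stayed flagged for the whole grace period."""
    flagged_at = account.delete_requested_at
    return flagged_at is not None and now - flagged_at >= grace
```

A periodic job (the Drupal queue in the description above) would call `due_for_deletion` for each flagged account and delete those for which it returns True.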

There is no pressing reason why BOINC would have to implement a similar wait-period before deletion; this is just my $0.02.

RichardHaselgrove commented 6 years ago

(i.e. when the result record is deleted from the database and after the result has been assimilated there is no longer any connection between the result and the user)

I don't think that's necessarily true. Einstein (certainly) and I think SETI retain records of who processed which bit of the science - that's held in their master Science databases, long after the transactional processing records are purged from their BOINC databases. Einstein have - very publicly - awarded discovery certificates and named finders in press releases, and as (IIRC) co-authors in published scientific papers. That public recognition of participation will, of course, have been subject to secondary and very specific consent, far beyond any consent granted as part of the process of joining the BOINC project on day 1. But the user ID associated with the computation must have been maintained rigorously intact for the attribution to be possible.

JuhaSointusalo commented 6 years ago

FWIW, national data protection authorities have made guidelines about GDPR available through the Article 29 Working Party. link

National authorities may have those translated or additional content available on their websites. link

SETIguy commented 6 years ago

Given that the data has been available for download by the general public, upstream/downstream deletion can't be guaranteed for anyone except well behaved upstream/downstream partners with resources. The guy who has been extracting and archiving data for all his team members to create graphs for his web site has been under no obligation to create a means for deleting data from his archive and probably will not do so. Do we need to stop providing public stats dumps? Which gets us back to the definition of "personal data". Are stats personal data to begin with?

And then there's gridcoin. A cpid/gridcoin address link beacon can't be deleted from the blockchain. I don't know if a username is stored with that or not. Probably not.

On Wed, Apr 4, 2018 at 7:36 AM, Oliver Bock notifications@github.com wrote:

@Willy0611

When a user deletes his account at a project, should that also delete his BAM! data for that project (please keep in mind that BAM! data is not stats data!)? The project doesn't necessarily know that a BAM! account with data for that project exists. If this data should also be deleted, the project should call a (non-existing) BAM! API to do so.

Projects are required to make sure any upstream/downstream services delete any of the published data as well (GDPR Art. 17.2). In case of BAM! the situation is more complicated, though, as the whole matter of opt-in consent to a given project's terms of use (or privacy policy) would shift to BAM! itself, according to our data protection officer. However, since we'd have to distinguish BAM! accounts from locally created accounts to have actual proof of consent, we might effectively be forced to shut down BAM! support until we have a GDPR-compliant end-to-end solution to this. The same might be true for the stats exports...



TheAspens commented 6 years ago

The explanation I have come to understand, and that I am operating under, is that if the consent on the BOINC site clearly states what information is public, then as long as a mechanism exists to communicate the user's intent to have their information removed, which consumers of the public data can monitor, the BOINC site will be in the clear. However, if the consumer of the public data does not follow the delete instructions, then the consumer of the public data could be at risk of violating GDPR.

I am also operating under the assumption that stats data that is tied to a user name, user id or cross project id is personal data and needs to be cleared as well.

As far as any blockchain tech goes - I have no idea how they will comply since the two are somewhat at odds with each other.

I want to be clear again that GDPR is not clear and that the interpretation I am operating under could be incorrect. We are trying to craft the technical changes that will be minimally impactful to BOINC, given our best understanding of what it takes to be compliant. This is why I really want the people who are also trying to comply with the law to review this and articulate their understanding as well, since I am not an authority in this matter.

TheAspens commented 6 years ago

(i.e. when the result record is deleted from the database and after the result has been assimilated there is no longer any connection between the result and the user)

I don't think that's necessarily true. Einstein (certainly) and I think SETI retain records of who processed which bit of the science - that's held in their master Science databases, long after the transactional processing records are purged from their BOINC databases.

WCG doesn't do this so I hadn't considered the impact of that.

TheAspens commented 6 years ago

I have updated my proposal in the following way based on some comments:

I need to re-read this thread in detail based on the above and add some of the additional details that @brevilo suggests above which I will do tomorrow. I also have no problems if someone wants to update my proposal page with improvements as well.

TheAspens commented 6 years ago

The changes I made can be seen with this diff: https://boinc.berkeley.edu/trac/wiki/RightToErasure?action=diff&version=4&old_version=2

TheAspens commented 6 years ago

Also - this is somewhat relevant to the discussion of people who use the data produced by the BOINC sites: https://parissmith.co.uk/blog/beware-spiders-web-crawling-screen-scraping-legal-position/

I think that this (plus other information I've read and been told) means that BOINC sites need to specify terms of use for people who view or use the data available on the site. Those terms need to include using only the information that users consent to share publicly, monitoring the delete file, and purging data that shows up in that file.

Again - @brevilo - please comment heavily and @lfield - please share what CERN is telling you. Thanks!

SETIguy commented 6 years ago

Since it seems like you're going to go through with this overly expansive view of "personal data" regardless of my concerns, there need to be some changes. It needs two factor authentication at a minimum. Otherwise people will get their accounts deleted for them. Entering the password twice is insufficient, to say the least. It needs an emailed code that must be entered for the deletion to occur. The email needs a click link that will prevent the account from being deleted for 30 days, so if an account is being hacked it can be stopped. The deletion process needs a click through disclaimer on the penultimate page indicating that it is irrevocable and that there will be no record or acknowledgement of work done or contributions made, ever.
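A rough sketch of the emailed-code and 30-day-hold requirements above. `DeletionGuard` is a hypothetical helper, not part of the proposal or of BOINC; sending the email, checking the password and showing the disclaimer page are all out of scope here.

```python
import hmac
import secrets
from datetime import datetime, timedelta
from typing import Optional

BLOCK_PERIOD = timedelta(days=30)  # the 30-day hold suggested above


class DeletionGuard:
    """Tracks one account's pending deletion request (hypothetical helper)."""

    def __init__(self) -> None:
        self.code: Optional[str] = None
        self.blocked_until: Optional[datetime] = None

    def start(self) -> str:
        """Create a one-time code; the caller would email it to the account."""
        self.code = secrets.token_urlsafe(16)
        return self.code

    def block(self, now: datetime) -> None:
        """Handler for the 'stop this deletion' link in the email."""
        self.code = None
        self.blocked_until = now + BLOCK_PERIOD

    def confirm(self, submitted: str, now: datetime) -> bool:
        """Return True only if the deletion may proceed."""
        if self.blocked_until is not None and now < self.blocked_until:
            return False  # the account owner put a hold on deletion
        if self.code is None:
            return False  # no request pending, or the code was already used
        ok = hmac.compare_digest(submitted.encode("utf-8"),
                                 self.code.encode("utf-8"))
        if ok:
            self.code = None  # each code works only once
        return ok
```

The comparison uses `hmac.compare_digest` so that checking a guessed code takes the same time regardless of how many leading characters match.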

I can't wait to deal with the people whose accounts get hacked and deleted. I suppose it's better than being able to know that an unidentifiable computer owned by someone unidentifiable did some SETI@home work in 2015.

I also think we should store a hash of the email address, so we can prevent that email address from ever being used to sign up again if we choose. We're all familiar with the people who leave a project in a huff, and then show up 2 weeks later to make trouble.
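One way the email-hash idea could be sketched, assuming a keyed hash (HMAC-SHA256) with a secret held by the project. The key, the normalization rule and the function names are assumptions, not something proposed in this thread. A plain unkeyed hash of an email address can be reversed by hashing candidate addresses, which is why a key is used here.

```python
import hashlib
import hmac


def email_fingerprint(email: str, secret_key: bytes) -> str:
    """Return a keyed hash of the normalized address (hex string)."""
    # Assumption: addresses are compared case-insensitively after trimming.
    normalized = email.strip().lower()
    return hmac.new(secret_key, normalized.encode("utf-8"),
                    hashlib.sha256).hexdigest()


def is_blocked(email: str, blocked: set[str], secret_key: bytes) -> bool:
    """Check a sign-up address against the stored fingerprints."""
    return email_fingerprint(email, secret_key) in blocked
```

On deletion the project would store only the fingerprint; at sign-up it would call `is_blocked` with the new address. Whether such a fingerprint itself still counts as pseudonymized personal data is a legal question this sketch does not settle.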

On Wed, Apr 4, 2018 at 3:47 PM, Kevin Reed notifications@github.com wrote:

The changes I made can be seen with this diff: https://boinc.berkeley.edu/trac/wiki/RightToErasure?action=diff&version=4&old_version=2



brevilo commented 6 years ago

@Ageless93

https://www.whatsapp.com/legal

Everything they say there is: you stay in control of your data (also because it's encrypted). That's exactly what the GDPR is all about. They thereby make it clear that data ownership is not transferred to any third party either (e.g. you as a conversation partner).

"Please remember that when you delete your account, it does not affect the information other users have relating to you, such as their copy of the messages you sent them."

Of course not, but WhatsApp wants to support offline copies as part of their service offering, so they chose to hand off control (not ownership!) of the subject's data to its conversation partners, but they do that transparently - so that the subject can decide and give consent. In BOINC's case the original content (PMs) doesn't get copied or stored offline so the controller (BOINC project) stays in control of the data. Therefore we are obliged to facilitate the data subject's rights, incl. deletion. That's what we'll do.

brevilo commented 6 years ago

@RichardHaselgrove

I don't think that's necessarily true. Einstein (certainly) and I think SETI retain records of who processed which bit of the science

Correct, we do, but that doesn't mean those records couldn't be disassociated on account deletion. Quoting Kevin in full length, he was talking about the future:

Since the science results can be separated from any notion of the user

He said can and that's the point. We're going to make provisions such that we can remove the task <-> user association, even for assimilated/deleted tasks.

public recognition of participation will, of course, have been subject to secondary and very specific consent

Yes, we're already asking for their individual consent to do this.

brevilo commented 6 years ago

@SETIguy

Given that the data has been available for download by the general public, upstream/downstream deletion can't be guaranteed for anyone except well behaved upstream/downstream partners with resources.

Yep, that's part of the unsolved mysteries between law and reality. Time (courts) will tell.

Do we need to stop providing public stats dumps?

As I pointed out earlier: we most certainly will until these issues are settled somehow. For instance, this could mean that anonymous stats downloads need to be replaced by access control such that we know who has access. That's the only way we can notify every downstream consumer of the deletion request. If we can't, we're inviting sanctions. And yes, stats data are personal data since they're, by design, not anonymized (pseudonyms are personal data).

A cpid/gridcoin address link beacon can't be deleted from the blockchain.

Again, law vs. reality. The blockchain issue is a known one but it might be covered by the "legitimate interest" clause.

brevilo commented 6 years ago

@TheAspens

Introduce a delay between when the user confirms that they want the account to be deleted and when the account is actually deleted based on @drshawnkwang suggestion above.

I actually prefer your original implementation using a confirmation email over what we've got in Drupal so far which uses a delay instead of the confirmation mail. Both ways are meant to introduce some safety to prevent accidental deletion. That said, I don't really see what benefit an additional delay adds when a confirmation mail is already in place, but I don't have a strong opinion on that either, so feel free. Just wanted to share some more background/history context.

brevilo commented 6 years ago

@TheAspens

I think that this (plus other information I've read and been told) means that BOINC sites need to specify the terms of use that people who view or use the data available on the site

As far as I understand, it might even go so far that we're (effectively) required to have a kind of (contractual?) agreement with downstream consumers. If you have no record of who you gave a copy of those personal data to, you're going to be in trouble easily - whether that makes pragmatic sense or not. Thus my suggestion above to replace anonymous stats access with registered stats accounts, or we won't be able to comply with GDPR Art. 17.2.

brevilo commented 6 years ago

Disclaimer: like Kevin, I'm not a legal authority and everything I state here is my personal informed assessment. However, I've been working on this for weeks, went to a conference on the subject and had lengthy discussions with our data protection officer, so I think I have a reasonable understanding by now. Nevertheless, each project needs to make up their own mind and, ideally, consult their legal team about this. If in doubt, I recommend erring on the safe side since this tool is meant to hurt.

Ageless93 commented 6 years ago

In BOINC's case the original content (PMs) doesn't get copied or stored offline so the controller (BOINC project) stays in control of the data. Therefore we are obliged to facilitate the data subject's rights, incl. deletion. That's what we'll do.

@brevilo, nowhere on any of the BOINC pages do we now state that any information left on the site is thereby automatically copyright of the site or project. Neither the general BOINC Policies nor for instance the Einstein policies state anything about who is owner of messages on the forums or in private message and what will be done with these in other people's accounts upon deletion of one user's account.

So perhaps the one thing we should expeditiously introduce is the outbox, the code for which I've linked to in #1963. Then we can differentiate between the PMs sent (can be deleted) and received (hands off). Because the other thing I still see in forums is threads made by the deleted account, and what to think about quoted text, answers to now deleted messages, quoted text from deleted messages, etc.? Yes, that's nitpicking, but just be at the ready before someone complains about that. :)

brevilo commented 6 years ago

@Ageless93

nowhere on any of the BOINC pages do we now state that any information left on the site is thereby automatically copyright of the site or project.

Correct, yet neither did my statement you quoted imply that. Besides, I doubt that PMs and forum posts can be copyrighted in the first place (they don't cross the "threshold of originality"), but that's a different discussion. Let me repeat: we are the data controllers and we are obliged to make sure the data subject's rights are fully respected. Either way, you (as the PM receiver) are not entitled to claim ownership of the original author's data - and that was your initial criticism/claim. If you still think otherwise, I'd be interested to learn about the detailed legal foundation on which you build your assumption.

Neither the general BOINC Policies nor for instance the Einstein policies state anything about who is owner of messages

The point is, I'm talking here about how we're going to interpret/fulfill the GDPR. Referencing our current policies is pointless as they've not been revised yet, but they will be in due course.

Again the right to erasure originated in the notion of the "right to be forgotten". That means if a data subject wants to see its data removed, we'll do so, incl. from your PM inbox which resides on our servers and we're thus still the controller of. Thereby control remains with the original data subject and that's what the GDPR is all about.

Ageless93 commented 6 years ago

Again the right to erasure originated in the notion of the "right to be forgotten". That means if a data subject wants to see its data removed, we'll do so, incl. from your PM inbox which resides on our servers and we're thus still the controller of. Thereby control remains with the original data subject and that's what the GDPR is all about.

@brevilo, so we agree that there is no copyright on the text. The text in a private message is not available in the public domain, it only shows in private. As soon as you send a private message to me, the text you sent is no longer in your account, no longer in your private mail box. (In normal BOINC forums, that is, not counting Primegrid, who have an outbox, or whatever changes Drupal brought.) The text is now in my inbox. It's stored under my account. Under my accountID / userID.

How is it then possible that you as original author get to decide what's done with that upon deletion of your account? And here's a crucial one, what happens to all the text I sent you in private messages, that's still stored in your account (aka that you as a person didn't remove)? In your words, that text is mine and I get to decide what's done with it, you as a project do not get to remove any of it unless I give you my consent.

For a comparison, when I send you money from my bank account to your bank account, it's no longer on my account, it's on yours. When I now decide to close my account with the bank, and I want all mention of me deleted from the servers, the bank isn't going to delete that money from your account even though it will show in your receipts that I sent you that money - with name, bank account number etc. What they may adjust is that my name / account number no longer shows up in future searches, but the money stays with you until you decide you do something with it.

The moment I sent you the money it was no longer mine. The moment I send you a message the contents of the message are no longer mine. The words may be written by me, but I do not get to decide - via deletion of my account - that those words have to be removed from YOUR account. Under present policies, rules and circumstances, at least. (which happened on the BOINC forums when a user wanted his account deleted)

brevilo commented 6 years ago

The text is now in my inbox. It's stored under my account. Under my accountID / userID.

I disagree. A private message exists in a database table and references the sender (sent messages) as well as the recipient (inbox). The whole notion of "under my account" (or the bank analogy) doesn't apply. It never gets copied to another physical place (e.g. your premises/device, see WhatsApp) or leave our server in any way. There's no transfer of ownership. You get to see the messages the sender wanted you to see, that's all. We'll enable the sender to invoke his/her right to be forgotten.
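A minimal sketch of the data model described above, assuming a single `private_messages` table with a recipient column `userid` and a sender column `senderid`. The column names and the SQLite backend are assumptions for illustration and are not taken from the BOINC schema. It shows that each message is one stored row, so erasing a sender's messages also empties them from the recipient's inbox view:

```python
import sqlite3


def make_demo_db():
    """Build an in-memory table: one row per message, no per-user copies."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE private_messages ("
        " id INTEGER PRIMARY KEY,"
        " userid INTEGER NOT NULL,"    # recipient (assumed column name)
        " senderid INTEGER NOT NULL,"  # sender (assumed column name)
        " content TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO private_messages (userid, senderid, content) VALUES (?, ?, ?)",
        [(2, 1, "from 1 to 2"), (1, 2, "from 2 to 1"), (3, 2, "from 2 to 3")],
    )
    return conn


def inbox(conn, user_id):
    """Return the message bodies a user sees in their inbox."""
    rows = conn.execute(
        "SELECT content FROM private_messages WHERE userid = ? ORDER BY id",
        (user_id,),
    )
    return [row[0] for row in rows]


def erase_sent_messages(conn, user_id):
    """Delete every message the given user sent; return the number of rows removed."""
    cur = conn.execute(
        "DELETE FROM private_messages WHERE senderid = ?", (user_id,)
    )
    conn.commit()
    return cur.rowcount
```

The sketch deliberately leaves messages received by the erased user untouched, since whether those should also be removed is the open question in this thread.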

How is it then possible that you as original author get to decide what's done with that upon deletion of your account?

As the controller we process data of a data subject. If that data subject enacts its right to erasure of its personal data, we're going to follow suit.

you as a project do not get to remove any of it unless I give you my consent

To my understanding that's wrong. If we as a project decide to shut down our project, we can do so whenever we want. Whether that's a nice move or not is an irrelevant question here. Again, please let me know the legal basis that should prevent us from doing so (GDPR's data portability notwithstanding).

But... discussions like this are exactly one reason why the GDPR makes a lot of sense: sites are required to make all these terms of services transparent and require the user giving his/her consent before a service can be used! We'll describe all these things in detail so you can make an informed decision to sign-up or not. If so, you gave your consent already.

TheAspens commented 6 years ago

@SETIguy

Since it seems like you're going to go through with this overly expansive view of "personal data" regardless of my concerns

Eric - it is simple. If we ignore this, or if we get it wrong and get sued, or even if we get audited and are determined to be non-compliant, then we will be shut down in a heartbeat. As a result, yes - we will take the expansive interpretation - especially since this is the direction that our internal advisors have given us.

I am totally happy to provide a feature flag so that the "Right to be Erased" feature is optional for projects. Seti@Home and the University of California system may have no obligation to comply in which case you have no need for this feature.

TheAspens commented 6 years ago

@brevilo @SETIguy

Introduce a delay between when the user confirms that they want the account to be deleted and when the account is actually deleted based on @drshawnkwang suggestion above.

I actually prefer your original implementation using a confirmation email over what we've got in Drupal so far which uses a delay instead of the confirmation mail. Both ways are meant to introduce some safety to prevent accidental deletion. That said, I don't really see what benefit an additional delay adds when a confirmation mail is already in place, but I don't have a strong opinion on that either, so feel free. Just wanted to share some more background/history context.

The full path should be (which I tried to express in my updates to the design doc but apparently failed):

The assumption here is that the email address on record is that of the actual account owner. @Ageless93 brought up that if the account is hacked then the email address might not be the correct one any longer. As a result we will need another feature which does the following:

It might also be necessary that if we are within X hours of an email address change, then a notification of account deletion should also be sent to the previous email address.

This would prevent deletion of the account unless the user has lost control of their original email address. However, it is unclear to me that we can do anything about that.
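A rough sketch of the "within X hours of an email address change" rule, assuming the project records the previous address and the time of the last change. The function name, its parameters and the idea of passing the window in as hours are all hypothetical and only illustrate the check:

```python
from datetime import datetime, timedelta
from typing import List, Optional


def deletion_notice_recipients(
    current_email: str,
    previous_email: Optional[str],
    email_changed_at: Optional[datetime],
    now: datetime,
    window_hours: int,
) -> List[str]:
    """Return the addresses that should receive the deletion notification.

    The current address always gets the notice. The previous address is
    added only if the address was changed within the last `window_hours`.
    """
    recipients = [current_email]
    if previous_email and email_changed_at is not None:
        if now - email_changed_at <= timedelta(hours=window_hours):
            recipients.append(previous_email)
    return recipients
```

With a 24 hour window, an account whose address changed two hours ago would notify both addresses, while one changed two days ago would notify only the current address.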

@SETIguy, @Ageless93 and @brevilo - is this sufficient for securing the account?

drshawnkwang commented 6 years ago

@TheAspens - After discussing with @brevilo, we think the Drupal implementation will match the BOINC implementation - and remove the idea of a grace period between when the user confirms deletion and when the account is actually deleted. (If desired, it could still be implemented - but maybe the project admin could choose between 0 hours, 24 hours, 1 week, etc. Maybe as a "ver. 2" extra feature.)

Otherwise I like the outline that you have posted.

TheAspens commented 6 years ago

@drshawnkwang - can you clarify? My most recent suggestion is that there is an email validation step followed by a grace period (i.e. it includes both).
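To make the two variants concrete, here is a small sketch of the flow as I understand it: email confirmation first, then a grace period before the account is actually deleted. The class, state names and `grace_hours` setting are hypothetical. Setting `grace_hours` to 0 gives the confirmation-only behaviour:

```python
from datetime import datetime, timedelta
from enum import Enum


class DeletionState(Enum):
    REQUESTED = "requested"   # user asked for deletion, email not yet confirmed
    CONFIRMED = "confirmed"   # user followed the link in the confirmation email


class DeletionRequest:
    """Tracks one account deletion request through confirmation and grace period."""

    def __init__(self, grace_hours: int):
        self.grace = timedelta(hours=grace_hours)
        self.state = DeletionState.REQUESTED
        self.confirmed_at = None

    def confirm(self, now: datetime) -> None:
        """Record that the user followed the confirmation link."""
        if self.state is DeletionState.REQUESTED:
            self.state = DeletionState.CONFIRMED
            self.confirmed_at = now

    def is_due(self, now: datetime) -> bool:
        """True once the request is confirmed and the grace period has elapsed."""
        if self.state is not DeletionState.CONFIRMED:
            return False
        return now >= self.confirmed_at + self.grace
```

A periodic job could then delete every account whose request reports `is_due`, which keeps the grace period a pure configuration choice.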

TheAspens commented 6 years ago

Does everyone think that #2451 should be implemented prior to or along with this change?

drshawnkwang commented 6 years ago

can you clarify? My most recent suggestion is that there is an email validation step followed by a grace period (i.e. it includes both).

We were thinking of only having the email validation; no grace period.

TheAspens commented 6 years ago

Also - should I set up a call to hash through this?

drshawnkwang commented 6 years ago

Also - should I set up a call to hash through this?

If there is a telecon, I'd like to listen in, even if it's just as an 'observer'.

TheAspens commented 6 years ago

@SETIguy

I also think we should store a hash of the email address, so we can prevent that email address from ever being used to sign up again if we choose. We're all familiar with the people who leave a project in a huff, and then show up 2 weeks later to make trouble.

Would this help? It's easy to get an additional email address to bypass an email address block, and if we hash email addresses, how will you know which to block and which not to block?

How do you block troublesome users now?
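For reference, the hashed blocklist idea quoted above could look roughly like the following. This is only a sketch under assumptions: the class and function names are made up, the addresses are normalised by trimming and lower-casing, and a keyed hash (HMAC-SHA256 with a server-side secret) is used instead of a plain hash so that fingerprints cannot be recomputed without the key:

```python
import hashlib
import hmac


def email_fingerprint(email: str, secret_key: bytes) -> str:
    """Return a keyed SHA-256 fingerprint of a normalised email address."""
    normalized = email.strip().lower()
    return hmac.new(secret_key, normalized.encode("utf-8"), hashlib.sha256).hexdigest()


class SignupBlocklist:
    """Remembers fingerprints of blocked addresses without storing the addresses."""

    def __init__(self, secret_key: bytes):
        self._secret_key = secret_key
        self._fingerprints = set()

    def block(self, email: str) -> None:
        """Add the address to the blocklist; only its fingerprint is kept."""
        self._fingerprints.add(email_fingerprint(email, self._secret_key))

    def is_blocked(self, email: str) -> bool:
        """Check a sign-up address against the stored fingerprints."""
        return email_fingerprint(email, self._secret_key) in self._fingerprints
```

This illustrates both concerns: a different address produces a different fingerprint and passes the check, and the stored fingerprints cannot be read back to see who was blocked.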

brevilo commented 6 years ago

@TheAspens

is this sufficient for securing the account?

Sounds adequate to me.

Does everyone think that #2451 should be implemented prior to or along with this change?

Jord has a point there, so ideally before this one. But GDPR is right around the corner, so if #2451 can't be done in time, account erasure should get a higher overall priority in my opinion, due to its bigger impact.

SETIguy commented 6 years ago

Yes we should have a call.

On Thu, Apr 5, 2018 at 8:22 AM, Kevin Reed notifications@github.com wrote:

Also - should I set up a call to hash through this?

-- Eric Korpela korpela@ssl.berkeley.edu AST:7731^29u18e3

Ageless93 commented 6 years ago

A private message exists in a database table and references the sender (sent messages) as well as the recipient (inbox).

@brevilo, There's the thing. Under the BOINC forums we do not have sent messages. There is an Inbox and a Write option. No Sent box, no Sent Messages box, and unless I included myself as recipient for a PM I sent to you, I will have no copy of it. I will only get it back when you answer me and leave the original PM attached.

Which is different from how it works under Einstein, where it's set up like a thread, with a copy of the PM I sent to you, your answer to me, etc. I am not sure if it is set up differently in the database; I'll leave that to someone with actual BOINC forum software to tell.

So under the BOINC forums, when the messages are deleted, they are deleted from the other users' inboxes only. They will notice that. (Aside, you get to wonder how private a private message is when all the other user has to do is to decide he doesn't want it staying there, deleting his account, thus deleting the message(s), then re-registering and continuing, but a la.)

Ageless93 commented 6 years ago

@TheAspens,

We will send an email with a link (includes special access token) to the prior address and let them know that their email address has changed. They will have X hours to use the link. The link will let them revert the email address change and then they will be required to provide a new password for the account.

The only omission I can think of here is where the email address is no longer active and the actual user never got to changing it, since the email address isn't really used (validated) under BOINC. So unless we also include validating the email addresses, there isn't much we can do in that scenario.
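A sketch of how the quoted revert link could be backed, assuming a random single-use token that expires after X hours. The class name, the in-memory storage and the choice to keep only a SHA-256 digest of the token are assumptions for illustration, not part of the proposal:

```python
import hashlib
import secrets
from datetime import datetime, timedelta


class EmailRevertTokens:
    """Issues and redeems single-use tokens for reverting an email change."""

    def __init__(self, valid_hours: int):
        self._valid_for = timedelta(hours=valid_hours)
        # token digest -> (user_id, previous_email, expires_at)
        self._pending = {}

    def issue(self, user_id: int, previous_email: str, now: datetime) -> str:
        """Create a token to embed in the link sent to the previous address."""
        token = secrets.token_urlsafe(32)
        digest = hashlib.sha256(token.encode("ascii")).hexdigest()
        self._pending[digest] = (user_id, previous_email, now + self._valid_for)
        return token

    def redeem(self, token: str, now: datetime):
        """Return (user_id, previous_email) for a valid token, else None.

        A token is removed on first use, so it cannot be redeemed twice.
        """
        digest = hashlib.sha256(token.encode("ascii")).hexdigest()
        entry = self._pending.pop(digest, None)
        if entry is None:
            return None
        user_id, previous_email, expires_at = entry
        if now > expires_at:
            return None
        return user_id, previous_email
```

On a successful `redeem`, the caller would restore the previous address and force a password reset. As noted above, none of this helps if the previous address is no longer active.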

TheAspens commented 6 years ago

I've set up a doodle with times for Friday and Monday. Please respond with the times that work for you so that everyone who wants to participate can do so: https://doodle.com/poll/wq636h9a7w3zqnd5

Also - please try to check back here at 10:30 CDT/15:30 UTC to look for the time. I'll send an email to those that I have but I don't have everyone's so I will post the meeting info here as well.

TheAspens commented 6 years ago

There are three major parts to this:

  1. The process by which the user indicates that they want to be erased.
  2. The way that they are deleted.
  3. The way that the fact of the erasing is communicated to consumers of the stats export.

It is my understanding that 1 and 3 are not controversial but that 2 has discussion remaining. I am going to start development of the parts that are not controversial while waiting for resolution/chat about part 2.

TheAspens commented 6 years ago

Based on feedback above, the "process by which the user indicates that they want to be erased" will be dependent on #2451 (email change notification) which @Uplinger will implement. I've updated https://boinc.berkeley.edu/trac/wiki/RightToErasure to reflect the process which is:

I will not implement the "Account will be deleted" part until we have reached consensus. Other than that let me know if there are any objections to this process flow.