BOINC / boinc

Open-source software for volunteer computing and grid computing.
https://boinc.berkeley.edu
GNU Lesser General Public License v3.0

EU-GDPR - Right to Erasure #2447

Closed TheAspens closed 6 years ago

TheAspens commented 6 years ago

Another aspect of the GDPR law is the Right to Erasure. I've created a proposed implementation that might meet the requirements of this provision of the law. This continues the work outlined in #2332 and #2413.

The proposal is documented here: https://boinc.berkeley.edu/trac/wiki/RightToErasure

I would appreciate a review of this and feedback on the implementation. In particular, I would like feedback from people who are doing their own compliance work on whether these are likely to be the minimum steps necessary to comply with this provision of the law, or whether some lesser action (like scrubbing user fields such as email address, name, IP address etc.) would be permitted. I would especially appreciate it if @lfield and @brevilo could take a look and provide feedback.

TheAspens commented 6 years ago

@SETIguy - can you let us know what times you are available at https://doodle.com/poll/wq636h9a7w3zqnd5

RichardHaselgrove commented 6 years ago

@TheAspens - is it appropriate for me to offer to listen in? I won't be able to help much (if at all) with implementation, but I might be able to keep an ear on the broader canvas and try to catch anything you've missed.

TheAspens commented 6 years ago

Anyone who wants can listen in and participate - so if you are interested, just put your preferences onto the doodle. I only called @SETIguy because he has voiced the strongest dissent and therefore is critical to reaching consensus

TheAspens commented 6 years ago

The meeting will be at 1PM Chicago time on Monday. See https://www.timeanddate.com/worldclock/converter.html?iso=20180409T180000&p1=64&p2=791&p3=309 for local times. It will use WebEx at https://ibm.webex.com/meet/knreed

I'd like to approach the meeting as follows:

Let me know if you have any thoughts about what we can do to make the meeting and discussion as productive as possible.

RichardHaselgrove commented 6 years ago

I see 19:00 local Monday. I may pop in - don't wait for me if I'm not there.

RichardHaselgrove commented 6 years ago

Going back to the debate about private mail inboxes, and whether I have the right to keep a PM which a user has sent to me, even if that user decides to withdraw from the project and have their account erased.

As moderators, we sometimes advise users who report harassment or other abuse of the PM system to preserve the evidence and report the event to a competent authority. Seeing the event in context on a project server (with sender ID and timestamp) is of greater evidential value than any local copy or screenshot, which could easily have been doctored.

Not for the first time, there are parallels with Facebook. Today's Guardian newspaper (UK print edition) has a headline:

Secret tool deletes executive mail from recipients' mailboxes

If you send Mark Zuckerberg a Facebook message, he has a copy for ever. But if he sends you one, he can reach into your inbox and pluck it out of existence.

(fuller online version of the article at https://www.theguardian.com/technology/2018/apr/06/facebook-using-secret-tool-to-delete-messages-from-executives)

If a BOINC/project user has been abusing the PM service, and gets wind that an investigation has been started or is impending, they could request erasure and make the project delete the evidence. According to the UK's ICO, Article 23 enables Member States to introduce derogations to the GDPR in certain situations....to safeguard:

Some jurisdictions might even regard the destruction of such evidence as an offense in itself (but IANAL, too)

Altogether, this is young legislation, not yet tested either by legal courts or the court of public opinion. We need to tread carefully and thoughtfully.

TheAspens commented 6 years ago

I've opened the meeting at https://ibm.webex.com/meet/knreed. Post here or email me if you have any troubles getting in. Also - you should not have to install anything (although if you want to install the desktop software it is more robust - but I only use the web interface)

TheAspens commented 6 years ago

You should be able to use it without any plugins at all

TheAspens commented 6 years ago

Minutes from the Meeting:

Attendees: @SETIguy @brevilo @RichardHaselgrove @Uplinger @drshawnkwang and myself

We reviewed the overall idea of this issue and the motivation behind it. We also reviewed the different aspects of the change as documented here: https://boinc.berkeley.edu/trac/wiki/RightToErasure and we agreed that we had consensus for everything except for what an actual delete consists of.

In order to resolve that difference of opinion we decided on the following change to provide project level flexibility in terms of how this works.

  1. There will be a feature flag that can be used to enable a link on the user's profile to '/request_delete_account.php', which is the page that a user uses to start the delete process. The same feature flag will also be used on /request_delete_account.php to prevent the page from being used unless the feature is enabled.
  2. There will be two 'delete' functions developed. One will take a strict interpretation and will aggressively delete the data, and the second will take a less strict interpretation and will anonymize the data (@SETIguy will be defining what this function does). Projects will be able to select which of these functions is used, and they will also be able to implement and use their own function. By default, the less strict interpretation function will be used.

This will provide projects with the ability to customize the behavior of this function as they see fit but also provide two reasonable implementations at the start.
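The two decisions above (a feature flag that gates the page, plus a selectable delete function) can be sketched roughly as follows. This is only an illustration in Python, not the actual BOINC PHP code: the flag name `enable_delete_account` is the one mentioned later in this thread, while `ProjectConfig`, `wipe_account`, `anonymize_account`, `request_delete_account` and the user field names are hypothetical.

```python
from typing import Callable, Dict

User = Dict[str, object]


def wipe_account(user: User) -> User:
    """Strict interpretation (hypothetical): nothing of the record survives."""
    return {}


def anonymize_account(user: User) -> User:
    """Less strict interpretation (hypothetical): keep the row, blank personal fields.

    Which fields get scrubbed is an assumption; the thread leaves that open.
    """
    scrubbed = dict(user)
    scrubbed["name"] = "deleted"
    scrubbed["email_addr"] = ""
    return scrubbed


class ProjectConfig:
    """Hypothetical per-project settings: a feature flag and a delete function."""

    def __init__(
        self,
        enable_delete_account: bool = False,
        delete_function: Callable[[User], User] = anonymize_account,
    ) -> None:
        self.enable_delete_account = enable_delete_account
        # The less strict function is the default, as agreed in the meeting.
        self.delete_function = delete_function


def request_delete_account(config: ProjectConfig, user: User) -> User:
    """Refuse to run unless the flag is on, then apply the chosen function."""
    if not config.enable_delete_account:
        raise PermissionError("account deletion is not enabled for this project")
    return config.delete_function(user)
```

A project that wants the strict behaviour would construct `ProjectConfig(enable_delete_account=True, delete_function=wipe_account)`; a project with its own policy would pass its own callable instead.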

I would appreciate it if the participants in the call could confirm my summary above and correct any errors that I made. Thanks!

TheAspens commented 6 years ago

@SETIguy - when you define what the less strict implementation does - you should look at what David did here: #2445 and see if that implementation does what you want and let me know.

brevilo commented 6 years ago

would appreciate it if the participants in the call could confirm my summary above

👍

lfield commented 6 years ago

I have read the proposal and have some comments:

First of all, "delete from user where id = ?" breaks the referential integrity of the database. We therefore have to ensure that everything referencing this id is deleted and the code is robust for when the id is missing.

If we have the user_deleted table with the id, then have we deleted all personally identifiable information? As far as I understand there is no need to record when the user was deleted, and doing so goes against the policy, as we would have to store an identifier to prove it.

For both user_submit and user_submit_app, the row can be deleted. This is used to authorize remote submission.

brevilo commented 6 years ago

We therefore have to ensure that everything referencing this id is deleted and the code is robust for when the id is missing.

Maybe we should finally add actual database (foreign key) constraints... But that's a separate issue...

Other than that you're of course right. I recommend using a proper DB transaction for the clean up. Either all statements are executed (in the correct order), or none.
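A minimal sketch of the all-or-nothing clean up, using Python's built-in `sqlite3` as a stand-in for the project's MySQL database. The table names are the ones mentioned in this thread; the column names and the `delete_user_atomically` helper are assumptions made for the example.

```python
import sqlite3

# Rows that reference the user go first, the user row goes last.
# Table names come from the thread; column names are assumed for this sketch.
DELETE_ORDER = [
    ("user_submit_app", "user_id"),
    ("user_submit", "user_id"),
    ("host", "userid"),
    ("user", "id"),
]


def delete_user_atomically(conn: sqlite3.Connection, user_id: int) -> None:
    """Delete a user and everything referencing it in a single transaction."""
    # Used as a context manager, the connection commits if the block
    # finishes and rolls back if any statement raises.
    with conn:
        for table, column in DELETE_ORDER:
            conn.execute(f"DELETE FROM {table} WHERE {column} = ?", (user_id,))
```

If any statement fails, the earlier deletes are rolled back and the exception reaches the caller, so the database never ends up with a half-deleted user.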

TheAspens commented 6 years ago

If we have the user_deleted table with the id, then have we deleted all personally identifiable information? As far as I understand there is no need to record when the user was deleted, and doing so goes against the policy, as we would have to store an identifier to prove it.

@lfield - check out https://boinc.berkeley.edu/trac/wiki/RightToErasure#FinalRemoval

The user_deleted and host_deleted tables only hold records for 60 days and then dispose of them. They are used to populate the users_deleted.xml and hosts_deleted.xml generated by the db_dump (see https://boinc.berkeley.edu/trac/wiki/RightToErasure#DataExports) so that we are able to comply with the notification requirements to consumers of the stats imports.

I believe that this complies with the intent of the law and balances the aggressive rate of deletion with the fact that some sources may only pull the data on a slower cycle and thus need a span of time where they can easily see which users were deleted.

I would also be fine shortening that delete period if 60 days is deemed too long.
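The retention window described above can be sketched like this, again with `sqlite3` standing in for the real database. Only the table name `user_deleted` and the 60-day figure come from the thread; the `userid` and `create_time` columns and the helper names are assumptions.

```python
import sqlite3

RETENTION_DAYS = 60  # could be lowered if 60 days is deemed too long
SECONDS_PER_DAY = 86400


def init_schema(conn: sqlite3.Connection) -> None:
    """Create the tombstone table (column names are assumed)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS user_deleted ("
        "userid INTEGER PRIMARY KEY, create_time INTEGER NOT NULL)"
    )


def record_deletion(conn: sqlite3.Connection, user_id: int, now: int) -> None:
    """Remember that a user was deleted so stats exports can announce it."""
    with conn:
        conn.execute(
            "INSERT INTO user_deleted (userid, create_time) VALUES (?, ?)",
            (user_id, now),
        )


def purge_expired(conn: sqlite3.Connection, now: int) -> int:
    """Drop tombstones older than the retention window; return how many."""
    cutoff = now - RETENTION_DAYS * SECONDS_PER_DAY
    with conn:
        cur = conn.execute(
            "DELETE FROM user_deleted WHERE create_time < ?", (cutoff,)
        )
    return cur.rowcount
```

A periodic job would call `purge_expired(conn, int(time.time()))` after importing `time`; the timestamps are passed in explicitly here so the behaviour is easy to check.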

Willy0611 commented 6 years ago

Hi all,

We never agreed on what to do with BAM!. I can't find anything about BAM! or AMS here: https://boinc.berkeley.edu/trac/wiki/RightToErasure. What if a user who created an account at Einstein via BAM! uses his right to remove at Einstein? How do you propose to tell BAM! about that? And the other way around: if a user at BAM! uses his right to remove, then BAM! has to tell Einstein about that. Currently there is no am_delete_account.php.

Willy.


davidpanderson commented 6 years ago

I think we'll need to add an am_delete_account RPC.

SETIguy commented 6 years ago

What is the appropriate action? I assume that when an account at a project is deleted, the account manager disconnects the user from that project. When the user deletes an account manager account, what is the appropriate action? Do nothing? Delete every account associated with that user at every project? Have a user selectable option?


-- Eric Korpela korpela@ssl.berkeley.edu AST:7731^29u18e3

Willy0611 commented 6 years ago

Hi,

My opinions:

  1. /get_project_config.php should be extended with an extra tag specifying whether the project must meet EU-GDPR and an extra tag indicating that the project supports deleting accounts via RPC. BAM! reads that file daily so it will then know if extra actions are required.
  2. When a BAM! user signs up for a project it will show any extra options required to meet EU-GDPR for the project (for example, a checkbox to comply with the EU-GDPR). The values of the extra options will be added to the AMS RPC to the project.
  3. The legacy problem is knowing whether or not a user created a project account through BAM! (solution under 1.3.1)
  4. When a user deletes his account at BAM! an option will be shown to also delete any associated project account or a selection of projects, indicating the EU-GDPR status of the project.
  5. When a user deletes his account at a project, an option should be shown to delete the associated account at the AMS. Since the project probably doesn't know which AMS (if any) created the account, it should send the delete request to all known AMSs.
    1. This will not delete the BAM! account itself since this was not created by the project, it will only delete the project account under the BAM! account.
    2. Problem: what's the identifier?
    3. API needed at the AMS.
    4. Projects should store which AMS created the account.
      1. Problem: User can switch AMS, so project should probably store the last used AMS.

There's probably more but nothing comes to mind at the moment.
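To make point 1 concrete, here is a rough sketch of how an account manager might read two extra tags from the project config XML. The tag names `eu_gdpr` and `am_delete_account` are invented for this example (no such tags are defined in this thread), as are the helper name and the sample document.

```python
import xml.etree.ElementTree as ET
from typing import Dict


def read_gdpr_flags(xml_text: str) -> Dict[str, bool]:
    """Report which of the two (hypothetical) tags are present in the config."""
    root = ET.fromstring(xml_text)
    return {
        # Hypothetical tag: the project must meet EU-GDPR.
        "gdpr_required": root.find("eu_gdpr") is not None,
        # Hypothetical tag: the project supports deleting accounts via RPC.
        "supports_rpc_delete": root.find("am_delete_account") is not None,
    }


# Made-up sample of what an extended project config could look like.
SAMPLE_CONFIG = """
<project_config>
    <name>Example Project</name>
    <eu_gdpr/>
    <am_delete_account/>
</project_config>
"""
```

With flags like these, a daily poll of the file would be enough for the account manager to decide whether to show the extra options from points 2 and 4.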

Willy.


brevilo commented 6 years ago

When a BAM! user signs up for a project it will show any extra options required to meet EU-GDPR for the project (for example, a checkbox to comply with the EU-GDPR).

According to our DPO the upstream account manager (AM) is required to handle the opt-in/consent problem in that scenario. However, informed consent can only be given to an actual statement/policy so that the AM has to present the project-specific text for that purpose, presumably mimicking the client's terms of use feature.

Projects should store which AMS created the account.

I agree but this needs an augmented RPC.

Anyhow, these account-creation-related issues should be discussed separately.

Other than that I agree that account deletion needs to be taken into account by AMs as well. I recommend focusing on AM -> project account deletion first (e.g. via a new am_delete_account RPC) as that's the more common case I think.

TheAspens commented 6 years ago

Here is my 2 cents:

My biggest concern about the new RPC is that the only authentication used by these RPCs is the authenticator. This will allow anyone who can obtain someone's authenticator to delete that person's account. Any thoughts about how to secure this?

Barring the issues around security of the new RPC, will these two points resolve the most critical questions?

TheAspens commented 6 years ago

Ok thinking longer on it. I think that the following might become necessary:

Thoughts on this approach?

Note that I do not have the bandwidth before the May 25th date to implement either the new RPC or this extra security step so if someone else could take this on that would be good.

Willy0611 commented 6 years ago

Hi,

I agree with all the points.

Willy.

On 2 May 2018 at 16:15, Kevin Reed notifications@github.com wrote:

Ok thinking longer on it. I think that the following might become necessary:

https://boincstats.com/ 1234afd123asdf1234asdf134asdf.....
  • The Web RPC users will provide a public key at a standard location like /public.key (i.e. https://boincstats.com/public.key)
  • The Web RPC user will use their private key to sign the message and send the signature with the request
  • The RPC will verify that the signer is a trusted signer and will then obtain the public key (either from local cache or from the remote server - but if the signature fails, it needs to refetch the public key to allow the signer to update their key) and then verify that the signature matches the content.
  • Only after that processing is complete and successful will it perform the actions of the RPC.
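The control flow in the bullets above (trusted-signer check, cached public key, one refetch when verification fails) might look roughly like the sketch below. It deliberately leaves the cryptography out: `fetch_key` and `verify` are injected callables that a real implementation would back with an HTTPS fetch of the signer's /public.key and a real signature check. The class and method names are hypothetical.

```python
from typing import Callable, Dict, Iterable

FetchKey = Callable[[str], bytes]            # signer URL -> public key
Verify = Callable[[bytes, bytes, bytes], bool]  # (key, message, signature)


class SignedRpcVerifier:
    """Sketch of the verification steps; not a real crypto implementation."""

    def __init__(
        self, trusted_signers: Iterable[str], fetch_key: FetchKey, verify: Verify
    ) -> None:
        self.trusted = set(trusted_signers)
        self.fetch_key = fetch_key
        self.verify = verify
        self.key_cache: Dict[str, bytes] = {}

    def is_authorized(self, signer: str, message: bytes, signature: bytes) -> bool:
        # Step 1: only signers the project explicitly trusts are considered.
        if signer not in self.trusted:
            return False
        # Step 2: try the locally cached key first.
        cached = self.key_cache.get(signer)
        if cached is not None and self.verify(cached, message, signature):
            return True
        # Step 3: no key yet, or the signer may have rotated it, so refetch
        # once and verify again before giving up.
        fresh = self.fetch_key(signer)
        self.key_cache[signer] = fresh
        return self.verify(fresh, message, signature)
```

Only when `is_authorized` returns true would the RPC go on to perform its action. Note that in this sketch every bad signature from a trusted signer triggers a refetch, so a real deployment would probably want to rate-limit that step.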

Thoughts on this approach? Note that I do not have the bandwidth before the May 25th date to implement either the new RPC or this extra security step so if someone else could take this on that would be good.


SETIguy commented 6 years ago

I don't think that the account manager needs to be able to run the project delete without intervention. The account manager should redirect the user to the project delete function. Then the delete would propagate back to the account manager in the next stats export.


sirzooro commented 6 years ago

I have read this whole discussion. I wonder if (or how) you handle the following situation: a hacker learns the password(s) to a user's email and to project X. He then deletes the user's account at project X and removes all emails sent during this process. The user was not crunching for this project at the time, so the BOINC Client did not complain, and only later (e.g. after a month) does the user find that the account has somehow disappeared. If the hacker used the "right to erasure", the project admin may also have no idea what happened. Some systems use security event logs (e.g. syslog servers), which may store entries like "date/time: account 'foo' was deleted from IP 1.2.3.4". I wonder if BOINC does something like this, and how these security logs are treated by GDPR.

brevilo commented 6 years ago

@TheAspens complex, but sound.

@SETIguy while I appreciate the simplicity of your approach, it would certainly defeat the whole purpose of account managers, right? That is, manage multiple downstream project-accounts via a single interface. Your example sounds like there is only one project.

@sirzooro

TheAspens commented 6 years ago

The discussion about account manager integration should continue in issue #2507

TheAspens commented 6 years ago

I was testing the handling of results returned but not validated and ran into problems. The logic of the validator and credit is complex, and trying to add proper handling for the case where we are validating a result returned by a host and user that have been deleted adds a lot of edge cases to this code. Since we do not expect this feature to be used often, and therefore the number of results in this state will be low, I am moving forward with the following proposal for how to handle this. It is documented here: https://boinc.berkeley.edu/trac/wiki/RightToErasure#ResultTable but also included below:

The removal of userid and hostid from the result table is challenging as the host and user records are used in computing credit and other stats. In order to keep things as straightforward as possible, the following logic will be implemented at the time the user deletes their account:

  • Any results that are in server state RESULT_SERVER_STATE_IN_PROGRESS and assigned to the user will be set to server_state RESULT_SERVER_STATE_OVER, outcome RESULT_OUTCOME_CLIENT_DETACHED, validate_state = VALIDATE_STATE_INVALID and the transitioner will be triggered for the result
  • Any results that are in server state RESULT_SERVER_STATE_OVER and outcome RESULT_OUTCOME_SUCCESS and validate_state VALIDATE_STATE_INIT or VALIDATE_STATE_INCONCLUSIVE and assigned to the user will be set to server_state RESULT_SERVER_STATE_OVER, outcome RESULT_OUTCOME_CLIENT_DETACHED, validate_state = VALIDATE_STATE_INVALID and the transitioner will be triggered for the result
    • The validator, assimilator and transitioner will be examined to make sure that other states are handled properly
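The two rules above boil down to one selection test and one update. The sketch below models a result as a plain dictionary and uses string-valued enums named after the BOINC constants; the numeric values used by the server are intentionally not reproduced, and the helper names are hypothetical.

```python
from enum import Enum
from typing import Dict, List


class ServerState(Enum):
    IN_PROGRESS = "RESULT_SERVER_STATE_IN_PROGRESS"
    OVER = "RESULT_SERVER_STATE_OVER"


class Outcome(Enum):
    INIT = "RESULT_OUTCOME_INIT"
    SUCCESS = "RESULT_OUTCOME_SUCCESS"
    CLIENT_DETACHED = "RESULT_OUTCOME_CLIENT_DETACHED"


class ValidateState(Enum):
    INIT = "VALIDATE_STATE_INIT"
    VALID = "VALIDATE_STATE_VALID"
    INVALID = "VALIDATE_STATE_INVALID"
    INCONCLUSIVE = "VALIDATE_STATE_INCONCLUSIVE"


def should_discard(result: Dict) -> bool:
    """True for in-progress results and for successful but unvalidated ones."""
    if result["server_state"] is ServerState.IN_PROGRESS:
        return True
    return (
        result["server_state"] is ServerState.OVER
        and result["outcome"] is Outcome.SUCCESS
        and result["validate_state"]
        in (ValidateState.INIT, ValidateState.INCONCLUSIVE)
    )


def discard_results_for_user(results: List[Dict], userid: int) -> List[int]:
    """Mark the user's matching results as detached and invalid.

    Returns the ids of the changed results; the caller would then trigger
    the transitioner for each of them.
    """
    changed = []
    for result in results:
        if result["userid"] == userid and should_discard(result):
            result["server_state"] = ServerState.OVER
            result["outcome"] = Outcome.CLIENT_DETACHED
            result["validate_state"] = ValidateState.INVALID
            changed.append(result["id"])
    return changed
```

Results that are already validated, or that belong to other users, are left untouched, which is why the amount of discarded work is expected to be small.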

Please let me know if anyone has any thoughts on this. I think that the work discarded will be extremely small and it avoids adding some significant complexity to the code.

JuhaSointusalo commented 6 years ago

I suppose there is still a race window when back-end daemon has loaded result and other records, updated them and when it goes to update the database the records are gone. With the deadline approaching fast I'm not sure if you need to handle this case perfectly for v1.

If you are not going to delete results immediately then scrub stderr. stderr may sometimes contain personal data and in worst case scenarios it may take several months before the result gets removed.

TheAspens commented 6 years ago

If you are not going to delete results immediately then scrub stderr. stderr may sometimes contain personal data and in worst case scenarios it may take several months before the result gets removed.

GDPR allows for the retention of data that has a legitimate purpose. stderr is often needed for various reviews by the project and is removed when that purpose is complete. As a result, I think that it needs to be left in place.

TheAspens commented 6 years ago

I suppose there is still a race window when back-end daemon has loaded result and other records, updated them and when it goes to update the database the records are gone. With the deadline approaching fast I'm not sure if you need to handle this case perfectly for v1.

The PHP code has been implemented without any concept of transactions (everything is done with autocommit for each statement). I would have to take a deep look to see if the C code handles this any differently. Without transactions and locks in place (pessimistic or optimistic) throughout the system, I don't know how I could address this. I'd be open to ideas.
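For what it is worth, one small optimistic check does not need system-wide locking: after an UPDATE, a daemon can look at the number of affected rows and treat zero as "the record was deleted in the meantime". The sketch below uses `sqlite3` as a stand-in for MySQL; the `host` table with `id` and `total_credit` columns and the helper name are assumptions for the example. It only detects the race for a single statement and does not make multi-statement updates consistent.

```python
import sqlite3


def add_credit_if_host_exists(
    conn: sqlite3.Connection, host_id: int, credit: float
) -> bool:
    """Apply the update and report whether the host row was still there."""
    with conn:
        cur = conn.execute(
            "UPDATE host SET total_credit = total_credit + ? WHERE id = ?",
            (credit, host_id),
        )
    # rowcount is 0 when no row matched, i.e. the host has been deleted
    # since the daemon loaded it.
    return cur.rowcount == 1
```

A caller that gets `False` back can skip the rest of its processing for that result instead of writing data for a user or host that no longer exists.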

Ageless93 commented 6 years ago

A bit of news from Reuters

The pan-EU law comes into effect this month and will cover companies that collect large amounts of customer data including Facebook (FB.O) and Google (GOOGL.O). It won’t be overseen by a single authority but instead by a patchwork of national and regional watchdogs across the 28-nation bloc.

Seventeen of 24 authorities who responded to a Reuters survey said they did not yet have the necessary funding, or would initially lack the powers, to fulfill their GDPR duties.

and

Many watchdogs lack powers because their governments have yet to update their laws to include the Europe-wide rules, a process that could take several months after GDPR takes effect on May 25.

JuhaSointusalo commented 6 years ago

Without transactions and locks in place (pessimistic or optimistic) throughout the system, I don't know how I could address this. I'd be open to ideas.

I don't have any better ideas either.

sirzooro commented 6 years ago

Maybe it would suffice to start a transaction in the backend daemon, and use SELECT ... FOR UPDATE there? I did something similar a long time ago on Oracle. As I recall, such a query blocked other SELECTs until the transaction ended. https://dev.mysql.com/doc/refman/8.0/en/innodb-locking-reads.html

Edit: see also https://stackoverflow.com/questions/6066205/when-using-mysqls-for-update-locking-what-is-exactly-locked

drshawnkwang commented 6 years ago

I added some additional text about the new project config enable_delete_account in the design document.

TheAspens commented 6 years ago

#2472 has been merged to master, which closes this issue.

brevilo commented 6 years ago

@TheAspens Since this now got merged I'm wondering about the periodic cleanup scripts that are mandatory for this to cover the process end to end. As far as I can tell these are html/ops/delete_expired_users_and_hosts.php and html/ops/delete_expired_tokens.php. What are your recommendations for integrating them in a given project? Am I missing any?

Thanks

brevilo commented 6 years ago

Never mind, just found your "recommendations" and I'm presumably not missing any either 👍

TheAspens commented 6 years ago

I've been trying to keep https://boinc.berkeley.edu/trac/wiki/ServerUpdates updated as well

Ageless93 commented 6 years ago

I haven't followed all the code changes, but the delete account option went live on the BOINC forums sometime in the past days. Silently again, no notification. I wonder though, if this also works for a user whose account is (temporarily) banished. Can they also still use the delete account option, or is it locked on only active -usable- accounts?

TheAspens commented 6 years ago

@Ageless93 - I would open a new issue for that. I don't know what the behavior would be.