2i2c-org / team-compass

Organizational strategy, structure, policy, and practices across 2i2c.
https://compass.2i2c.org

Define a data privacy and usage policy #388

Open choldgraf opened 3 years ago

choldgraf commented 3 years ago

Context

Many communities want a guarantee that we will not abuse our control over their data. In some cases, this may be a legal requirement (for example, working with communities that follow GDPR guidelines).

We should define a policy that gives communities confidence that we will not use their data in any way that they do not wish.

Reference policies

Here are a few policies that we could use for inspiration:

Proposed language

2i2c Pilot Hubs user data policy

User data generated by using a 2i2c Hub is controlled by the users, not 2i2c. 2i2c does not retain any ownership or privileges for user data on the hubs that it deploys as a part of this pilot. The infrastructure that 2i2c deploys (e.g., JupyterHub and Kubernetes) does log some information about user behavior, such as sign-on timestamps and aggregated usage over time. 2i2c may use this information in diagnostics to improve hub deployments, or as aggregated statistics to demonstrate usage and interest for purposes such as grant applications. However, 2i2c will not share this data or any derivatives of this data (beyond aggregate statistics or visualizations) with any third parties.

Task and updates

colliand commented 3 years ago

This is an excellent issue and I am excited to develop this further. Here are a few quick reactions at a high level:

  1. There are standard processes used by universities to evaluate technology. One of these processes is called a privacy impact assessment (PIA). 2i2c should identify a good example of a PIA form, perhaps from Bill Allison, and fill it out. The PIA matrix provides a series of prompts that force 2i2c to consider the privacy implications of its services. As a leading open science organization, 2i2c could perhaps disclose the PIA publicly.
  2. There is a tension between privacy, data ownership and transparency. This tension is modulated differently in Canada (and within Canada) and the USA. For example, the "2i2c way" involves a publicly visible hubs.yaml file that transparently reveals details about some users and administrators of various hubs. I don't believe UBC would allow this type of personally identifiable information (PII) to be shared.

The PIA process will likely allow 2i2c to define an ontology for the various data in scope. That ontology will include things like intellectual property created by the user, raw data from public or private sensor sources, personally identifiable information, and riskier data sets like medical or financial records. My view is that 2i2c should take a leading and opinionated approach here aligned with "open science" best practices.

choldgraf commented 3 years ago

This is super helpful, thanks for this extra information (I know very little about organizational considerations for data privacy).

I think it will take a while to go through the full exercise that you describe, and in the meantime there are organizations asking us what our policy is right now. Should we just say "we have no policy"? Or perhaps we can agree upon informal language that at least conveys our values and approach, even if it is not a rigorous policy?

colliand commented 3 years ago

We should ask those organizations for a PIA and to collaborate with us. We want to know from them what they want our data policy to be. For Syzygy, we mention that it is non-profit, hosted on Compute Canada, and retains minimal PII, and we get approved right away. The transparency on some PII as part of the 2i2c plan will likely need to be addressed with "open science" values.

choldgraf commented 3 years ago

@colliand that's a good idea - @ericvd-ucb do you think one or more of the community colleges would be willing to brainstorm with us what their ideal user privacy agreement would be?

choldgraf commented 2 years ago

Update: one-off policy being used

@sgibson91 needs a data policy to cover the data collected for an SSI fellowship project she's working on, so I've gotten approval from CS&S to have a one-off use of the policy defined here (adapted from SSI). We should then define a more long-term policy for 2i2c that we can use with the hubs as well.

choldgraf commented 6 months ago

We now have a privacy policy defined here:

https://docs.2i2c.org/user/topics/policy/privacy/

Can this be closed? @jnywong would this work for your needs right now?

It also sounds like this page is not very discoverable if you had trouble finding it, so do you have thoughts on a better place to link it?

jnywong commented 6 months ago

Thanks, Chris! I did manage to find this page before but I don't think it quite works for my needs for now, since it refers mainly to data that is held on hub infrastructure rather than the type of data I will be collecting from the training feedback surveys.

I could expand https://docs.2i2c.org/user/topics/policy/privacy/ to incorporate what I need, since I would prefer to link upstream to a single source of truth (SSOT) in the Hub Service Guide.