berkeley-dsep-infra / datahub

JupyterHubs for use by Berkeley enrolled students
https://docs.datahub.berkeley.edu
BSD 3-Clause "New" or "Revised" License

Write documentation and policy for handling archived data retrieval requests #2685

Open felder opened 3 years ago

felder commented 3 years ago

Let's figure out a process for archived data retrieval requests as well as how to service them.

I'm thinking the easiest way is to generate signed URLs and send those to people making the requests so that they can retrieve their data. Signed URLs restrict access by requiring the person to have the full URL in question and they automatically expire (max time is 7 days).

However, we need to make sure that the person making the request actually owns the data in question. Additionally, we need a policy/process for how requests should be made, and an SLA so that people know when they can expect the request to be satisfied.
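For illustration, the signed-URL idea above might look roughly like the sketch below. It assumes the archives live in a Google Cloud Storage bucket and that `blob` behaves like a `google.cloud.storage.Blob`; the helper name `make_retrieval_url` and the 3-day default are hypothetical, not something agreed on in this thread.

```python
from datetime import timedelta

# Upper bound mentioned above: signed URLs expire after at most 7 days.
MAX_EXPIRY = timedelta(days=7)


def make_retrieval_url(blob, expires_in=timedelta(days=3)):
    """Return a time-limited download URL for one archived object.

    `blob` is assumed to behave like google.cloud.storage.Blob; only its
    generate_signed_url() method is used. The 3-day default is arbitrary.
    """
    if expires_in <= timedelta(0) or expires_in > MAX_EXPIRY:
        raise ValueError("expiry must be positive and at most 7 days")
    # V4 signing; the resulting link only permits GET (download) of this object.
    return blob.generate_signed_url(
        version="v4",
        expiration=expires_in,
        method="GET",
    )
```

With the client library, `blob` could come from something like `storage.Client().bucket("archive-bucket").blob("path/to/archive")`, where both names are placeholders. The call needs credentials that are able to sign, and it does nothing to check that the requester owns the data, so that verification would still have to happen before a URL is sent out.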

balajialg commented 3 years ago

Thanks, @felder, for taking the first crack at it. The SLA request for data archival is linked to 2536.

balajialg commented 3 years ago

A few questions for the policy and process documentation:

  1. What is the process for students and staff to raise this request? Who handles such requests on our team's side?
  2. How long do we generally store archived data, and so for how long can we share it back? What kind of data can we share back with students? How do we ensure student privacy for such requests?
  3. What information can we share with students/faculty at the end of the semester to ensure that they back up their data and reduce such requests in the future?

yuvipanda commented 3 years ago

The archiver left a file in users' home directories with this template content: https://github.com/yuvipanda/homedir-archiver/blob/dae7f5bc9e3527238c556717f70de5b997574568/archiver/scanner.py#L17. I picked email because it preserves student privacy. Longer term, we should instead try to make this entirely self-serve...

balajialg commented 3 years ago

@felder's points:

  1. Are we only going to provide URLs, or optionally restore the files directly to the hub?
  2. Also, I think we may want to consider each request resolved only when the private key created for the task is removed from the service account.

felder commented 3 years ago

Alternatively, is there a better way to do this that doesn't involve generating and exporting private keys from the service account?

balajialg commented 3 years ago

@yuvipanda Any suggestions on the way forward for this issue?