2i2c-org / features

Temporary location for feature requests sent to 2i2c
BSD 3-Clause "New" or "Revised" License

Design offboarding process that hub admins can use to offboard their users #8

Open yuvipanda opened 2 years ago

yuvipanda commented 2 years ago

From @rabernat in https://github.com/2i2c-org/infrastructure/issues/1050#issuecomment-1064450254:

Over the 5-10 years of this project, people will exit the project. We need a sustainable approach to not only onboarding but offboarding.

Question for 2i2c: Beyond simply removing their access via the github group, what is the process for offboarding them and specifically purging their user data from storage so we don't continuously accumulate abandoned data?

When we offboard people, the following tasks need to be done:

We should design an interim offboarding process, as well as a self-service one.

rabernat commented 2 years ago

I am checking in on this issue. Has there been any progress on defining and implementing an offboarding process? The LEAP executive committee is reluctant to begin using our hub until an offboarding process is in place. So I would like to be able to provide an update on this.

[ ] Delete any data they might have on the scratch bucket.

I believe the scratch buckets are configured with a finite retention time for data, so that should not be necessary.

choldgraf commented 2 years ago

No, there has not been progress on this (for future reference, this issue would reflect any discussion or plans we have around this topic).

Is there a specific set of concerns or questions that you need addressed? Looking at the list in the top comment, it seems like the only item for which there is uncertainty is "Delete their home directories, so it stops taking up space. There's not an easy way for hub admins to do this right now without involving 2i2c engineers." Current practice is simply "the representative for the LEAP community asks us to delete a person's home directory, and we do it". That is not a scalable or long-term pattern, but it is at least a workable solution.

Is there something else that needs to be addressed and is causing hesitancy?

rabernat commented 2 years ago

The concern is with home directories. Some of the LEAP EC members are concerned about data accumulating from temporary users (e.g. bootcamp participants), which will generate ever greater costs over the 5+ years of this project. This could be mitigated by enforcing quotas on home directories, but my understanding is that such quotas are not supported. Me personally emailing support for every offboarded user is not workable because:

Over in https://github.com/leap-stc/leap-stc.github.io/pull/1, I am proposing a user policy for our hub. In it I say the following:

Removing a user from the leap-pangeo-users group entirely will disable their access completely. An automated process will delete user data from the hub one month after a user is removed from the leap-pangeo-users group.

Is it feasible to implement a cron job of some sort that will perform this deletion?


Conversely, if you can provide some arguments and evidence that this issue (accumulating home-directory data) is not going to be a major cost or concern for our project, I can take that back to the EC. Without either such arguments or a technical solution to the problem, my colleagues will not feel comfortable moving forward with the hub.

choldgraf commented 2 years ago

Yeah - it is a balance between "removing data quickly will piss off users that want it retained", and "removing data slowly will incur extra storage costs". We don't have a strict policy for this because different communities have different preferences.

I think a reasonable approach is "if a user is explicitly deleted (by being removed from a GitHub team in this case) then it should be assumed their data will be deleted at the end of the month". @yuvipanda is that similar to how Berkeley does this?

yuvipanda commented 2 years ago

Berkeley's policy is that we archive user home directories to object storage if they haven't been used in 6 months, and users can manually request it back if they need to.

The question is how to figure out 'removed from the project for 1 month', as that can be a bit tricky. How about the following criteria:

  1. User home directory has not been modified in 1 month (or 2 months, my preference)
  2. User does not exist in the JupyterHub database.

So that means part of offboarding would require them to hit the 'delete' button in the hub control panel (https://leap.2i2c.cloud/hub/admin) and anyone designated hub admin can do so, along with removing them from the github team. It also means that user data could be gone sooner than 1 month after they are removed - as it is 1 month since last login. Hence my preference for that to be 2 months.
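
The two criteria above could be checked by a periodic job along these lines. This is only a minimal sketch under stated assumptions, not how 2i2c implements anything: it assumes each home directory is named exactly after the hub username (real deployments may escape usernames), that the set of current hub usernames has already been fetched (for example from the JupyterHub REST API), and that `purge_candidates` and `latest_mtime` are hypothetical helper names. It only reports candidates and deletes nothing.

```python
import os
import time
from pathlib import Path

SECONDS_PER_DAY = 24 * 60 * 60


def latest_mtime(home: Path) -> float:
    """Return the most recent modification time found anywhere under home."""
    newest = home.stat().st_mtime
    for root, dirs, files in os.walk(home):
        for name in dirs + files:
            try:
                newest = max(newest, os.lstat(os.path.join(root, name)).st_mtime)
            except FileNotFoundError:
                continue  # entry vanished while we were scanning
    return newest


def purge_candidates(homes_root, hub_usernames, max_idle_days=60, now=None):
    """List home directories that meet both criteria (dry run, no deletion).

    Assumption: the directory name equals the hub username.
    """
    now = time.time() if now is None else now
    cutoff = now - max_idle_days * SECONDS_PER_DAY
    candidates = []
    for home in sorted(Path(homes_root).iterdir()):
        if not home.is_dir():
            continue
        idle = latest_mtime(home) < cutoff      # criterion 1: not modified recently
        in_hub = home.name in hub_usernames     # criterion 2: still in the hub database?
        if idle and not in_hub:
            candidates.append(home.name)
    return candidates
```

Keeping the job as a dry run that prints candidates would let a hub admin review the list before anything is archived or removed.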

How does that sound, @rabernat?

yuvipanda commented 2 years ago

The other alternative is we just add a 'delete home directory' button that the admin can press as part of offboarding. So the offboarding process becomes:

  1. Delete from GitHub teams
  2. Delete from hub control panel
  3. Press button in (TBD location) to delete their home directory
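
The first two steps could in principle be scripted. The sketch below only builds the HTTP requests and does not send them. It assumes the GitHub REST endpoint for removing a team membership and the JupyterHub REST endpoint for deleting a user; the function name `offboarding_requests` and all argument values are hypothetical. Step 3 has no counterpart here because its location is still to be decided.

```python
from urllib.parse import quote
from urllib.request import Request


def offboarding_requests(username, org, team_slug, hub_url, github_token, hub_token):
    """Build (but do not send) the requests for steps 1 and 2."""
    # Step 1: remove the user from the GitHub team.
    github = Request(
        "https://api.github.com/orgs/"
        f"{quote(org)}/teams/{quote(team_slug)}/memberships/{quote(username)}",
        method="DELETE",
        headers={
            "Authorization": f"Bearer {github_token}",
            "Accept": "application/vnd.github+json",
        },
    )
    # Step 2: delete the user from the JupyterHub database.
    hub = Request(
        f"{hub_url.rstrip('/')}/hub/api/users/{quote(username, safe='')}",
        method="DELETE",
        headers={"Authorization": f"token {hub_token}"},
    )
    return [github, hub]
```

Each returned request could be passed to `urllib.request.urlopen` by someone holding suitable tokens. The team removal is listed first because a user who is deleted from the hub but still on the team can log in again.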

rabernat commented 2 years ago

I am totally fine with the 2 months! 👍

Can you clarify what the current "delete user" button actually does?

Can we find a way to avoid this manual step? My preference would be to have all membership managed via github, without having to have LEAP admins interact with the jupyterhub admin dashboard.

yuvipanda commented 2 years ago

@rabernat can we revisit in say 6 months or so wrt the manual step for decommissioning? Given the limited dev resources we currently have, and that I have to focus on https://github.com/2i2c-org/infrastructure/issues/1146 too, I'd prefer to do this iteratively rather than all in one go.

Does it remove the home directory?

It does not right now, but maybe I can just hook this up and that should solve our problems?!

What if I delete a user and they are still in the github team. Can they still log in?

Yes they can.

Conversely, what if I delete a user from the team but they are still in the hub database?

This is actually an important question I don't know the answer to atm.