yuvipanda opened this issue 2 years ago.
I am checking in on this issue. Has there been any progress on defining and implementing an offboarding process? The LEAP executive committee is reluctant to begin using our hub until an offboarding process is in place. So I would like to be able to provide an update on this.
[ ] Delete any data they might have on the scratch bucket.
I believe the scratch buckets are configured with a finite retention time for data, so that should not be necessary.
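For reference, a finite retention time on a bucket is usually expressed as an object lifecycle rule. The sketch below assumes the scratch bucket lives on Google Cloud Storage and generates a rule in the JSON format accepted by `gsutil lifecycle set`. Both the provider and the 7-day age are assumptions for illustration; the hub's actual retention period is not stated in this thread.

```python
import json


def scratch_lifecycle_config(max_age_days: int) -> dict:
    """Build a GCS lifecycle config that deletes objects older than max_age_days.

    max_age_days is a placeholder: the real retention period configured
    for the scratch bucket is not specified here.
    """
    if max_age_days < 1:
        raise ValueError("max_age_days must be a positive number of days")
    return {
        "rule": [
            {
                "action": {"type": "Delete"},
                # 'age' is counted in days since each object was created.
                "condition": {"age": max_age_days},
            }
        ]
    }


if __name__ == "__main__":
    # Save this output to a file and apply it with, for example:
    #   gsutil lifecycle set lifecycle.json gs://<scratch-bucket>
    print(json.dumps(scratch_lifecycle_config(7), indent=2))
```

Because the rule keys on object age rather than on who wrote the object, it cleans up scratch data for departed and active users alike, which is why no per-user offboarding step is needed for the scratch bucket.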
No, there has not been progress on this yet (for future reference, any discussion or plans we have around this topic will be reflected in this issue).
Is there a specific set of concerns or questions that you need addressed? Looking at the list in the top comment, it seems like the only item with real uncertainty is "Delete their home directories, so it stops taking up space." There's not an easy way for hub admins to do this right now without involving 2i2c engineers.
Current practice is simply "the representative for the LEAP community asks us to delete a person's home directory, and we can do it". That is not a scalable or long-term pattern, but it is at least a workable solution.
Is there something else that needs to be addressed and is causing hesitancy?
The concern is with home directories. Some of the LEAP EC members are concerned about data accumulating from temporary users (e.g. bootcamp participants), which will generate ever-greater costs over the 5+ years of this project. This could be mitigated by enforcing quotas on home directories, but my understanding is that such quotas are not supported. Me personally emailing support for every offboarded user is not workable because:
Over in https://github.com/leap-stc/leap-stc.github.io/pull/1, I am proposing a user policy for our hub. In it, I say the following:

> Removing a user from the leap-pangeo-users group entirely will disable their access completely. An automated process will delete user data from the hub one month after a user is removed from the leap-pangeo-users group.
Is it feasible to implement a cron job of some sort that will perform this deletion?
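The core decision logic of such a periodic job can be sketched independently of how it is scheduled. This sketch assumes the job can already obtain two sets on each run: the current members of the leap-pangeo-users team and the usernames that still have a home directory (how to fetch them is out of scope). It also assumes the job persists a small state mapping, here called `state` (a hypothetical name), that records when each user was first seen missing from the team, so the one-month clock starts at removal rather than at an arbitrary run.

```python
from datetime import datetime, timedelta, timezone


def update_offboarding_state(
    state: dict[str, datetime],
    team_members: set[str],
    home_dirs: set[str],
    now: datetime,
    grace: timedelta = timedelta(days=30),
) -> tuple[dict[str, datetime], list[str]]:
    """Return (new_state, users_due_for_deletion) for one run of the job.

    state maps a username to the time it was first observed to have a home
    directory but no team membership. Usernames are assumed to already be
    normalized the same way in both sets (e.g. lowercased).
    """
    new_state: dict[str, datetime] = {}
    due: list[str] = []
    for user in sorted(home_dirs - team_members):
        # Keep the earliest time we noticed this user was missing.
        first_seen = state.get(user, now)
        new_state[user] = first_seen
        if now - first_seen >= grace:
            due.append(user)
    # Users re-added to the team, or whose directory is already gone, are
    # dropped from new_state, which resets their grace period.
    return new_state, due


if __name__ == "__main__":
    t0 = datetime(2022, 4, 1, tzinfo=timezone.utc)
    state, due = update_offboarding_state({}, {"alice"}, {"alice", "bob"}, t0)
    print(due)  # []  (bob is recorded, but his month has not elapsed)
    later = t0 + timedelta(days=31)
    state, due = update_offboarding_state(state, {"alice"}, {"alice", "bob"}, later)
    print(due)  # ['bob']
```

Persisting `state` between runs (for example as a small JSON file) is what lets a stateless cron job measure "one month after removal"; without it, the job could only see who is missing right now, not for how long.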
Conversely, if you can provide some arguments and evidence that this issue (accumulating home-directory data) is not going to be a major cost or concern for our project, I can take that back to the EC. Without either such arguments or a technical solution to the problem, my colleagues will not feel comfortable moving forward with the hub.
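One way to frame the cost argument is a back-of-the-envelope model. The sketch below assumes a constant number of departing users per month, each leaving a fixed amount of data, and a flat storage price; all inputs are hypothetical placeholders to be replaced with the project's real numbers. Without deletion, retained data grows linearly, so cumulative cost grows roughly quadratically with time; with a retention window, stored data is capped at that many monthly cohorts.

```python
from typing import Optional


def retained_storage_cost(
    departures_per_month: float,
    gb_per_user: float,
    price_per_gb_month: float,
    months: int,
    retention_months: Optional[int] = None,
) -> float:
    """Cumulative cost of keeping departed users' home directories.

    Each month, departures_per_month users leave gb_per_user of data behind.
    If retention_months is None the data is never deleted; otherwise only
    the most recent retention_months cohorts are still stored.
    """
    total = 0.0
    for month in range(1, months + 1):
        # Number of monthly cohorts whose data is still on disk this month.
        cohorts = month if retention_months is None else min(month, retention_months)
        total += cohorts * departures_per_month * gb_per_user * price_per_gb_month
    return total


if __name__ == "__main__":
    # Placeholder inputs only; substitute real departure rates and prices.
    print(retained_storage_cost(10, 1.0, 1.0, 60))
    print(retained_storage_cost(10, 1.0, 1.0, 60, retention_months=2))
```

With no retention limit the total equals `departures * gb * price * months * (months + 1) / 2`, which is the "ever-greater cost" concern in numeric form.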
Yeah - it is a balance between "removing data quickly will piss off users that want it retained", and "removing data slowly will incur extra storage costs". We don't have a strict policy for this because different communities have different preferences.
I think a reasonable approach is "if a user is explicitly deleted (by being removed from a GitHub team in this case) then it should be assumed their data will be deleted at the end of the month". @yuvipanda is that similar to how Berkeley does this?
Berkeley's policy is that we archive user home directories to object storage if they haven't been used in 6 months, and users can manually request it back if they need to.
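An inactivity rule like this needs a definition of "used". A minimal sketch, assuming home directories sit as direct children of one mounted root and using the newest modification time of anything inside a directory as a proxy for use (reads do not update mtime, so this undercounts read-only activity). The 180-day default approximates 6 months and is an assumption; this only identifies candidates and does not archive anything.

```python
import os
import tempfile
import time
from typing import Optional


def newest_mtime(path: str) -> float:
    """Most recent modification time of path or anything beneath it."""
    newest = os.lstat(path).st_mtime
    for dirpath, dirnames, filenames in os.walk(path):
        for name in dirnames + filenames:
            try:
                newest = max(newest, os.lstat(os.path.join(dirpath, name)).st_mtime)
            except FileNotFoundError:
                continue  # file was removed while we were scanning
    return newest


def stale_home_dirs(
    root: str, max_idle_days: float = 180, now: Optional[float] = None
) -> list[str]:
    """Names of directories under root with no modifications in max_idle_days."""
    now = time.time() if now is None else now
    cutoff = now - max_idle_days * 86400
    with os.scandir(root) as it:
        entries = sorted(it, key=lambda e: e.name)
    return [
        e.name
        for e in entries
        if e.is_dir(follow_symlinks=False) and newest_mtime(e.path) < cutoff
    ]


if __name__ == "__main__":
    # Tiny demo on a throwaway directory tree.
    with tempfile.TemporaryDirectory() as root:
        for user in ("alice", "bob"):
            os.makedirs(os.path.join(root, user))
        long_ago = time.time() - 200 * 86400
        os.utime(os.path.join(root, "alice"), (long_ago, long_ago))
        print(stale_home_dirs(root))  # ['alice']
```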
The question is how to figure out 'removed from the project for 1 month', as that can be a bit tricky. How about the following criteria:
So part of offboarding would require a hub admin to hit the 'delete' button in the hub control panel (https://leap.2i2c.cloud/hub/admin), which anyone designated as a hub admin can do, along with removing the user from the GitHub team. It also means that user data could be gone sooner than 1 month after they are removed, since the clock starts at their last login rather than at their removal. Hence my preference for that threshold to be 2 months.
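A minimal sketch of the two-part check described here, assuming some per-user "last used" timestamp is available even after the user is removed from the hub database (the `last_used` mapping is hypothetical; home-directory modification times are one possible source). The second helper shows why the threshold matters: the deletion clock starts at last login, not at removal.

```python
from datetime import datetime, timedelta, timezone


def eligible_for_deletion(
    username: str,
    hub_users: set[str],
    last_used: dict[str, datetime],
    now: datetime,
    min_idle: timedelta = timedelta(days=60),
) -> bool:
    """True if the user is gone from the hub and has been idle long enough."""
    if username in hub_users:
        return False  # still in the hub database: never touch their data
    last = last_used.get(username)
    if last is None:
        return False  # no usage record: be conservative and keep the data
    return now - last >= min_idle


def earliest_deletion_after_removal(
    last_login: datetime, removed_at: datetime, min_idle: timedelta
) -> timedelta:
    """Shortest time between removal and the data becoming eligible."""
    return max(last_login + min_idle - removed_at, timedelta(0))


if __name__ == "__main__":
    removed = datetime(2022, 3, 1, tzinfo=timezone.utc)
    # A user who last logged in 50 days before removal, with a 60-day rule:
    print(earliest_deletion_after_removal(
        removed - timedelta(days=50), removed, timedelta(days=60)))
```

In that example the data becomes eligible only 10 days after removal. A longer threshold widens this window but never guarantees a full month after removal on its own.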
How does that sound, @rabernat?
The other alternative is we just add a 'delete home directory' button that the admin can press as part of offboarding. So the offboarding process becomes:
I am totally fine with the 2 months! 👍
Can you clarify what the current "delete user" button actually does?
Can we find a way to avoid this manual step? My preference would be to have all membership managed via github, without having to have LEAP admins interact with the jupyterhub admin dashboard.
@rabernat, can we revisit the manual step for decommissioning in, say, 6 months or so? Given the limited dev resources we currently have, and that I also have to focus on https://github.com/2i2c-org/infrastructure/issues/1146, I'd prefer to do this iteratively rather than all in one go.
Does it remove the home directory?
It does not right now, but maybe I can just hook this up and that should solve our problems?!
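Whatever triggers it, the deletion step itself should be defensive. JupyterHub (1.4 and later) provides a `Spawner.delete_forever()` hook that runs when a user is deleted, which is one plausible place to call something like this; check the docs for the deployed version. The sketch assumes home directories are direct children of a mounted root and are named exactly after the hub username; real deployments may escape usernames, so treat the naming as an assumption.

```python
import os
import shutil
import tempfile


def delete_home_directory(root: str, username: str) -> bool:
    """Remove root/username if it exists; return True if something was deleted.

    Assumes the directory name equals the (already normalized) username.
    """
    # Refuse anything that is not a single, plain path component.
    bad_seps = {os.sep} | ({os.altsep} if os.altsep else set())
    if not username or username in {".", ".."} or any(s in username for s in bad_seps):
        raise ValueError(f"refusing suspicious username: {username!r}")
    root_real = os.path.realpath(root)
    target = os.path.join(root_real, username)
    # Never follow a symlink that could point outside the home-directory root.
    if os.path.islink(target):
        raise ValueError(f"refusing to follow symlink: {target}")
    if not os.path.isdir(target):
        return False  # nothing to delete (already gone, or never existed)
    shutil.rmtree(target)
    return True


if __name__ == "__main__":
    # Demo on a throwaway directory instead of real home directories.
    with tempfile.TemporaryDirectory() as root:
        os.makedirs(os.path.join(root, "bob", "notebooks"))
        print(delete_home_directory(root, "bob"))  # True
        print(delete_home_directory(root, "bob"))  # False
```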
What if I delete a user while they are still in the GitHub team? Can they still log in?
Yes they can.
Conversely, what if I delete a user from the team but they are still in the hub database?
This is actually an important question I don't know the answer to at the moment.
From @rabernat in https://github.com/2i2c-org/infrastructure/issues/1050#issuecomment-1064450254:
When we offboard people, the following tasks need to be done:
We should design an interim offboarding process, as well as a self-service one.