Closed mishaschwartz closed 1 year ago
Dupe of https://github.com/Ouranosinc/Magpie/issues/497. I have had this problem before so the current work-around is to manually remember to only use alphanumeric char in a username.
Being able to supply a username validation regex to Magpie would enforce this.
Dupe of https://github.com/Ouranosinc/Magpie/issues/497. I have had this problem before so the current work-around is to manually remember to only use alphanumeric char in a username.
Cool! Thanks I must have missed that one. I don't think that manually remembering to only use alphanum characters is a sustainable solution, especially once other organizations start to deploy the stack.
Being able to supply a username validation regex to Magpie would enforce this.
This is a good idea. I can work on this unless someone at CRIM has the capacity right now?
I'm going to suggest we don't close this (even though its a duplicate of the Magpie issue) since the solution will end up being a change to Magpie as well as a configuration change to the ini files in this repository as well.
I think the safest approach would be to use the user_id
rather than user_name
. This way, we never have to worry about either of Jupyter's or Magpie's allowed characters.
@fmigneault
Can you please clarify what you mean.
Where are we using user_id instead of user_name (which component)? What configuration files/variables do we need to change to do that?
Where are we using user_id instead of user_name (which component)? What configuration files/variables do we need to change to do that? @mishaschwartz
I think @fmigneault means using the user_id in the container name (jupyter-123
instead of jupyter-NAMEHERE
) and the folder on disk (/data/jupyterhub_user_data/123
instead of /data/jupyter_user_data/NAMEHERE
).
For me this is very user unfriendly and error prone. Often we have to chase down which user is using too much resources and currently it's very easy, the user name is directly in the container name.
Also we have to keep an eye on the disk space usage for each user so again having the user name directly in the folder path make this much easier, since we have a lot of users.
When deleting the user, it's also easy to find and wipe the corresponding persisted data on disk since the user name is in the path name to wipe. With user id, there is a risk we could wipe the wrong data on disk.
Thanks @tlvu that makes sense.
I've added a proposed change to Magpie here (https://github.com/Ouranosinc/Magpie/pull/592) that should allow us to optionally restrict the user names.
If that looks good and gets pulled in, I can make a PR here to update the magpie.ini
file.
For me this is very user unfriendly and error prone. Often we have to chase down which user is using too much resources and currently it's very easy, the user name is directly in the container name.
Also we have to keep an eye on the disk space usage for each user so again having the user name directly in the folder path make this much easier, since we have a lot of users.
I got an idea regarding that. Would it be possible to have system link in the form
/data/jupyterhub_user_data/{user_id} -> /data/jupyterhub_user_data/{user_name}
?
This way, a custom renaming/regex logic could be applied to obtain redirects for convenience, but real references would use the generic user IDs? You would have a extra step to check those links to figure out the mapping, but it would still be available somewhere.
The issue I forsee with custom regexes such as in https://github.com/Ouranosinc/Magpie/pull/592 is that the similar logic will have to be replicated everywhere that needs to resolve the directories. This means, all Weaver WPS output hooks, all Cowbird handlers, and so on. I think it would be wiser to abstract away that logic using integers.
I think https://github.com/Ouranosinc/Magpie/pull/592 can still be valid on its own. As an admin, you could ask to restrict certain user names, or even enforce some name conventions. However, I am not in favor of using this as a side-effect that assumes it would resolve all issues elsewhere. It feels more like a hack than a solution.
The issue I forsee with custom regexes such as in https://github.com/Ouranosinc/Magpie/pull/592 is that the similar logic will have to be replicated everywhere that needs to resolve the directories. This means, all Weaver WPS output hooks, all Cowbird handlers, and so on. I think it would be wiser to abstract away that logic using integers.
I'm not sure I understand this.
Right now, weaver WPS output hooks, cowbird handlers, etc. all use the username that is defined in Magpie to create directories. All we're doing is restricting the naming convention for those usernames. I don't understand how this affects weaver or cowbird.
I guess you're right. I was overthinking it. Usernames would be correct from upstream validation in Magpie. Just need to add a note in the changes that manual retrofit of all existing user names that do not conform to new naming convention will have to be done.
I got an idea regarding that. Would it be possible to have system link in the form
/data/jupyterhub_user_data/{user_id} -> /data/jupyterhub_user_data/{user_name}
?
This means at deletion time, we will still need to match the user_id
to user_name
to delete both of them, still error-prone to delete the wrong not-matching pair.
Also the user_name
is created in Magpie. So if the restriction is done at Magpie level, everything else that use the user_name
will be automatically in sync. There are no logic about user_name
validation to replicate.
add a note in the changes that manual retrofit of all existing user names that do not conform to new naming convention
I understand that the new optional validation regex only take effect for new user creation. I won't affect any existing users. So what "manual retrofit" are you referring to? Do you mean a DB migration of existing users? That should not happen because we will be changing their login-id and they won't be able to login anymore to Jupyter.
In fact, if they are already able to login to Jupyter, that means their naming convention is already matching the good one. Going forward we are simply able to properly enforce this naming convention.
Do you mean a DB migration of existing users?
Exactly. If the new regex is applied, and the function used to validate user properties is called, previously valid usernames would now cause API/UI errors. If their usernames were already patched to make them work Jupyter, then there shouldn't be any issue.
If the new regex is applied, and the function used to validate user properties is called, previously valid usernames would now cause API/UI errors.
The change was made to the check_user_info
function which is only called in Magpie when a user is being created or updated. Magpie will still work as before for old users with invalid usernames, they just won't be able to log in to jupyterhub (but they couldn't do that before).
I don't think there will be any old users with invalid names to migrate. If there were old users with usernames that broke jupyterhub you would know already because they would be complaining that they can't use jupyterhub.
For legacy reasons, we have Magpie users before the time of Jupyter so these users do not use Jupyter so no warranty that their usernames conform to the naming convention.
Is there a way to make this additional regex check apply only for new user creation but not when updating existing user (ex: reset their password because too long ago they forgot it or updating their email). Like the admin password 12 char length enforcement, it only trigger if we actually set a new password. If we set again but to the same existing password, the enforcement is bypassed.
Else can you document the DB migration steps we can run manually only for a very small list of problematic users?
Is there a way to make this additional regex check apply only for new user creation but not when updating existing user
It is the same function that performs the check in both cases. https://github.com/Ouranosinc/Magpie/blob/92ff2d2fa1e62b69a3a5edf22ef65d61238eaba9/magpie/api/management/user/user_utils.py#L124 https://github.com/Ouranosinc/Magpie/blob/92ff2d2fa1e62b69a3a5edf22ef65d61238eaba9/magpie/api/management/user/user_utils.py#L241
However, unless the user name is updated, the email/password should be checked on their own without revalidation of the user name.
However, unless the user name is updated, the email/password should be checked on their own without revalidation of the user name.
Super, so no need to DB migrate the existing users with "bad" username format.
Users sometime forget their password so we simply set a new one. We never had to change a username before.
@fmigneault Thanks for merging in https://github.com/Ouranosinc/Magpie/pull/592
Unfortunately, while testing the magpie/jupyterhub interface further it looks like the fact that jupyterhub also converts all uppercase characters to lowercase is a problem.
Magpie currently only does case insensitive regex comparisons so there's no way to restrict usernames to lowercase only. In order to allow us to enforce this, I've added another PR to Magpie here (https://github.com/Ouranosinc/Magpie/pull/594). Please have a look when you have a minute
Summary
Jupyterhub creates jupyterlab docker containers named
jupyter-{username}
where{username}
is the user name of the logged in user.Magpie permits usernames with
.
in them but jupyterhub will substitute the.
with-2e
when creating the jupyterlab container. This means that a user with the nameexample.user
will have a container namedjupyter-example-2euser
and the username according to jupyterlab will beexample-2euser
.This causes problems since we assume that these names will align.
Details
^[A-Za-z0-9]+(?:[\s_\-\.][A-Za-z0-9]+)*$
.
in container names)When spawning a new jupyterlab container, it fails with the following traceback:
I'm not entirely sure why the issue is a permission denied error but the cause is obviously due to an unexpected mismatch between the expected username and the username according to the container.
One possible solution would be to further restrict the valid usernames in magpie, either permanently or only when jupyterhub is enabled.
To Reproduce
.
in their name in magpie (eg:user.name
)docker logs -f jupyter-user.name
Environment
Concerned Organizations