Force xpath_username_location to return a string

duongnt commented 4 years ago

Right now the default option for xpath_username_location returns a list of string. https://github.com/bluedatainc/jupyterhub-samlauthenticator/blob/master/samlauthenticator/samlauthenticator.py#L543 will then return the first element of the list.

This creates a problem with xpath queries such as substring-before(//saml:Attribute[@Name="User.email"]/saml:AttributeValue/text(), "@"). When I tried setting xpath_username_location to that value to get the email id (e.g. duongnt from duongnt@gmail.com), all I got was the first letter d.

I already have a diff to change the default xpath_username_location from string(//saml:NameID/text()) to string(//saml:NameID/text()), and remove the [0] from L543 above, just wanted to make sure that this is a reasonable approach.

I tried using xpath queries such as //saml:Attribute[@Name="User.email"]/saml:AttributeValue/text()/substring-before(., "@"), but got "Invalid expression" error. I'm a beginner with xpath and I don't know how to apply functions such as substring-before to every items of a node-sets. Actually I tried the same query on the same xml on https://www.freeformatter.com/xpath-tester.html, and it works perfectly fine, so maybe this is a limitation of the lxml/etree python package we use (ref: https://stackoverflow.com/questions/8692/how-to-use-xpath-in-python)

distortedsignal commented 4 years ago

So I think that what you're trying to do could also be accomplished by overriding the normalize_username method that is part of the Authenticator class provided by Jupyterhub. So instead of yanking the username out of the email with XPath (which, ya, that ain't easy), it might be simpler to use the Python APIs to manipulate the string inside of the normalization function.

You would do this by writing something like the following in your Jupyterhub_config.py file:

def new_normailze_username(self, name):
    # The name is now only the bit before the "@"
    name = name.split("@")[0]
    return super().normalize_username(name)

c.SAMLAuthenticator.normalize_username = new_normalize_username

I think we should take a hard look at how names are passed around in the code.

Thanks for the great issues and PRs.

duongnt commented 4 years ago

Thanks for the prompt response, Tom! I guess I can do what you suggested, but I think in principle if we expect a string for username, then xpath should return exactly that? Converting from nodeset to string in xpath is trivial, so there's not many reasons to do that in python in my opinion. It's also kinda confusing for users like me - when I got the only the first letter I thought there's something wrong with my xpath syntax.

What do you think?

distortedsignal commented 4 years ago

I think this is (probably) a breaking change, and we should fix it in the next Minor Point update when we break other things (misspellings, and such). Hopefully I'll have the types integrated into the code at that point as well, and we'll be able to use those to prove the code more correct (they shouldn't impact either functionality or runtime speed - they're just for my/our edification).

duongnt commented 4 years ago

You're right that this could be a breaking change. How about we check the type of xpath result - if it's a list we return the first element, and if it's a string we return the whole string?

distortedsignal commented 4 years ago

That seems like a good stopgap measure - then we break on Minor Version upgrade. Seems like a good plan.

HewlettPackard / jupyterhub-samlauthenticator

Force xpath_username_location to return a string #46