OpenConext / OpenConext-engineblock

OpenConext SAML 2.0 IdP/SP Gateway
14 stars 22 forks source link

Non-ASCII chars in email addresses lead to "Error: an error occurred" #72

Closed baszoetekouw closed 10 years ago

baszoetekouw commented 10 years ago

It seems that when a user has non-ASCII chars in his email address, an exception occurs somewhere in EB.
This problem manifests itself when trying to log in with the "professor2" user of the DIY-IdP. The only obvious difference between professor1 (who can log in correctly) and professor2 is the use of UTF8-chars in the email address.

thijskh commented 10 years ago

These email addresses are not RFC compliant and will be completely useless to send email to as there's no mainstream email server that supports them.

In the ideal situation when the IdP sends garbage in an attribute EB would generate a really specific error message, but it is very hard to implement such a thing for every possible input. I'm inclined to close this ticket as this behaviour isn't seen in the real world, there's no valid use case for sending such attributes and stuff does not really break beyond repair if you do, you just get a bit of an abstract error message (caused by LDAP in this case).

relaxnow commented 10 years ago

Actually note that we already have some limited support for attribute validation and Email address validation.

Currently in the main flow we only check for the absolute minimum: https://github.com/OpenConext/OpenConext-engineblock/blob/master/library/EngineBlock/Corto/Filter/Command/ValidateRequiredAttributes.php#L21-21

Whereas on the idp debugging side we use the Attribute Validator: https://github.com/OpenConext/OpenConext-engineblock/blob/develop/application/modules/Authentication/View/Proxy/debugidpresponse.phtml#L21-21

Which uses our attribute definition file: https://github.com/OpenConext/OpenConext-engineblock/blob/develop/application/configs/attributes-SURFconext.json

Which warns against non-compliant email addresses: https://github.com/OpenConext/OpenConext-engineblock/blob/develop/application/configs/attributes-SURFconext.json#L127

Using Zend_Validate_EmailAddress: https://github.com/OpenConext/OpenConext-engineblock/blob/develop/library/EngineBlock/Attributes/Validator/Type.php#L77-77

That validates according to RFC2822: http://framework.zend.com/apidoc/1.0/Zend_Validate/Zend_Validate_EmailAddress.html#methodisValid

relaxnow commented 10 years ago

What I don't get is why EB would break on this.

thijskh commented 10 years ago

The breakage is in LDAP: when it's about to be inserted into the directory.

Zend_Ldap_Exception Object
(
    [_previous:Zend_Exception:private] => 
    [message:protected] => 0x15 (Invalid syntax; mail: value #0 invalid per syntax): adding: uid=wynn,o=harvard-example.edu,dc=surfconext,dc=nl
    [string:Exception:private] => 
    [code:protected] => 21
    [file:protected] => /opt/www/engineblock-4.1.6/vendor/zendframework/zendframework1/library/Zend/Ldap.php
    [line:protected] => 1243
    [trace:Exception:private] => Array
        (
            [0] => Array
                (
                    [file] => /opt/www/engineblock-4.1.6/library/EngineBlock/UserDirectory.php
                    [line] => 267
                    [function] => add
                    [class] => Zend_Ldap
                    [type] => ->
                    [args] => Array
                        (
[...]
                    [mail] => $t€ve.W¥nn@las.vegas
relaxnow commented 10 years ago

Ugh, LDAP InetOrgPerson schema expects no more than 265 IA5 (almost ASCII) characters.

relaxnow commented 10 years ago

Really nasty business IA5 by the way:

IA5 is almost the same as ASCII - differences are marked with * in the name column for our mutual convenience and because they may not show on your screen. There are only two printable differences - currency (hex 24 - decimal 36) and Equivalence (tilde) (hex 7E - decimal 126).

So if your email address is boy+~@ibuildings.nl (which is allowed by RFC2822) then it'll come out of LDAP as a over-line which is certainly not allowed. Or boy+$@ibuildings.nl would come out unprintable? Unless off course we just ignore the fact that it's IA5 and interpret what comes out as ASCII in which case it shouldn't matter what symbol LDAP prescribes to character code 126. Depends a lot on implementation details of the PHP LDAP client that I don't know. Maybe you could test this @thijskh ?

So courses of action we can take:

1. Ignore this edge case.

Cheap but nasty.

2. Switch to a different attribute for storing mails

Migration would be costly but little changes in code required. I would estimate a weeks worth of work due to having to implement LDAP migration, which would solve some technical debt. We would then also be passing non-RFC2822 email addresses to our SPs.

3. Add a quick check to ValidateRequiredAttributes.

RFC2822 minus tilde and currency. Cheap but diving a little deeper in technical debt.

4. Make validateRequiredAttributes use Attribute Validator and make it accurately reflect our actual requirements of no more than 265 IA5 characters.

Medium cost, I'd estimate a day or two, would solve some technical debt but leave the LDAP technical debt as it is. My personal recommendation.

5. Fix ALL the things

Combine 3. and 4. to fix both the LDAP and our verification.

thijskh commented 10 years ago

I'm strongly leaning towards 1. I don't see it as a priority to spend days of work on something no real world IdP has ever sent.

baszoetekouw commented 10 years ago

Hoi Thijs,

Ik wijs je graag op RFC6530, die UTF8 in email addressen (local parts en domeinen) toestaat (op SMTP-niveau via een speciale extensie). Ik ben het met je eens dat dit nog niet wijd ondersteund is, maar dat gaat wel komen. Ik zou er dus voor willen pleiten om hier wel degelijk een testcase in de DIY-IdP voor te hebben.

Gr, Bas.

thijskh commented 10 years ago

Our problem is not that EB doesn't like RFC6530 addresses, it's that LDAP will not accept them. Also they are not legal for an IdP (such as EB) to publish in the mail attribute.

I'm not sure why we're storing email addresses in LDAP anyway or why we're using LDAP at all. If either one of those is not or no longer a requirement we could maybe get away with just stopping to do that, and than the issue is solved. We would then be publishing illegal values for mail if the IdP sends them, but that's then only because the IdP did so in the first place.

Currently I'm not yet convinced that this is where we need to spend our time.

relaxnow commented 10 years ago

Hi @baszoetekouw ,

I would like to point you to the fact that we are using the "urn:mace:dir:attribute-def:mail" attribute, as registered by the Internet2 MACE working group here: https://www.internet2.edu/products-services/trust-identity-middleware/mace-registries/urnmace-namespace/urn-mace-dir-registry/urn-mace-dir-attribute-def/

This defines the syntax as conforming to RFC1274 and RFC2821. Both referring to RFC 822 for the syntax, which defines it as an atom, an atom being a collection of ASCII characters.

Non-ASCII characters exceed the specification of the chosen attribute for e-mail addresses.

Cheers, Boy

relaxnow commented 10 years ago

Interestingly this seems to be another case of conflicting interests, I can see a use-case for OpenConext (where the range of IdPs can be much larger) but not much for SURFconext, which is currently funding the development work.

thijskh commented 10 years ago

I'm fairly sure that SURFnet is not opposed to funding changes that are desired in the wider OpenConext community but do not directly beneft SURFconext. However, I'm not quite convinced that this issue is currently a priority in that wider comunity.

relaxnow commented 10 years ago

So can we close this issue @thijskh ?

thijskh commented 10 years ago

I'm indeed closing it. It will be archived here for reference and when there's a concrete use case we can always reconsider.