dlindahl / omniauth-cas

A CAS OmniAuth Strategy
MIT License
88 stars 79 forks source link

Faulty XML parsing #47

Closed davidmilo closed 5 years ago

davidmilo commented 7 years ago

Hi, first of all thanks for maintaining this gem.

I had some problem when info parsed from xml was not entirely correct. Here is the example:

<cas:serviceResponse xmlns:cas='http://www.yale.edu/tp/cas'>
  <cas:authenticationSuccess>
  <cas:user>A</cas:user>
  <cas:attributes>
    <cas:attraStyle>Jasig</cas:attraStyle>
    <cas:mail>B</cas:mail>
    <cas:sc>C</cas:sc>
    <cas:first>D</cas:first>
    <cas:last>F</cas:last>
    <cas:nationality>G</cas:nationality>
    <cas:roles>F</cas:roles>
    <cas:roles>D</cas:roles>
    <cas:roles>R</cas:roles>
    <cas:roles>K</cas:roles>
    <cas:roles>L</cas:roles>
    <cas:roles>P</cas:roles>
    <cas:roles>I</cas:roles>
    <cas:roles>O</cas:roles>
    <cas:picture>N</cas:picture>
    <cas:birthdate>M</cas:birthdate>
    <cas:gender>M</cas:gender>
    <cas:telephone>N</cas:telephone>
    <cas:address>D</cas:address>
    <cas:section>F</cas:section>
    <cas:country>R</cas:country>
  </cas:attributes>
  </cas:authenticationSuccess>
</cas:serviceResponse>

As you can see, nodes cas:roles nodes are repeated several times (not sure if this is because cas server is implemented wrong or it is common way to do it.) but anyways. Your current parsing algorithm would only return last of those roles nodes: { "roles" => "O"} which is wrong. It should return all as an array: { "roles" => ["F" , "D" , "R" , "K" , "L", "P", "I", "O" ]}.

I would suggest to use Hash.from_xml - i think it comes from rails tho, it is more bulletproof for this kind of cases you haven't though about in your parsing algorithm.

danschmidt5189 commented 6 years ago

We're dealing with this same bug at UC Berkeley libraries. The trouble is that this behavior is intended, tested (context, fixture, assertion), and hard-coded.

@davidmilo suggested a breaking change, as clients may depend on the faulty "last value-only" behavior. And since any attribute could be (or become) multivalued, it may make sense to always return them as a list. (I'm not sure what the CAS spec has to say about this.)

An easy, non-breaking solution would be to simply forward ServiceTicketValidator@success_body to the fetch_raw_info callback, giving developers full access to the response object. Developers would then have the option of walking the Nokogiri document as they saw fit. For example:

fetch_raw_info: lambda { |v, opts, ticket, info, raw|
  {
    :roles => raw.xpath('//cas:roles').map(&:text),
  }
}

Here's the CAS v3 spec: https://apereo.github.io/cas/5.0.x/protocol/CAS-Protocol-Specification.html#4251-saml-cas-response-attributes).

danschmidt5189 commented 5 years ago

@davidmilo I merged a workaround in #51. If you're using the latest branch, try something like:

provider :cas,
  name: :calnet,
  host: "auth#{'-test' unless Rails.env.production?}.berkeley.edu",
  login_url: '/cas/login',
  service_validate_url: '/cas/p3/serviceValidate',
  fetch_raw_info: Proc.new { |strategy, opts, ticket, user_info, doc|
    doc.empty? ? {} : { "roles" => doc.xpath('//cas:roles').map(&:text) }
  }

I'm not sure that we know in advance what fields may be multivalued in a given CAS setup, so it seems safest to me to let the user handle it (as in my example). I really don't like the idea of using multi-valued fields only when we detect multiple values, as that could lead to the role appearing as a string for users with one role and as an array for users with multiple roles.

Closing for now. Feel free to comment with feedback and we can consider better designs.

Expect the fix in the upcoming v2.0.0 release.