IQSS / dataverse

Open source research data repository software
http://dataverse.org

Shibboleth: Support "Federated Login Mode" (i.e. feed of Identity Providers from InCommon) #2937

Closed pdurbin closed 8 years ago

pdurbin commented 8 years ago

http://guides.dataverse.org/en/4.2.3/installation/shibboleth.html#dataverse-idp-metadata-xml explains how to configure /etc/shibboleth/dataverse-idp-metadata.xml to specify one or more Identity Providers (IdPs) from which Shibboleth users are allowed to log into your installation of Dataverse. Many Dataverse installations will choose this mode, only allowing users from their own institution to log in.

The Harvard Dataverse plans to run in a mode that I'll call "Federated Login Mode" for lack of a better term. This means that rather than configuring an XML file by hand to include a list of approved Identity Providers (IdPs), we will configure Shibboleth to periodically download a list of IdPs approved by InCommon. As of this writing there are 426 InCommon-approved IdPs: https://incommon.org/federation/info/all-entities.html#IdPs

Once the list of hundreds of InCommon Identity Providers (IdPs) is in place I imagine the login page will look something like the login page at https://www.hathitrust.org in the screenshot below:

screen shot 2016-02-29 at 2 27 54 pm

To do for this issue:

After release:

pdurbin commented 8 years ago

Here's how https://demo.dataverse.org looks now that I just did a quick one-off download of http://md.incommon.org/InCommon/InCommon-metadata.xml while I was working on documenting how to configure Dataverse for use with an identity federation:

screen shot 2016-03-07 at 3 54 03 pm

pdurbin commented 8 years ago

In 21bd0e8 I documented a bit about how to set this up but I'm not sure what the best practices are. See http://guides.dataverse.org/en/2939-shib/installation/shibboleth.html#identity-federation for a preview of what I wrote.

pdurbin commented 8 years ago

I'm sending this to QA for feedback on what I wrote at http://guides.dataverse.org/en/2939-shib/installation/shibboleth.html#identity-federation

Once some of our test servers have been registered with InCommon as part of #2104 we'll reconfigure shib to use a feed as documented in the link above. I'm still not sure what the best practices are but I hope what I've written provides enough guidance.

kcondon commented 8 years ago

We've discussed this since and discovered that shibd supports metadata refresh intervals and validation of metadata against a cert as recommended by InCommon. The sample config to achieve updates and validation that should be placed in shibboleth2.xml can be found here: https://spaces.internet2.edu/display/InCFederation/Shibboleth+Metadata+Config#ShibbolethMetadataConfig-ConfiguretheShibbolethSP

Also, the refresh interval in the sample shows 7200 milliseconds rather than the default. It might be good to restore the default value as an example.

pdurbin commented 8 years ago

@kcondon in 5edf6a3 I rewrote the section on identity federations. Thanks for all the feedback. I hope you like this better. As before you can preview the docs at http://guides.dataverse.org/en/2939-shib/installation/shibboleth.html#identity-federation

Regarding reloadInterval it turns out that the value is expressed in seconds as explained at https://wiki.shibboleth.net/confluence/display/SHIB2/NativeSPReloadableXMLFile so I didn't change anything since 7200 (2 hours) is the default (as seen in /etc/shibboleth/shibboleth2.xml.dist) and seems reasonable.

Passing to QA.

kcondon commented 8 years ago

Doc is good. Waiting for InCommon registration to test federation and config.

pdurbin commented 8 years ago

InCommon registration is being tracked in INC01226245 and #2104 and is a blocker for testing "Federated Login Mode".

pdurbin commented 8 years ago

Even though we are still waiting for InCommon registration in INC01226245, one thing that has changed recently is that half a dozen of our Dataverse servers have been added to Harvard's Identity Provider service. This means people with a HarvardKey login can now log into https://beta.dataverse.org , for example. As of this writing, that beta server is configured in "Specific Identity Provider(s)" mode as defined by http://guides.dataverse.org/en/2939-shib/installation/shibboleth.html#specific-identity-provider-s which means that I have a hard-coded list of a few Identity Providers (IdPs) in /etc/shibboleth/dataverse-idp-metadata.xml. Harvard's IdP is one of them and HarvardKey works fine.

In contrast, https://demo.dataverse.org is currently half-configured in "Identity Federation" mode as defined by http://guides.dataverse.org/en/2939-shib/installation/shibboleth.html#identity-federation and I say half configured because weeks ago I ran curl http://md.incommon.org/InCommon/InCommon-metadata.xml > /etc/shibboleth/dataverse-idp-metadata.xml and didn't set up any process for updating the metadata. The file was 6 weeks old. I believe the age of the file is why when I just now tried to log into the demo site I got the following error...

opensaml::saml2md::MetadataException
The system encountered an error at Thu Apr 28 11:40:35 2016
To report this problem, please contact the site administrator at root@localhost.
Please include the following message in any email:
opensaml::saml2md::MetadataException at (https://demo.dataverse.org/Shibboleth.sso/Login)
Unable to locate metadata for identity provider (https://fed.huit.harvard.edu/idp/shibboleth)

... and I say the stale file may have caused the error because https://wiki.shibboleth.net/confluence/display/SHIB2/NativeSPTroubleshootingCommonErrors#NativeSPTroubleshootingCommonErrors-Unabletolocatemetadataforidentityprovider%28https://identities.supervillain.edu/idp/shibboleth%29. says "the issue may be that the metadata has expired". I just now re-downloaded the metadata from InCommon, restarted shibd and then was able to successfully log in.

@kcondon this is sort of "go live" territory. We should at least document for ourselves how to ensure that metadata from InCommon doesn't go stale. Rather than this curl approach I think we should try to get working the idea of subscribing to a URL under MetadataProvider such as url="http://md.incommon.org/InCommon/InCommon-metadata.xml" as described in the page we link to from the Installation Guide: https://spaces.internet2.edu/display/InCFederation/Shibboleth+Metadata+Config#ShibbolethMetadataConfig-ConfiguretheShibbolethSP

I'm passing this to you for consideration. Please note that in https://help.hmdc.harvard.edu/Ticket/Display.html?id=233688 @donsizemore is in touch with Identity people at UNC so it's plausible that even before the demo server is registered with InCommon we could get login working with a UNC account if UNC is willing to put our metadata at https://demo.dataverse.org/Shibboleth.sso/Metadata into their system as we have done with Harvard. The same is true of MIT, where many months ago we did an out-of-band metadata exchange in https://help.hmdc.harvard.edu/Ticket/Display.html?id=194659

In short, I'm saying that I don't think this issue is completely blocked. There are things we can test with the demo server in "Identity Federation" mode.

kcondon commented 8 years ago

I think this is part of specifying the production config and should be part of the feature design. My assumption was that we were going to use the auto refresh and cert validation method recommended by InCommon that we had discussed previously. So, putting this back in your court until we have InCommon registration and we are ready to think about turning the service on.

kcondon commented 8 years ago

It looks like the validUntil field in the XML from the IdP (dataverse-idp-metadata.xml) is the primary staleness indicator. There is also the optional RequireValidUntil field in shibboleth2.xml that overrides validUntil if it is set too far in the future so as to invalidate the trust model (recommended no more than 14 days).

The maxRefreshDelay in shibboleth2.xml should allow us to avoid the metadata expired error if we set it to be sooner than validUntil or RequireValidUntil, whichever is smaller.

https://spaces.internet2.edu/display/InCFederation/Shibboleth+Metadata+Config#ShibbolethMetadataConfig-ConfiguretheShibbolethSP
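The rule of thumb above (refresh sooner than the metadata can expire) can be written down as a small check. This is only a sketch of that reasoning, not how shibd schedules refreshes internally. The function name and arguments are hypothetical, and all values are assumed to be in seconds.

```python
FOURTEEN_DAYS = 14 * 24 * 60 * 60  # 1209600 seconds, the recommended upper bound


def refresh_is_safe(max_refresh_delay, seconds_until_valid_until,
                    max_validity_interval=None):
    """Return True if a refresh is due before the metadata can expire.

    max_refresh_delay: longest wait between refresh attempts.
    seconds_until_valid_until: time left before the document's validUntil.
    max_validity_interval: optional cap imposed by RequireValidUntil.
    """
    limit = seconds_until_valid_until
    if max_validity_interval is not None:
        # Use whichever expiry bound is smaller, as described above.
        limit = min(limit, max_validity_interval)
    return max_refresh_delay < limit


if __name__ == "__main__":
    five_days = 5 * 24 * 60 * 60
    # Refreshing at least hourly is comfortably inside a five-day validity window.
    print(refresh_is_safe(3600, five_days, FOURTEEN_DAYS))
    # Refreshing only weekly is not.
    print(refresh_is_safe(7 * 24 * 60 * 60, five_days, FOURTEEN_DAYS))
```

The one-hour delay is just an example value. The point is that the delay has to be smaller than both bounds.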

Last, it is possible, though not recommended, to use a validUntil per entity. It is better to have one per xml document. If there are shorter values per entity, then those individual resources will throw the expired error, not all entities in the file.

http://shibboleth.1660669.n2.nabble.com/Expired-Metadata-td5453516.html

...an entity validUntil should only invalidate the particular entity, not the metadata source as a whole. Yes, that's how the attribute works. -- Scott

I was able to change the validUntil field to be valid/invalid and confirmed I saw the metadata expired message.
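To illustrate the difference between a document-level and a per-entity validUntil, here is a minimal sketch that lists which entities would count as expired at a given time. The sample metadata, entityIDs and function names are invented for illustration. It assumes timestamps use the UTC form 2016-06-01T00:00:00Z and that an entity is bound by the earlier of its own validUntil and the document's.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

MD_NS = "urn:oasis:names:tc:SAML:2.0:metadata"

# Invented sample: one document-level validUntil, one shorter per-entity value.
SAMPLE = """<EntitiesDescriptor xmlns="urn:oasis:names:tc:SAML:2.0:metadata"
                    validUntil="2016-06-01T00:00:00Z">
  <EntityDescriptor entityID="https://idp.one.example/idp/shibboleth"/>
  <EntityDescriptor entityID="https://idp.two.example/idp/shibboleth"
                    validUntil="2016-05-01T00:00:00Z"/>
</EntitiesDescriptor>"""


def parse_ts(value):
    """Parse a UTC timestamp of the form YYYY-MM-DDTHH:MM:SSZ."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def expired_entities(xml_text, now):
    """Return entityIDs whose effective validUntil is at or before `now`."""
    root = ET.fromstring(xml_text)
    doc_limit = root.get("validUntil")
    expired = []
    for entity in root.iter("{%s}EntityDescriptor" % MD_NS):
        # The effective limit is the earliest of the document and entity values.
        limits = [parse_ts(v) for v in (doc_limit, entity.get("validUntil")) if v]
        if limits and min(limits) <= now:
            expired.append(entity.get("entityID"))
    return expired


if __name__ == "__main__":
    mid_may = datetime(2016, 5, 15, tzinfo=timezone.utc)
    # Only the entity with the shorter per-entity validUntil is reported.
    print(expired_entities(SAMPLE, mid_may))
```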

pdurbin commented 8 years ago

I just wanted to point out that @donsizemore brought https://spaces.internet2.edu/display/InCFederation/Research+and+Scholarship+Attribute+Bundle to my attention. The nice thing is that we aren't requiring any exotic attributes at http://guides.dataverse.org/en/4.3/installation/shibboleth.html#shibboleth-attributes . We should expect InCommon IdPs to send us what we're asking for:

pdurbin commented 8 years ago

We've been tracking InCommon registration for Harvard Dataverse servers in INC01226245 but now there is also IAMPROJECT-412 in Jira. I am blocked on this but @donsizemore indicated at https://help.hmdc.harvard.edu/Ticket/Display.html?id=233688#txn-4712505 that Odum/UNC might be able to register a Dataverse server with InCommon so that some testing could potentially take place there.

donsizemore commented 8 years ago

Phil: Odum is currently working on registering dataverse-test.irss.unc.edu and dataverse.unc.edu with InCommon (so far, they're each registered with SSO.unc.edu and we're currently wrangling the Shibboleth configuration). Please pardon any breakage on our current Shibboleth test server.

pdurbin commented 8 years ago

@donsizemore thanks for adding https://dataverse-test.irss.unc.edu to https://incommon.org/federation/info/all-entities.html#SPs ! I'm absolutely thrilled to have a place to test what it's like for a Dataverse installation to be a Service Provider (SP) with InCommon, which is exactly what the "Federated Login Mode" feature is about.

Unfortunately, being listed at https://incommon.org/federation/info/all-entities.html#SPs does not seem to be enough for InCommon Identity Providers (IdPs) to release the attributes that Dataverse requires for login, which are eppn, givenName, sn, and email per http://guides.dataverse.org/en/4.3.1/installation/shibboleth.html#shibboleth-attributes . Users get an unfriendly message like "Problem with Identity Provider The SAML assertion for "givenName" was null. Please contact support." Here's a screenshot:

shib_-_2016-06-07_10 14 26

With the exception of UNC, institutional login isn't working at all. So far we've had attempts from three universities:

In https://help.hmdc.harvard.edu/Ticket/Display.html?id=235496 @donsizemore and I are attempting to get https://dataverse-test.irss.unc.edu/ added to the "Research & Scholarship" category at https://incommon.org/federation/info/all-entity-categories.html#SPs to see if that helps. I say it might help because https://spaces.internet2.edu/display/InCFederation/Research+and+Scholarship+Attribute+Bundle says the following...

Identity providers are encouraged to release the R&S attribute bundle to all R&S service providers:

  • Identifiers
    • eduPersonPrincipalName
    • eduPersonTargetedID
  • Mail attribute
    • mail
  • Person name attributes
    • displayName
    • givenName
    • sn (surname)
  • Authorization attribute
    • eduPersonScopedAffiliation

... which means that all the required attributes would be released to Dataverse.
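That claim can be checked mechanically. The sketch below compares the four attributes Dataverse requires against the R&S bundle quoted above. The alias table (eppn for eduPersonPrincipalName, email for mail) is an assumption about how the short names map to the bundle names, and the function name is hypothetical.

```python
# Attributes Dataverse requires for login, per the comment above.
DATAVERSE_REQUIRED = ["eppn", "givenName", "sn", "email"]

# The R&S attribute bundle as quoted above.
RS_BUNDLE = {
    "eduPersonPrincipalName",
    "eduPersonTargetedID",
    "mail",
    "displayName",
    "givenName",
    "sn",
    "eduPersonScopedAffiliation",
}

# Assumed mapping from the short names Dataverse uses to the bundle names.
ALIASES = {"eppn": "eduPersonPrincipalName", "email": "mail"}


def missing_from_bundle(required, bundle, aliases):
    """Return the required attributes that the bundle would not cover."""
    return [name for name in required if aliases.get(name, name) not in bundle]


if __name__ == "__main__":
    # An empty list means every required attribute is part of the bundle.
    print(missing_from_bundle(DATAVERSE_REQUIRED, RS_BUNDLE, ALIASES))
```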

If it turns out that joining the Research & Scholarship category is enough to get institutional login working for all or most of the InCommon Identity Providers (IdPs) we would probably do well to document this in the Installation Guide. InCommon is US-centric but it can serve as an example.

As I mentioned to @djbrooke, we could improve the error message end users see: http://irclog.iq.harvard.edu/dataverse/2016-06-07#i_36288 . Maybe it could say something like, "The institution you are trying to log in with does not provide all the required "attributes" to log in to Dataverse. Please contact the team that manages your institution's Identity Provider (IdP) for assistance."

Longer term, @mcrosas and I have discussed changing the Dataverse code to allow for "initial population of certain fields (if provided), later editable" at https://docs.google.com/document/d/157sw9gaFGwrb0EtCGGlLMxSg0cooHfKEvIsjcebPcA8/edit?usp=sharing but I just realized that this concept isn't listed under potential "phase 2" activities in the Remote Authentication BRD at https://docs.google.com/document/d/1vcAmo2nkFYavAr7OwwXzxM0IFQbkRZYZrrX43q-wqGE/edit?usp=sharing so I just left a comment in the BRD about this idea, which is a non-trivial code change. It's what we want some day. Most users will probably expect that if they just logged in with HarvardKey or whatever that Dataverse will know their email address (this is how login via Google, Facebook, Twitter, GitHub, etc. usually works) and won't have to add this required field manually, but we should be prepared for when Identity Providers (IdPs) by policy don't provide Service Providers (SPs) such as Dataverse their users' name or email address.

pdurbin commented 8 years ago

Woo hoo! https://dataverse.harvard.edu is now listed as an InCommon Service Provider (SP)! The following screenshot is what you see if you click "Harvard Dataverse" at https://incommon.org/federation/info/all-entities.html#SPs

incommon

Here's what you see at https://incommon.org/federation/info/org.html?orgName=Harvard%20College

screen shot 2016-06-10 at 2 43 37 pm

The next step will be to get https://dataverse.harvard.edu listed under the "Research & Scholarship" category at https://incommon.org/federation/info/all-entity-categories.html because, as I mentioned in my last comment, we recently learned that it is highly unlikely that the 400+ InCommon Identity Providers (IdPs) will release the attributes Dataverse requires (name and email, basically).

pdurbin commented 8 years ago

https://dataverse-test.irss.unc.edu is now listed under "research-and-scholarship" at https://incommon.org/federation/info/all-entity-categories.html#SPs ...

incommon_federation_info_entity_categories_-_2016-06-10_16 21 56

... which resulted in MIT and UIC being able to log in!

Unfortunately, Harvard and Emory users can't log in because neither institution is part of the Research & Scholarship category at https://www.incommon.org/federation/info/all-entity-categories.html#IdPs (only 57 of the 430 InCommon IdPs are part of that category).

I'm keeping track of who can log in at https://docs.google.com/spreadsheets/d/1fWHGamXetTQw3cpf6Tc4efmTvoC97lpEeTLvN7Mxz3g/edit?usp=sharing

pdurbin commented 8 years ago

Unfortunately, Harvard and Emory users can't log in because neither institution is part of the Research & Scholarship category at https://www.incommon.org/federation/info/all-entity-categories.html#IdPs (only 57 of the 430 InCommon IdPs are part of that category).

@donsizemore figured out how to limit the number of institutions that can log in to https://dataverse-test.irss.unc.edu to just the 57 that are part of the Research & Scholarship category by following https://spaces.internet2.edu/display/InCFederation/Migrating+an+SP+to+Global+Research+and+Scholarship . Of course, that leaves Harvard users and many others out in the cold but it makes no sense to offer a login that won't work due to required attributes not being released.
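The actual filtering was done in the Shibboleth configuration by following the InCommon page linked above. Purely to illustrate the idea, the sketch below picks out the IdPs in a metadata document that advertise R&S support. It assumes that support is tagged with an entity attribute named http://macedir.org/entity-category-support whose value is one of the two R&S category URIs shown. The sample metadata and the function name are invented.

```python
import xml.etree.ElementTree as ET

NS = {
    "md": "urn:oasis:names:tc:SAML:2.0:metadata",
    "mdattr": "urn:oasis:names:tc:SAML:metadata:attribute",
    "saml": "urn:oasis:names:tc:SAML:2.0:assertion",
}

# Assumed tagging convention for IdPs that support the R&S category.
SUPPORT_ATTR = "http://macedir.org/entity-category-support"
RS_VALUES = {
    "http://id.incommon.org/category/research-and-scholarship",
    "http://refeds.org/category/research-and-scholarship",
}

# Invented sample: one IdP that supports R&S and one that does not.
SAMPLE = """<md:EntitiesDescriptor
    xmlns:md="urn:oasis:names:tc:SAML:2.0:metadata"
    xmlns:mdattr="urn:oasis:names:tc:SAML:metadata:attribute"
    xmlns:saml="urn:oasis:names:tc:SAML:2.0:assertion">
  <md:EntityDescriptor entityID="https://idp.rs.example.edu/idp/shibboleth">
    <md:Extensions>
      <mdattr:EntityAttributes>
        <saml:Attribute Name="http://macedir.org/entity-category-support">
          <saml:AttributeValue>http://id.incommon.org/category/research-and-scholarship</saml:AttributeValue>
        </saml:Attribute>
      </mdattr:EntityAttributes>
    </md:Extensions>
    <md:IDPSSODescriptor protocolSupportEnumeration="urn:oasis:names:tc:SAML:2.0:protocol"/>
  </md:EntityDescriptor>
  <md:EntityDescriptor entityID="https://idp.plain.example.edu/idp/shibboleth">
    <md:IDPSSODescriptor protocolSupportEnumeration="urn:oasis:names:tc:SAML:2.0:protocol"/>
  </md:EntityDescriptor>
</md:EntitiesDescriptor>"""


def rs_idps(xml_text):
    """Return entityIDs of IdPs that advertise support for the R&S category."""
    root = ET.fromstring(xml_text)
    result = []
    for entity in root.iter("{%s}EntityDescriptor" % NS["md"]):
        if entity.find("md:IDPSSODescriptor", NS) is None:
            continue  # not an Identity Provider
        values = set()
        for attr in entity.findall(
                "md:Extensions/mdattr:EntityAttributes/saml:Attribute", NS):
            if attr.get("Name") != SUPPORT_ATTR:
                continue
            for value in attr.findall("saml:AttributeValue", NS):
                if value.text:
                    values.add(value.text.strip())
        if values & RS_VALUES:
            result.append(entity.get("entityID"))
    return result


if __name__ == "__main__":
    print(rs_idps(SAMPLE))
```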

pdurbin commented 8 years ago

https://wiki.shibboleth.net/confluence/display/SHIB2/NativeSPMetadataProvider#NativeSPMetadataProvider-ChainingMetadataProvider is interesting. The example shows a URL from a federation and a local XML file:

<MetadataProvider type="Chaining">
    <MetadataProvider type="XML" path="partners.xml"/>
    <MetadataProvider type="XML" url="https://federation.org/metadata.xml" backingFilePath="fedmetadata.xml"/>
</MetadataProvider>

It also says "With V2.4 and above, this is implied by any configuration with multiple elements, so is no longer explicitly needed unless one of its optional settings is required." This means we might not even need to use the "Chaining" stuff. It might just work. Something to investigate since I think we'll need both a feed and a local file in production.

Thank you @donsizemore for figuring out you can use multiple MetadataProvider elements.

pdurbin commented 8 years ago

@donsizemore and I have been talking a lot about how to handle Identity Providers (IdPs) who do not release attributes that Dataverse requires such as "eppn" (a unique identifier for a user).

He pointed out https://spaces.internet2.edu/display/InCFederation/Error+Handling+Service which is a service InCommon members can use to display a somewhat friendlier error message. For example, if a researcher from the University of Texas at Austin tried to log into https://dataverse.harvard.edu we could put the more descriptive error message at https://ds.incommon.org/FEH/sp-error.html?sp_entityID=https%3A%2F%2Fdataverse.harvard.edu%2Fsp&idp_entityID=https%3A%2F%2Fidp.its.utexas.edu%2Fidp%2Fshibboleth in an iframe or something. Here's how that page looks:

incommon_federated_error_handling_-_2016-06-22_08 58 07

In practice, for now anyway, we plan to prevent researchers from the University of Texas at Austin and many other InCommon institutions that are not part of the Research & Scholarship category at https://incommon.org/federation/info/all-entity-categories.html#IdPs (only 60 of 433 IdPs are part of it as of this writing) from even attempting to log in to https://dataverse.harvard.edu with their institutional credentials by filtering out the 373 institutions that are not part of the Research & Scholarship category.

pdurbin commented 8 years ago

Woo-hoo! https://demo.dataverse.org and https://beta.dataverse.org were just added as InCommon Service Providers! That means we can work on switching from "Specific Identity Provider(s)" mode to "Identity Federation" mode as documented at http://guides.dataverse.org/en/4.4/installation/shibboleth.html#specific-identity-provider-s-vs-identity-federation

Part of this will be setting up (and probably documenting on the page above via a pull request) how to set up periodic metadata refresh which has been a blocker for @djbrooke getting the Harvard Dataverse servers (production, demo, and beta) added to the Research & Scholarship category at https://incommon.org/federation/info/all-entity-categories.html#SPs

murphy:tmp pdurbin$ curl -s http://md.incommon.org/InCommon/InCommon-metadata.xml | grep dataverse | grep entityID
<EntityDescriptor entityID="https://dataverse-test.irss.unc.edu/shibboleth">
<EntityDescriptor entityID="https://dataverse.unc.edu/shibboleth">
<EntityDescriptor entityID="https://beta.dataverse.org/sp">
<EntityDescriptor entityID="https://dataverse.harvard.edu/sp">
<EntityDescriptor entityID="https://demo.dataverse.org/sp">

I'm using the curl command above because it seems like https://incommon.org/federation/info/all-entities.html#SPs isn't working properly. None of those 5 Dataverse servers appear there, as @donsizemore and I have been discussing at http://irclog.iq.harvard.edu/dataverse/2016-07-26#i_38772
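The grep pipeline depends on each EntityDescriptor tag sitting on its own line. As a more layout-independent alternative, here is a minimal sketch that parses the metadata and lists Service Provider entityIDs containing a given substring. The sample document and the function name are invented, and treating "has an SPSSODescriptor child" as "is a Service Provider" is an assumption.

```python
import xml.etree.ElementTree as ET

MD_NS = "urn:oasis:names:tc:SAML:2.0:metadata"

# Invented sample standing in for the downloaded InCommon metadata.
SAMPLE = """<EntitiesDescriptor xmlns="urn:oasis:names:tc:SAML:2.0:metadata">
  <EntityDescriptor entityID="https://demo.dataverse.org/sp">
    <SPSSODescriptor protocolSupportEnumeration="urn:oasis:names:tc:SAML:2.0:protocol"/>
  </EntityDescriptor>
  <EntityDescriptor entityID="https://beta.dataverse.org/sp">
    <SPSSODescriptor protocolSupportEnumeration="urn:oasis:names:tc:SAML:2.0:protocol"/>
  </EntityDescriptor>
  <EntityDescriptor entityID="https://idp.example.edu/idp/shibboleth">
    <IDPSSODescriptor protocolSupportEnumeration="urn:oasis:names:tc:SAML:2.0:protocol"/>
  </EntityDescriptor>
</EntitiesDescriptor>"""


def matching_sps(xml_data, needle):
    """Return entityIDs of Service Providers whose entityID contains `needle`."""
    root = ET.fromstring(xml_data)
    sp_tag = "{%s}SPSSODescriptor" % MD_NS
    return [
        entity.get("entityID")
        for entity in root.iter("{%s}EntityDescriptor" % MD_NS)
        if needle in entity.get("entityID", "") and entity.find(sp_tag) is not None
    ]


if __name__ == "__main__":
    print(matching_sps(SAMPLE, "dataverse"))
```

To run this against the live feed, the bytes returned by urllib.request.urlopen(url).read() can be passed in place of SAMPLE.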

pdurbin commented 8 years ago

I just reconfigured https://demo.dataverse.org for "Federated Login Mode" and captured the necessary changes into f0f5ac9 (maxRefreshDelay="3600" which is in seconds, is the key).

Next @djbrooke will be picking up where we left off on the R&S form: https://spaces.internet2.edu/display/InCFederation/Research+and+Scholarship+Application+Form . The demo site is ready to go. We can now check the box next to "My service refreshes and verifies metadata at least daily" for the demo site.

Eventually, the beta and production sites need to be reconfigured as well.

pdurbin commented 8 years ago

@djbrooke and I discussed this issue. I'll take a swing at filling out https://spaces.internet2.edu/display/InCFederation/Research+and+Scholarship+Application+Form myself.

pdurbin commented 8 years ago

For entityID "https://demo.dataverse.org/sp" I filled out https://spaces.internet2.edu/display/InCFederation/Research+and+Scholarship+Application+Form and we are tracking this at https://help.hmdc.harvard.edu/Ticket/Display.html?id=239200

For "How my service supports research and scholarship" I put "Dataverse is open source research data repository software. The Harvard Dataverse is open for all researchers worldwide from all disciplines to deposit data."

I checked the boxes agreeing to https://refeds.org/category/research-and-scholarship and http://www.incommon.org/docs/policies/participationagreement.pdf but someone with more authority such as @djbrooke or @mcrosas should review these documents as well, especially when we fill out this form for our production service.

That's the demo site. The beta site is in use for usability testing today but @kcondon and I plan to reconfigure it soon so that in the R&S form above we can check the box that says "My service refreshes and verifies metadata at least daily." I'll co-assign this issue to him so some knowledge transfer can take place.

Some coordination will be necessary to enable this in production and meet the refresh requirement without confusing users. InCommon members won't be able to log in until production is part of the R&S category.

kcondon commented 8 years ago

Works, closing.