Closed pdurbin closed 8 years ago
Here's how https://demo.dataverse.org looks now that I just did a quick one-off download of http://md.incommon.org/InCommon/InCommon-metadata.xml while I was working on documenting how to configure Dataverse for use with an identity federation:
In 21bd0e8 I documented a bit about how to set this up but I'm not sure what the best practices are. See http://guides.dataverse.org/en/2939-shib/installation/shibboleth.html#identity-federation for a preview of what I wrote.
I'm sending this to QA for feedback on what I wrote at http://guides.dataverse.org/en/2939-shib/installation/shibboleth.html#identity-federation
Once some of our test servers have been registered with InCommon as part of #2104 we'll reconfigure shib to use a feed as documented in the link above. I'm still not sure what the best practices are but I hope what I've written provides enough guidance.
We've discussed this since and discovered that shibd supports metadata refresh intervals and validation of metadata against a cert as recommended by InCommon. The sample config to achieve updates and validation that should be placed in shibboleth2.xml can be found here: https://spaces.internet2.edu/display/InCFederation/Shibboleth+Metadata+Config#ShibbolethMetadataConfig-ConfiguretheShibbolethSP
Also, the refresh interval in the sample shows 7200 milliseconds rather than the default. It might be good to restore the default value as an example.
@kcondon in 5edf6a3 I rewrote the section on identity federations. Thanks for all the feedback. I hope you like this better. As before you can preview the docs at http://guides.dataverse.org/en/2939-shib/installation/shibboleth.html#identity-federation
Regarding reloadInterval, it turns out that the value is expressed in seconds as explained at https://wiki.shibboleth.net/confluence/display/SHIB2/NativeSPReloadableXMLFile so I didn't change anything since 7200 (2 hours) is the default (as seen in /etc/shibboleth/shibboleth2.xml.dist) and seems reasonable.
Passing to QA.
Doc is good. Waiting for InCommon to test federation and config.
InCommon registration is being tracked in INC01226245 and #2104 and is a blocker for testing "Federated Login Mode".
Even though we are still waiting for InCommon registration in INC01226245 one thing that has changed recently is that half a dozen of our Dataverse servers have been added to Harvard's Identity Provider service. This means people with a HarvardKey login can now log into https://beta.dataverse.org , for example. As of this writing, that beta server is configured in "Specific Identity Provider(s)" mode as defined by http://guides.dataverse.org/en/2939-shib/installation/shibboleth.html#specific-identity-provider-s which means that I have a hard coded list of a few Identity Providers (IdPs) in /etc/shibboleth/dataverse-idp-metadata.xml . Harvard's IdP is one of them and HarvardKey works fine.
In contrast, https://demo.dataverse.org is currently half-configured in "Identity Federation" mode as defined by http://guides.dataverse.org/en/2939-shib/installation/shibboleth.html#identity-federation and I say half-configured because weeks ago I ran curl http://md.incommon.org/InCommon/InCommon-metadata.xml > /etc/shibboleth/dataverse-idp-metadata.xml and didn't set up any process for updating the metadata. The file was 6 weeks old. I believe the age of the file is why when I just now tried to log into the demo site I got the following error...
opensaml::saml2md::MetadataException
The system encountered an error at Thu Apr 28 11:40:35 2016
To report this problem, please contact the site administrator at root@localhost.
Please include the following message in any email:
opensaml::saml2md::MetadataException at (https://demo.dataverse.org/Shibboleth.sso/Login)
Unable to locate metadata for identity provider (https://fed.huit.harvard.edu/idp/shibboleth)
... and I say the stale file may have caused the error because https://wiki.shibboleth.net/confluence/display/SHIB2/NativeSPTroubleshootingCommonErrors#NativeSPTroubleshootingCommonErrors-Unabletolocatemetadataforidentityprovider%28https://identities.supervillain.edu/idp/shibboleth%29. says "the issue may be that the metadata has expired". I just now re-downloaded the metadata from InCommon, restarted shibd and then was able to successfully log in.
@kcondon this is sort of "go live" territory. We should at least document for ourselves how to ensure that metadata from InCommon doesn't go stale. Rather than this curl approach I think we should try to get working the idea of subscribing to a URL under MetadataProvider such as url="http://md.incommon.org/InCommon/InCommon-metadata.xml" as described in the page we link to from the Installation Guide: https://spaces.internet2.edu/display/InCFederation/Shibboleth+Metadata+Config#ShibbolethMetadataConfig-ConfiguretheShibbolethSP
I'm passing this to you for consideration. Please note that in https://help.hmdc.harvard.edu/Ticket/Display.html?id=233688 @donsizemore is in touch with Identity people at UNC so it's plausible that even before the demo server is registered with InCommon we could get login working with a UNC account if UNC is willing to put our metadata at https://demo.dataverse.org/Shibboleth.sso/Metadata into their system as we have done with Harvard. The same is true of MIT, where many months ago we did an out-of-band metadata exchange in https://help.hmdc.harvard.edu/Ticket/Display.html?id=194659
In short, I'm saying that I don't think this issue is completely blocked. There are things we can test with the demo server in "Identity Federation" mode.
I think this is part of specifying the production config and should be part of the feature design. My assumption was we were going to use the auto refresh and cert validation method recommended by InCommon that we had discussed previously. So, putting this back in your court until we have InCommon and we are ready to think about turning the service on.
It looks like the validUntil field in the xml from the IdP (dataverse-idp-metadata.xml) is the primary staleness indicator. There is also the optional RequireValidUntil field in the shibboleth2.xml that overrides validUntil if it is set too long in the future so as to invalidate the trust model (recommended no more than 14 days).
The maxRefreshDelay in shibboleth2.xml should allow us to avoid the metadata expired error if we set it to be sooner than validUntil or RequireValidUntil, whichever is smaller.
Last, it is possible, though not recommended, to use a validUntil per entity. It is better to have one per xml document. If there are shorter values per entity, then those individual resources will throw the expired error, not all entities in the file.
http://shibboleth.1660669.n2.nabble.com/Expired-Metadata-td5453516.html
...an entity validUntil should only invalidate the particular entity, not the metadata source as a whole. Yes, that's how the attribute works. -- Scott
I was able to change the validUntil field to be valid/invalid and confirmed I saw the metadata expired message.
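The staleness check discussed above can be sketched in a few lines of Python. This is only an illustration, not how shibd itself evaluates metadata: the sample document, the entityID, and the function names are hypothetical, and the parser assumes the common `YYYY-MM-DDTHH:MM:SSZ` timestamp form without fractional seconds.

```python
from datetime import datetime, timezone
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down metadata document for illustration only.
SAMPLE = """<EntitiesDescriptor xmlns="urn:oasis:names:tc:SAML:2.0:metadata"
    validUntil="2016-05-12T11:40:35Z">
  <EntityDescriptor entityID="https://idp.example.edu/idp/shibboleth"/>
</EntitiesDescriptor>"""


def parse_valid_until(xml_text):
    """Return the root element's validUntil as an aware UTC datetime, or None."""
    root = ET.fromstring(xml_text)
    raw = root.get("validUntil")  # the attribute itself is not namespaced
    if raw is None:
        return None
    # Assumption: timestamp has no fractional seconds and ends in "Z".
    parsed = datetime.strptime(raw, "%Y-%m-%dT%H:%M:%SZ")
    return parsed.replace(tzinfo=timezone.utc)


def is_expired(xml_text, now):
    """True when the document-level validUntil is at or before `now`."""
    valid_until = parse_valid_until(xml_text)
    return valid_until is not None and valid_until <= now
```

Something like `is_expired(open("/etc/shibboleth/dataverse-idp-metadata.xml").read(), datetime.now(timezone.utc))` could be run from a cron job as an early warning, though letting shibd refresh the feed itself remains the approach recommended in this thread.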
I just wanted to point out that @donsizemore brought https://spaces.internet2.edu/display/InCFederation/Research+and+Scholarship+Attribute+Bundle to my attention. The nice thing is that we aren't requiring any exotic attributes at http://guides.dataverse.org/en/4.3/installation/shibboleth.html#shibboleth-attributes . We should expect InCommon IdPs to send us what we're asking for:
We've been tracking InCommon registration for Harvard Dataverse servers in INC01226245 but now there is also IAMPROJECT-412 in Jira. I am blocked on this but @donsizemore indicated at https://help.hmdc.harvard.edu/Ticket/Display.html?id=233688#txn-4712505 that Odum/UNC might be able to register a Dataverse server with InCommon so that some testing could potentially take place there.
Phil: Odum is currently working on registering dataverse-test.irss.unc.edu and dataverse.unc.edu with InCommon (so far, they're each registered with SSO.unc.edu and we're currently wrangling the Shibboleth configuration). Please pardon any breakage on our current Shibboleth test server.
@donsizemore thanks for adding https://dataverse-test.irss.unc.edu to https://incommon.org/federation/info/all-entities.html#SPs ! I'm absolutely thrilled to have a place to test what it's like for a Dataverse installation to be a Service Provider (SP) with InCommon, which is exactly what the "Federated Login Mode" feature is about.
Unfortunately, being listed at https://incommon.org/federation/info/all-entities.html#SPs does not seem to be enough for InCommon Identity Providers (IdPs) to release the attributes that Dataverse requires for login, which are eppn, givenName, sn, and email per http://guides.dataverse.org/en/4.3.1/installation/shibboleth.html#shibboleth-attributes . Users get an unfriendly message like "Problem with Identity Provider The SAML assertion for "givenName" was null. Please contact support." Here's a screenshot:
With the exception of UNC, institutional login isn't working at all. So far we've had attempts from three universities:
In https://help.hmdc.harvard.edu/Ticket/Display.html?id=235496 @donsizemore and I are attempting to get https://dataverse-test.irss.unc.edu/ added to the "Research & Scholarship" category at https://incommon.org/federation/info/all-entity-categories.html#SPs to see if that helps. I say it might help because https://spaces.internet2.edu/display/InCFederation/Research+and+Scholarship+Attribute+Bundle says the following...
Identity providers are encouraged to release the R&S attribute bundle to all R&S service providers:
- Identifiers
  - eduPersonPrincipalName
  - eduPersonTargetedID
- Mail attribute
- Person name attributes
  - displayName
  - givenName
  - sn (surname)
- Authorization attribute
  - eduPersonScopedAffiliation
... which means that all the required attributes would be released to Dataverse.
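As a rough sketch of that reasoning, the check below compares the four attributes Dataverse requires against the R&S bundle and reports which ones an IdP failed to release. The mapping of `eduPersonPrincipalName` to `eppn` and of `mail` to `email` is an assumption about how attribute names are shortened on the SP side, and `missing_attributes` is a hypothetical helper, not part of Dataverse.

```python
# Attributes Dataverse requires for login, as listed earlier in this thread.
REQUIRED = ("eppn", "givenName", "sn", "email")

# R&S bundle attributes mapped to the short names used above.
# Assumption: eduPersonPrincipalName -> eppn and mail -> email.
RS_BUNDLE = {
    "eduPersonPrincipalName": "eppn",
    "eduPersonTargetedID": "eduPersonTargetedID",
    "mail": "email",
    "displayName": "displayName",
    "givenName": "givenName",
    "sn": "sn",
    "eduPersonScopedAffiliation": "eduPersonScopedAffiliation",
}


def missing_attributes(released):
    """Return required attribute names that are absent or empty in `released`."""
    return [name for name in REQUIRED if not released.get(name)]


# Every required attribute is covered by the R&S bundle under this mapping.
covered = set(REQUIRED) <= set(RS_BUNDLE.values())
```

For example, `missing_attributes({"eppn": "user@example.edu", "sn": "Doe", "email": "user@example.edu"})` returns `["givenName"]`, which mirrors the "givenName was null" failure described above.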
If it turns out that joining the Research & Scholarship category is enough to get institutional login working for all or most of the InCommon Identity Providers (IdPs) we would probably do well to document this in the Installation Guide. InCommon is US-centric but it can serve as an example.
As I mentioned to @djbrooke, we could improve the error message end users see: http://irclog.iq.harvard.edu/dataverse/2016-06-07#i_36288 . Maybe it could say something like, "The institution you are trying to log in with does not provide all the required "attributes" to log in to Dataverse. Please contact the team that manages your institution's Identity Provider (IdP) for assistance."
Longer term, @mcrosas and I have discussed changing the Dataverse code to allow for "initial population of certain fields (if provided), later editable" at https://docs.google.com/document/d/157sw9gaFGwrb0EtCGGlLMxSg0cooHfKEvIsjcebPcA8/edit?usp=sharing but I just realized that this concept isn't listed under potential "phase 2" activities in the Remote Authentication BRD at https://docs.google.com/document/d/1vcAmo2nkFYavAr7OwwXzxM0IFQbkRZYZrrX43q-wqGE/edit?usp=sharing so I just left a comment in the BRD about this idea, which is a non-trivial code change. It's what we want some day. Most users will probably expect that if they just logged in with HarvardKey or whatever that Dataverse will know their email address (this is how login via Google, Facebook, Twitter, GitHub, etc. usually works) and won't have to add this required field manually, but we should be prepared for when Identity Providers (IdPs) by policy don't provide Service Providers (SPs) such as Dataverse their users' name or email address.
Woo hoo! https://dataverse.harvard.edu is now listed as an InCommon Service Provider (SP)! The following screenshot is what you see if you click "Harvard Dataverse" at https://incommon.org/federation/info/all-entities.html#SPs
Here's what you see at https://incommon.org/federation/info/org.html?orgName=Harvard%20College
The next step will be to get https://dataverse.harvard.edu listed under the "Research & Scholarship" category at https://incommon.org/federation/info/all-entity-categories.html because as I mentioned in my last comment, we recently learned that it is highly unlikely that the 400+ InCommon Identity Providers (IdPs) will release the attributes Dataverse requires (name and email, basically).
https://dataverse-test.irss.unc.edu is now listed under "research-and-scholarship" at https://incommon.org/federation/info/all-entity-categories.html#SPs ...
... which resulted in MIT and UIC being able to log in!
Unfortunately, Harvard and Emory users can't log in because neither institution is part of the Research & Scholarship category at https://www.incommon.org/federation/info/all-entity-categories.html#IdPs (only 57 of the 430 InCommon IdPs are part of that category).
I'm keeping track of who can log in at https://docs.google.com/spreadsheets/d/1fWHGamXetTQw3cpf6Tc4efmTvoC97lpEeTLvN7Mxz3g/edit?usp=sharing
@donsizemore figured out how to limit the number of institutions that can log in to https://dataverse-test.irss.unc.edu to just the 57 that are part of the Research & Scholarship category by following https://spaces.internet2.edu/display/InCFederation/Migrating+an+SP+to+Global+Research+and+Scholarship . Of course, that leaves Harvard users and many others out in the cold but it makes no sense to offer a login that won't work due to required attributes not being released.
https://wiki.shibboleth.net/confluence/display/SHIB2/NativeSPMetadataProvider#NativeSPMetadataProvider-ChainingMetadataProvider is interesting. The example shows a URL from a federation and a local XML file:
<MetadataProvider type="Chaining">
<MetadataProvider type="XML" path="partners.xml"/>
<MetadataProvider type="XML" url="https://federation.org/metadata.xml" backingFilePath="fedmetadata.xml"/>
</MetadataProvider>
It also says "With V2.4 and above, this is implied by any configuration with multiple MetadataProvider elements."
Thank you @donsizemore for figuring out you can use multiple MetadataProvider elements.
@donsizemore and I have been talking a lot about how to handle Identity Providers (IdPs) who do not release attributes that Dataverse requires such as "eppn" (a unique identifier for a user).
He pointed out https://spaces.internet2.edu/display/InCFederation/Error+Handling+Service which is a service InCommon members can use to display a somewhat friendlier error message. For example, if a researcher from the University of Texas at Austin tried to log into https://dataverse.harvard.edu we could put the more descriptive error message at https://ds.incommon.org/FEH/sp-error.html?sp_entityID=https%3A%2F%2Fdataverse.harvard.edu%2Fsp&idp_entityID=https%3A%2F%2Fidp.its.utexas.edu%2Fidp%2Fshibboleth in an iframe or something. Here's how that page looks:
In practice, for now anyway, we plan to prevent researchers from the University of Texas at Austin and many other InCommon institutions that are not part of the Research & Scholarship category at https://incommon.org/federation/info/all-entity-categories.html#IdPs (only 60 of 433 IdPs are part as of this writing) from even attempting to log in to https://dataverse.harvard.edu with their institutional credentials by filtering out the 373 institutions that are not part of the Research & Scholarship category.
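To make the filtering idea concrete, here is a small Python sketch that picks out IdPs advertising R&S support in a federation metadata document. It is not the mechanism used on the servers (that filtering happens in the Shibboleth configuration, per the InCommon page linked earlier). It assumes IdPs signal support via the `http://macedir.org/entity-category-support` entity attribute with the REFEDS R&S value; the sample entityIDs and the function name `rs_idps` are hypothetical.

```python
import xml.etree.ElementTree as ET

NS = {
    "md": "urn:oasis:names:tc:SAML:2.0:metadata",
    "mdattr": "urn:oasis:names:tc:SAML:metadata:attribute",
    "saml": "urn:oasis:names:tc:SAML:2.0:assertion",
}
# Assumption: R&S support is advertised with this attribute name and value.
SUPPORT = "http://macedir.org/entity-category-support"
RS = "http://refeds.org/category/research-and-scholarship"

# Hypothetical two-IdP metadata document; only the first supports R&S.
SAMPLE = """<md:EntitiesDescriptor
    xmlns:md="urn:oasis:names:tc:SAML:2.0:metadata"
    xmlns:mdattr="urn:oasis:names:tc:SAML:metadata:attribute"
    xmlns:saml="urn:oasis:names:tc:SAML:2.0:assertion">
  <md:EntityDescriptor entityID="https://idp-a.example.edu/idp/shibboleth">
    <md:Extensions>
      <mdattr:EntityAttributes>
        <saml:Attribute Name="http://macedir.org/entity-category-support">
          <saml:AttributeValue>http://refeds.org/category/research-and-scholarship</saml:AttributeValue>
        </saml:Attribute>
      </mdattr:EntityAttributes>
    </md:Extensions>
    <md:IDPSSODescriptor protocolSupportEnumeration="urn:oasis:names:tc:SAML:2.0:protocol"/>
  </md:EntityDescriptor>
  <md:EntityDescriptor entityID="https://idp-b.example.edu/idp/shibboleth">
    <md:IDPSSODescriptor protocolSupportEnumeration="urn:oasis:names:tc:SAML:2.0:protocol"/>
  </md:EntityDescriptor>
</md:EntitiesDescriptor>"""


def rs_idps(xml_text):
    """Return entityIDs of IdPs whose metadata advertises R&S support."""
    root = ET.fromstring(xml_text)
    result = []
    for entity in root.iter("{%s}EntityDescriptor" % NS["md"]):
        if entity.find("md:IDPSSODescriptor", NS) is None:
            continue  # not an IdP (for example, an SP entry)
        attrs = entity.findall(
            "md:Extensions/mdattr:EntityAttributes/saml:Attribute", NS
        )
        values = [
            value.text.strip()
            for attr in attrs
            if attr.get("Name") == SUPPORT
            for value in attr.findall("saml:AttributeValue", NS)
            if value.text
        ]
        if RS in values:
            result.append(entity.get("entityID"))
    return result
```

Calling `rs_idps(SAMPLE)` yields only the first IdP, which is the kind of shortlist a login page would offer under this policy.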
Woo-hoo! https://demo.dataverse.org and https://beta.dataverse.org were just added as InCommon Service Providers! That means we can work on switching from "Specific Identity Provider(s)" mode to "Identity Federation" mode as documented at http://guides.dataverse.org/en/4.4/installation/shibboleth.html#specific-identity-provider-s-vs-identity-federation
Part of this will be setting up (and probably documenting on the page above via a pull request) how to set up periodic metadata refresh which has been a blocker for @djbrooke getting the Harvard Dataverse servers (production, demo, and beta) added to the Research & Scholarship category at https://incommon.org/federation/info/all-entity-categories.html#SPs
murphy:tmp pdurbin$ curl -s http://md.incommon.org/InCommon/InCommon-metadata.xml | grep dataverse | grep entityID
<EntityDescriptor entityID="https://dataverse-test.irss.unc.edu/shibboleth">
<EntityDescriptor entityID="https://dataverse.unc.edu/shibboleth">
<EntityDescriptor entityID="https://beta.dataverse.org/sp">
<EntityDescriptor entityID="https://dataverse.harvard.edu/sp">
<EntityDescriptor entityID="https://demo.dataverse.org/sp">
I'm using the curl command above because it seems like https://incommon.org/federation/info/all-entities.html#SPs isn't working properly. None of those 5 Dataverse servers appear there, as @donsizemore and I have been discussing at http://irclog.iq.harvard.edu/dataverse/2016-07-26#i_38772
I just reconfigured https://demo.dataverse.org for "Federated Login Mode" and captured the necessary changes into f0f5ac9 (maxRefreshDelay="3600" which is in seconds, is the key).
Next @djbrooke will be picking up where we left off on the R&S form: https://spaces.internet2.edu/display/InCFederation/Research+and+Scholarship+Application+Form . The demo site is ready to go. We can now check the box next to "My service refreshes and verifies metadata at least daily" for the demo site.
Eventually, the beta and production sites need to be reconfigured as well.
@djbrooke and I discussed this issue. I'll take a swing at filling out https://spaces.internet2.edu/display/InCFederation/Research+and+Scholarship+Application+Form myself.
For entityID "https://demo.dataverse.org/sp" I filled out https://spaces.internet2.edu/display/InCFederation/Research+and+Scholarship+Application+Form and we are tracking this at https://help.hmdc.harvard.edu/Ticket/Display.html?id=239200
For "How my service supports research and scholarship" I put "Dataverse is open source research data repository software. The Harvard Dataverse is open for all researchers worldwide from all disciplines to deposit data."
I checked the boxes agreeing to https://refeds.org/category/research-and-scholarship and http://www.incommon.org/docs/policies/participationagreement.pdf but someone with more authority such as @djbrooke or @mcrosas should review these documents as well, especially when we fill out this form for our production service.
That's the demo site. The beta site is in use for usability testing today but @kcondon and I plan to reconfigure it soon so that in the R&S form above we can check the box that says "My service refreshes and verifies metadata at least daily." I'll co-assign this issue to him so some knowledge transfer can take place.
Some coordination will be necessary to enable this in production and meet the refresh requirement without confusing users. InCommon members won't be able to log in until production is part of the R&S category.
Works, closing.
http://guides.dataverse.org/en/4.2.3/installation/shibboleth.html#dataverse-idp-metadata-xml explains how to configure /etc/shibboleth/dataverse-idp-metadata.xml to specify one or more Identity Providers (IdPs) that you would like to allow Shibboleth users to log into your installation of Dataverse from. Many Dataverse installations will choose this mode, only allowing users from their own institution to log in.
The Harvard Dataverse plans to run in a mode that I'll call "Federated Login Mode" for lack of a better term. This means that rather than configuring an XML file by hand to include a list of approved Identity Providers (IdPs), we will configure Shibboleth to periodically download a list of IdPs approved by InCommon. As of this writing there are 426 InCommon-approved IdPs: https://incommon.org/federation/info/all-entities.html#IdPs
Once the list of hundreds of InCommon Identity Providers (IdPs) is in place I imagine the login page will look something like the login page at https://www.hathitrust.org in the screenshot below:
To do for this issue:
After release: