GluuFederation / oxShibboleth

Shibboleth project for the Gluu Server's SAML IDP functionality.
MIT License

Some SAML flows will fail when several tabs of the same browser window initiate them in quick succession or simultaneously #41

Open aliaksander-samuseu opened 6 years ago

aliaksander-samuseu commented 6 years ago

Environment:

CentOS 7.4, gluu-server-3.1.2-1-4.centos7

Preconditions:

  1. A VM with a running Gluu Server instance

  2. A VM with two SAML2-capable SPs configured to use the Gluu Server for SSO (tests have confirmed that the TRs are functional)

  3. A web browser that supports "Reload all tabs" or a similar feature out of the box. Both SPs' sign-in URLs (the ones that immediately trigger the SAML SSO flow when followed) are opened in the same browser window, in different tabs; "Incognito" or a similar mode must not be enabled. For this test I recommend Firefox.

  4. [Optionally] A custom authentication method is configured in Gluu that pauses for user action during its flow and sends the user to a third party for authentication. In my case I used a Passport-SAML setup and another VM with a Gluu Server acting as the remote IDP. When no IDP id is sent in state, the Passport-SAML flow presents a remote IDP selection page and thus pauses execution awaiting the user's input.

Steps to reproduce:

If your setup conforms to item 4 of "Preconditions" (all steps need to be done relatively quickly, otherwise some expiration timer may run out):

  1. Make sure no session exists at the SPs or IDPs (clear all cookies)

  2. Initiate the SAML flow at the first SP. Wait until you are redirected to the oxAuth/Passport script login page and presented with a list of possible remote IDPs, but don't select anything yet

  3. Switch to the 2nd tab and initiate the SAML flow at the second SP. Wait until you are redirected to the oxAuth/Passport script login page and presented with a list of possible remote IDPs; again, don't select anything yet

  4. Switch back to the 1st tab and select your remote IDP from the list. Log in at the remote IDP if needed, and wait until the flow completes successfully (you should be taken back to your 1st SP's protected page)

  5. Switch to the 2nd tab, where the flow is still waiting at the IDP selection page, and select the same remote IDP as in step 4

If it's too troublesome to create the complete Inbound SAML setup, the required conditions can be mimicked with the following trick (it also shows another way of triggering the issue, and how else it may degrade the user's experience):

  1. Make sure no session exists at the SPs or IDPs (clear all cookies)

  2. Make sure you have a separate browser window with 2 tabs containing protected pages of your 2 test SPs (so that clicking "Reload all tabs" makes them start signing in simultaneously). It's very important that those pages have been opened before, as some browsers may react unexpectedly if you just open an empty tab, put the URL in it, and then hit the "Reload" button: it may simply not work, or some previous page may be loaded instead, if any. Firefox shows this very behaviour.

  3. Log in to oxTrust of the Gluu Server used for the test. This ensures you have a session at oxAuth but no session at the IDP yet (which creates the conditions required for the issue to manifest, as it hides somewhere in the interaction between oxAuth and the IDP/RemoteUser login handler)

  4. Reload all tabs simultaneously. If your browser doesn't have this option, I've noted that you can sometimes achieve the same result manually, by very quickly switching to the 2nd tab and reloading it with Ctrl+R or F5

Results:

In the Inbound SAML case described above, the 2nd flow fails after the browser is redirected from oxAuth back to the IDP's /RemoteUser callback endpoint, and the IDP's error page is displayed. With the other approach, one of the flows fails (apparently they race, and whichever is processed first succeeds while the other fails). In both cases the error message is the same. Here is the full log for the Inbound SAML case; below is the most relevant part, logged at the time the IDP's error page is displayed:

2018-03-22 14:03:13,659 - INFO [org.gluu.oxauth.client.validation.OAuthValidationFilter:149] - Session validation successful. User is logged in
2018-03-22 14:03:13,722 - ERROR [net.shibboleth.ext.spring.error.ErrorRaisingController:55] - Propagating exception thrown by request to /idp/Authn/RemoteUser
2018-03-22 14:03:13,723 - ERROR [net.shibboleth.idp.authn.ExternalAuthenticationException:76] - 
net.shibboleth.idp.authn.ExternalAuthenticationException: No conversation state found in session for key (e2s1)
    at net.shibboleth.idp.authn.ExternalAuthentication.startExternalAuthentication(ExternalAuthentication.java:142)

Apparently it has something to do with how the IDP or our customized RemoteUser handler handles stale auth requests/responses/sessions. After the first request is fully processed, subsequent requests that were initiated very soon after it, but whose responses are already "late" when they reach the Gluu instance in question, are dropped with no mitigation procedure, possibly leaving a bunch of tabs "stuck" in different erroneous states (there was a report from a customer who was inconvenienced by it)
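
To make the suspected failure mode concrete, here is a toy Python model. It is only a sketch of one possible mechanism, not the IDP's actual code: `ToySession`, `start_flow`, `finish_flow`, and the assumption that completing one login discards all pending conversation state in the shared session are hypothetical. Only the error text is taken from the log above.

```python
class ToySession:
    """Stand-in for one browser session shared by every tab (hypothetical)."""

    def __init__(self):
        self.conversations = {}  # conversation key -> SP that started the flow
        self.authenticated = False


def start_flow(session, key, sp_name):
    """Record pending external-authentication state under a conversation key."""
    session.conversations[key] = sp_name


def finish_flow(session, key):
    """Handle the callback for one flow; fail if its state has vanished."""
    if key not in session.conversations:
        raise LookupError(
            f"No conversation state found in session for key ({key})"
        )
    sp_name = session.conversations[key]
    # ASSUMPTION: completing a login discards all pending state in the session.
    session.conversations.clear()
    session.authenticated = True
    return sp_name
```

If two flows are started under different keys and the first one finishes, calling `finish_flow` for the second key raises the same kind of error as in the log, even though the session is already authenticated.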

Expected results:

Several pages may be loaded or reloaded at once, initiating a bunch of (almost) simultaneous SAML sign-in flows; this is common when a browser is launched and a previous session is restored, or when a "Reload all" button is used. In that case, once the very first request has resulted in sessions created at oxAuth and the IDP, the other (now stale) responses from third-party authentication services that reach oxAuth/IDP should be silently discarded, and the user's flow should continue as if they had authenticated successfully, returning them to the SPs that sent them here. Caution is required when choosing a new design here, as sub-optimal handling of stale requests/responses may lead to security breaches!
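
The requested behaviour could look roughly like the following Python sketch. It is an illustration under assumptions, not a proposal for the actual RemoteUser handler: `Session`, `handle_callback`, `UnknownFlowError`, and the example URLs are all hypothetical. Each flow's return target is stored under its own key, so finishing one flow does not erase the others; a late response for a known flow has its payload discarded and the existing session identity is reused; a callback for an unknown key is always rejected.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


class UnknownFlowError(Exception):
    """Raised when a callback names a flow this session never started."""


@dataclass
class Session:
    user: Optional[str] = None  # set once the first flow completes
    pending: Dict[str, str] = field(default_factory=dict)  # flow key -> SP return URL


def handle_callback(session: Session, key: str, asserted_user: str) -> Tuple[str, str]:
    """Return (SP return URL, effective user) for a callback, or reject it."""
    return_url = session.pending.pop(key, None)
    if return_url is None:
        # Never guess a destination for a key we did not issue.
        raise UnknownFlowError(key)
    if session.user is None:
        # First flow to finish establishes the session identity.
        session.user = asserted_user
    # Otherwise the response is late: its identity is discarded and the
    # already established session identity is used instead.
    return return_url, session.user
```

This sketch deliberately ignores what should happen when a late response asserts a different identity than the existing session; a real design would have to decide that case explicitly, which is where the security caution above applies.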

P.S. I've also noted there is a proposal for a re-design of the RemoteUser handler. If this work has already begun, it seems like a perfect opportunity to make sure the new design also considers the issue described here.

aliaksander-samuseu commented 6 years ago

@dmogn @yurem

We would also really need a backport of this fix to the 2.4.4 codebase (a customer wants the fix for their live 2.4.4 setup). The reproduction steps for this package are effectively the same (I used an Asimba setup there instead of Passport-SAML). The logged messages are different there, though similar in nature; the full log is here, and a short excerpt is below:

06:08:57.368 - DEBUG [edu.internet2.middleware.shibboleth.idp.authn.AuthenticationEngine:144] - Returning control to authentication engine
06:08:57.368 - DEBUG [edu.internet2.middleware.shibboleth.idp.util.HttpServletHelper:339] - LoginContext key cookie was not present in request
06:08:57.368 - WARN [edu.internet2.middleware.shibboleth.idp.authn.AuthenticationEngine:147] - No login context available, unable to return to authentication engine
06:08:57.370 - DEBUG [edu.internet2.middleware.shibboleth.idp.util.HttpServletHelper:339] - LoginContext key cookie was not present in request
06:08:57.370 - DEBUG [edu.internet2.middleware.shibboleth.idp.ui.ServiceContactTag:177] - No relying party, nothing to display
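
When comparing logs from both versions, a small helper along these lines may be useful for spotting the failure. It is a sketch: `find_failures` is a hypothetical name, and the two signature strings are simply the messages quoted in the excerpts in this thread, so other versions may log something different.

```python
import re

# Message fragments copied from the 3.1.2 and 2.4.4 excerpts in this issue.
SIGNATURES = (
    "No conversation state found in session for key",
    "No login context available",
)
# Extracts the conversation key, e.g. "e2s1", when the message includes one.
KEY_RE = re.compile(r"for key \((\w+)\)")


def find_failures(lines):
    """Return (line number, signature, key or None) for each matching line."""
    hits = []
    for number, line in enumerate(lines, start=1):
        for signature in SIGNATURES:
            if signature in line:
                match = KEY_RE.search(line)
                key = match.group(1) if match else None
                hits.append((number, signature, key))
    return hits
```

It can be fed an open log file directly, since iterating a file object yields its lines.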

yurem commented 6 years ago

I've changed the issue type from "bug" to "enhancement". I'm not sure this is really a Gluu issue. We have not changed the Shibboleth IDP code; we use their binaries. During packaging we only add filters that intercept RemoteUser endpoint calls so that we can redirect to oxAuth for authentication.

Hence, before we start working on this issue, we need to test an SP + Shibboleth IDP setup without Gluu. If the flow is the same there, we can suggest opening a Shibboleth issue.

@aliaksander-samuseu can you try to reproduce this issue with SP + Shibboleth IDP without the Gluu changes?

yurem commented 5 years ago

@aliaksander-samuseu did you try SP + Shibboleth IDP without Gluu to reproduce this issue?

yurem commented 5 years ago

@aliaksander-samuseu I've changed the milestone to 4.0. Can you prepare an answer to my previous question? Without it we can't decide whether this is a Gluu issue or not

uprightech commented 2 years ago

Hello @aliaksander-samuseu, with the new flow the issue no longer seems to be reproducible. Could you re-run the tests on your end so we can close this once and for all?