OpenIDC / mod_auth_openidc

OpenID Certified™ OpenID Connect Relying Party implementation for Apache HTTP Server 2.x
Apache License 2.0
990 stars 327 forks source link

Redirect loops on OSX, but only sometimes. #114

Closed sysadmin1139 closed 8 years ago

sysadmin1139 commented 8 years ago

We're running the 1.8.7 release of this in a proxy setup in front of apps that don't have any authentication built in. We have a very large internal population of OSX machines hitting these apps, and many (but not all) end up with a strange symptom. To reproduce:

  1. Hit the app with a fresh browser (Chrome and Firefox both do this, though the frequency seems to be less with Firefox).
  2. Log in.
  3. Do some things.
  4. Go away for a while. Maybe a long while.
  5. Come back, and get an endless redirect loop.
  6. Remove all cookies for the site.
  7. Hit the app again, no problems.

With browser-tools, it looks like it is ping-ponging between the OIDCRedirectURI and the Google auth URL, with different tokens each time.

From the apache logs, we get a bit more data.

[Wed Jan 27 01:14:37 2016] [warn] [client 10.136.21.164] oidc_session_load_cookie: cookie value possibly corrupted
[Wed Jan 27 01:14:38 2016] [error] [client 10.136.21.164] oidc_crypto_aes_decrypt: EVP_DecryptFinal_ex failed: error:06065064:digital envelope routines:EVP_DecryptFinal_ex:bad decrypt

There is one set of these for each iteration of the redirect loop. Interestingly, our three Linux users almost never see this problem. We haven't tried from mobile, as these sites aren't designed for viewports that small.

Our server is an Ubuntu 12.04 system, and we compiled the library using the stock -dev packages for that release. It compiles fine, and loads; but there is something in the crypto handling that seems to be falling over. But whatever it is, seems to hit the Macs much more often.

Some server configs:

## Google settings.
OIDCProviderIssuer accounts.google.com
OIDCProviderAuthorizationEndpoint https://accounts.google.com/o/oauth2/auth
OIDCProviderTokenEndpoint https://accounts.google.com/o/oauth2/token
OIDCProviderTokenEndpointAuth client_secret_post
OIDCProviderUserInfoEndpoint https://www.googleapis.com/plus/v1/people/me/openIdConnect
OIDCProviderJwksUri https://www.googleapis.com/oauth2/v2/certs
## Credentials and settings from the google developer console.

OIDCClientID [a client ID]
OIDCClientSecret [a secret]

OIDCCookiePath /
## Use cookie, rather than having to share state across servers via a SAN or memcache.
OIDCSessionType client-cookie
## This must be the same across servers.
OIDCCryptoPassphrase [A passphrase]
## See https://developers.google.com/accounts/docs/OAuth2Login#authenticationuriparameters
OIDCAuthRequestParams hd=example.com

## Set REMOTE_USER to the email address.
OIDCRemoteUserClaim email
OIDCScope "email"
#########################
### /openidc

With the Vhost configured with:

# OpenID Connect via Google - domain-specific redirectURI and cookieDomain
OIDCRedirectURI https://app.example.com/oidcoauth2
# The redirect URI needs to be able to set this cookies on this domain.
OIDCCookieDomain app.example.com
# Set the inactivity timeout to something big.
OIDCSessionInactivityTimeout 32400

  <Location "/">
    AuthType openid-connect
    AuthName "MonApp"
    Require claim hd:example.com
  </Location>
zandbelt commented 8 years ago

Hmm, I don't really have a clue where to start. Sounds like the cookie should have just expired and the authentication should have been re-started and succeed.

The error actually indicates that the cookie is corrupted (the decryption failed) and I fail to see how that relates to expiry or specific browsers at all. Even more so: since the authentication was triggered again, a new cookie should have been set and that should be fresh and valid.

It would be convenient to turn on debug logging and upload the apache error log (isolated to single browser journey) and a browser trace (of that single browser).

zandbelt commented 8 years ago

still interested in the data above but: since you use client-side storage, it may be the case that the cookie size just gets too large and the value gets truncated; that would explain the error (although not the fact that it happens after expiry)

so I'm interested in earlier WARN log entries as well

sysadmin1139 commented 8 years ago

I'm wait for one of my affected users to trip over it. It's a sometimes-thing, so I have to stalk this error. I have a debug dump of a succesfull session to compare against. I just need a bad one. But here are some earlier warn entries:

[Wed Jan 27 01:11:33 2016] [warn] [client 10.144.86.111] oidc_session_load_22: session restored from cache has expired
[Wed Jan 27 01:13:46 2016] [warn] [client 10.144.86.111] oidc_session_load_22: session restored from cache has expired
[Wed Jan 27 01:13:50 2016] [error] [client 10.144.86.111] oidc_crypto_aes_decrypt: EVP_DecryptFinal_ex failed: error:06065064:digital envelope routines:EVP_DecryptFinal_ex:bad decrypt
[Wed Jan 27 01:13:50 2016] [error] [client 10.144.86.111] oidc_base64url_decode_decrypt_string: oidc_crypto_aes_decrypt failed
[Wed Jan 27 01:13:50 2016] [warn] [client 10.144.86.111] oidc_session_load_cookie: cookie value possibly corrupted
[Wed Jan 27 01:13:50 2016] [error] [client 10.144.86.111] oidc_crypto_aes_decrypt: EVP_DecryptFinal_ex failed: error:06065064:digital envelope routines:EVP_DecryptFinal_ex:bad decrypt

Not a lot there. Your idea about us overflowing the internal cookie size is a good one, and I'll look into it. My debug tripwire is on a service that shouldn't make a lot of cookies.

zandbelt commented 8 years ago

I was hoping to see a warning like this:

the length of the cookie value (%lu) is greater than 4093(!) bytes, this may not work with all browsers/server combinations: consider switching to a server side caching!

when setting the cookie after succesful login; if that is not the case we'll need to look for something else

sysadmin1139 commented 8 years ago

I finally got a bum session. Unfortunately, it came in the middle of a flurry, so I was not able to get a pure single-browser stream. This is from around the spot of the EVP errors, and I'll try to further extract relevant lines.

oauth-fail.txt oauth-fail-2.txt

zandbelt commented 8 years ago

it is probably worth looking for the log entry that sets the cookie value that seems to be corrupted and see if it is truncated; I'd also appreciate it if you could send the full log directly to me

zandbelt commented 8 years ago

to find out if cookie size is the problem, I've created a new branch here: https://github.com/pingidentity/mod_auth_openidc/tree/cookie-size-error

that code errors out when the cookie size exceeds a maximum acceptable size, instead of producing a warning in the server log but continue the process; if the cookie size is your problem because it gets truncated, this will at least fix the looping and will produce a user facing error

could you please test this and provide feedback? if this "fixes" the problem you'll probably have to revert to server side session storage or reduce the session size (e.g. by not calling the user info endpoint for extra claims, at least if you can do without these claims)

sysadmin1139 commented 8 years ago

A diagram from the draft-standard:


     +--------+                               +---------------+
     |        |--(A)- Authorization Request ->|   Resource    |
     |        |                               |     Owner     |
     |        |<-(B)-- Authorization Grant ---|               |
     |        |                               +---------------+
     |        |
     |        |                               +---------------+
     |        |--(C)-- Authorization Grant -->| Authorization |
     | Client |                               |     Server    |
     |        |<-(D)----- Access Token -------|               |
     |        |                               +---------------+
     |        |
     |        |                               +---------------+
     |        |--(E)----- Access Token ------>|    Resource   |
     |        |                               |     Server    |
     |        |<-(F)--- Protected Resource ---|               |
     +--------+                               +---------------+

Judging by the browser-traces, the problem seems to be happening in step E. The client is submitting the access-token, and the token is being rejected by the server for failure to decrypt. We then go back to C.

I'll try your branch, thanks for providing it!

zandbelt commented 8 years ago

for the record: that diagram is from the OAuth 2.0 spec, but the protocol you're using is OpenID Connect, which is a bit different (albeit an extension of OAuth 2.0); a more appropriate diagram is here: http://openid.net/specs/openid-connect-core-1_0.html#Overview

furthermore the decryption is about the session cookie that is specific for the module, it is not part of the specification or protocol flow itself

sysadmin1139 commented 8 years ago

It looks like there is a size mismatch. In my capture of a bad session, there is a oidc_util_set_cookie, which sets a cookie at 3500 characters (plus some more for path and flags, not included in the count). A little later the same cookie is read by oidc_util_get_cookie, and that string is 2689 characters long. The strings are completely different.

In my capture of a good session, the set gives a cookie at 3479 characters long, and is later get at 3479.

zandbelt commented 8 years ago

Are you sure they are from a single browser? The cookie should be just truncated, not garbled.

sysadmin1139 commented 8 years ago

I'm running the new build, but haven't had any new errors show up. I have notice one thing, though. The bad cookies all start with this string:

QTZsnXo9wEAp3bNlC97QThq2hs_W0TjS2l7FazW8z0WkzWCNJBtA5mYwSXKWL1w9NFF3qX5rEyNswWf_cb2QM

The cookies being set all start with:

mBUiFhI9RanW1FZ6cJC0tw4frFRPY16Y2sr6tkQGPAD2EmVgoZsz3iebEgZ1A6CXm_f5FhCrSWkkagymlZHZt

The gets that seem to work all start with the same string. Both strings being identical length suggests to me that there is a plain-text that is identical to the first 86 characters, but a different stream-cipher key is being used for the bad cookies.

dwaite commented 8 years ago

Are there any other cookies? The browser limit for all cookies sent is 4093 bytes

sysadmin1139 commented 8 years ago

There are quite a few cookies being set. One of the applications we're putting behind it is Elastic.co's Kibana3, which sets a variety of cookies in addition to the ones set by oauth itself.

k3-cookies

None of them are nearly as big as the mod_auth_openidc_session cookie, but they all add up.

zandbelt commented 8 years ago

thanks DW, seems that's it: even though the size of mod_auth_openidc's session cookie is still well under the acceptable limit, the overall size of all cookies sent to the server (in a single cookie header) is over it and the session cookie gets truncated

seems that you have to revert to a server side session storage mechanism, see https://github.com/pingidentity/mod_auth_openidc/wiki/Caching or reduce the size of the session cookie by not calling the user info endpoint (assuming that you don't need these claims in your app); you can do the latter by commenting out OIDCProviderUserInfoEndpoint

if you can confirm that the looping is gone, I'll probably merge this branch in the master anyway

zandbelt commented 8 years ago

any update on this? FWIW: it would be better to replace all of the configuration primitives in the ## Google settings. section with the following statement:

OIDCProviderMetadataURL https://accounts.google.com/.well-known/openid-configuration
sysadmin1139 commented 8 years ago

I, er, found out last night that the OIDC settings I put into place were being bypassed completely on the busier server. So I don't yet know if the redirect-problem has been resolved with moving to server-side state tracking. Will find out today.

I'll cross-check the OIDCProviderMetadata settings, too.

sysadmin1139 commented 8 years ago

So far, two complaints of redirect-loops. But those are likely attributable to mega-cookies already in place. We've cleared those out. If I get to close-of-business tomorrow without any complaints, I think we're out of the woods.

sysadmin1139 commented 8 years ago

We had a clear fail just a bit ago. On a mac, using Chrome, and dev-tools showed no mega-cookie lurking in the background. I have a debug-trace of it.

single-session-bad.zip

Our config has changed a little since you saw it last.

########################
### openidc
#########################
LoadModule auth_openidc_module /usr/lib/apache2/modules/mod_auth_openidc.so

## Google settings.
OIDCProviderMetadataUrl https://accounts.google.com/.well-known/openid-configuration
## Credentials and settings from the google developer console.

And

## Use session-server. The cookie method was SO BIG it was blowing things up.
## See https://github.com/pingidentity/mod_auth_openidc/wiki/Caching
OIDCSessionType server-cache
OIDCCacheType shm
OIDCCacheShmMax 900

Just the metadataurl and server-side session-tracking changes. The rest remains as-is.

zandbelt commented 8 years ago

the logs show there is still a mega cookie lurking around that eclipses the new cookie that would have been set, e.g.:

[Wed Feb 03 22:28:54 2016] [debug] src/util.c(732): [client 10.144.86.111] oidc_util_set_cookie: adding outgoing header: Set-Cookie: mod_auth_openidc_session=f71e93fe-a870-45f8-a9d0-ded39d7fa3a6;Path=/;Domain=alert.example.com;Secure;HttpOnly
[Wed Feb 03 22:28:54 2016] [debug] src/cache/shm.c(213): [client 10.144.86.111] oidc_cache_shm_set: enter, section="session", key="f71e93fe-a870-45f8-a9d0-ded39d7fa3a6", value size=%lu
[Wed Feb 03 22:28:54 2016] [debug] src/mod_auth_openidc.c(1297): [client 10.144.86.111] oidc_handle_authorization_response: session created and stored, redirecting to original URL: https://alert.example.com:443/
[Wed Feb 03 22:28:54 2016] [debug] src/mod_auth_openidc.c(2355): [client 10.144.86.111] oidc_check_user_id: incoming request: "/?(null)", ap_is_initial_req(r)=1
[Wed Feb 03 22:28:54 2016] [debug] src/util.c(772): [client 10.144.86.111] oidc_util_get_cookie: returning "mod_auth_openidc_session" = "QTZsnXo9wEAp3bNlC97QThq2hs_W0TjS2l7FazW8z0WkzWCNJBtA5mYwSXKWL1w9NFF3qX5rEyNswWf_cb2QMsZxYtEsiMpH6iQYsVAVuVwwYyW6G0GpdEU3KRQlxtPl5Vxj7tqDLTsdUrVLnlQuRPLEamhLdtcvstFQdi59FLSzc2k9o4oVQrpoc3Bpi4BAXSHyMTcFI0f07KVheUyxgCvhUUEoeeYYPz3q5Sah50O_2pssM4D9JmwvmuQz2emOvXrPlQYM3OX5uZoMgbZrfSReSQHmANYCboTmQJ03noaZtE51glPDB7dm0Uzdora2AVxprxHILKvhyaucEQbFxhVZ2edIhxQR6GCoIMWuU7MiE3KdzFIt6vpOuay-X0FDx_GES1SiQHgFqpvkftaqThZWbYxcRaHaD0LoYCyJ77cIzA6weQRqhYpFRzyMdRtt3ws1nEFqxT6BQNjQtPRRsy1aAO7H_0HSOVaEwc6bYJNFmX2IxJyZlQ9gJ0GCdcLqYJht7V5SwOlRUCR54EWEK5DOtSYhC18ly_mxS4X9qFL2AWYASaslGEp0KMTm7zqAfZ9AQ3duQwfOP3HfzUnqzot-zgTixPmr9wiAkBFlJLx2AsCwjicASnrBmc0eqIvrYOPzcawAUC5gE55BLzuaO_OyfWMaqc9E-RGLDULher7O1Hp1rGI3rgSQlQCeBUIGAywgd_oZr_gOfG0awPTR3StsQ3IcMJ50DkJMHSt0AXA41834nkQo6PJRFdplqS09RqcBqSNkba4eH-ozC2hwt9CS-7gOMpcofR3qA0r_nTlZ2fv1VyX9_n9dr5MEuGJeVFXM0MNzU2m-yfGJj_nqdQEv-T_KnGVVSwIOl7YsLc4nFUbqjPxvtU4-wArbXfaq2ctJAqD2HnAwiyBLD1Vt3xLWFUS_TAhDB7B-EhlpYatHVoGOO25IJgruhZKWJGu1ySvP0SRN8xGgaIavaTyWUknhJxXEcYT89caEZBFV4yqJocySbj_nFo_8HnC-_vY61WtA9H5LcSy58fTBXKLVAyosq3iNe_nFKk64l5xvYK78UpiMfH-jdQk2uiBwhotHciE-PKtCGXDDcWD9J4qYEQRRbMcXsGy9iKdQjc5AXcydYWGZziXFUA0QQhRvN9bEW6XATfjegtsoxHxyrSooYgmfkGNt-qKMw-Mr1qEuIH-TFKOTy0XWxHz4Af6DyhlT9QsvgzLOJz2wAR96FyUhMq-6qm0zI9FWVg27NiB9atWN3ZxZxe2eP6yPbLm4GFtHZgATdXXKo4sNXU0lQ_XlArxTLstCusMcUv5L2yMghIDhdU32Qu-pclc8i8czfPLIcx1uc0kIsp00xgNwCz1zdLyxL1zqgx59Oj8Ho7_jmtHBgQxg4qNAsF3qR_Psx79gagndSHa2YnGicujOtySIFNx8DLEuU5fgOWO57AMUFZXQino3ux5IdPqVkYJzKVwJeMhOWfCmkjRKhQY9oOsnXTi7yUQeRx2L9SrkMbR3b9D2iwFDVScRwmBU-kkfNHv21pWPlSaTO_8ZFVDVJgKrABpZCScuV539bwKoqndepOPWwg5009vSEFLLhpcIWt7XLwmcVkk9kKlTbd-NGWedidEVzH_mGKJgI-HJvz735jxyM5pLvuGi56B-jiU7w_J1tGNB935nGKI_KwQ6JbTzbfHeyTp_6N4Jhc86FhSW7RfikkSkcEyOsODFSmGRLRyEHm1oxks3pskKmfFg4omKvdw9sN1DBAqYyFhC_cjKRpfzjxUIxrtK2SYAG3awxZr9Qi1gzCNFbLsjqkw_Ho-7LbzjIkh1dCensn8yHrl4DWC1J1NyQ8qgOUG8unjXBFVh0531j9763TfowBkkPay53zy0poGjS4ynsSxmhRxEe_AdH28zW4UJuxIvrp1L6gQJN34eXoyJXu1YvTOIa1NJVXG03RQc5KcKBq1VY_gzljycnIIziZmKUhA2Gg0LbZ8101e309RrE2IL9ox1yzXqLWxSD2DVhKvCycRC4F0Kh_8bxH8NeLFhvdMzUMluto_Nj1nGmzbM7RtTXDbAGdeyhZGhMJKIZXTVTlcXXGcgusF60jNNhJrcvE27Uf-BsOmBKQNYRVx90S3Y6nqZAIryOkCSbEWCTxLr3o8WvWuvn9hzRqgzjBmBp6prWV8cGMlTqcFhslD08uQOlMuhJrYlzT5twjCirqp5_3HPF5P8ZFtljUtXfjXs-qtnKJTeDfIxBU-Xv8ss9g1ilVxX9djNsuIM059GF-fROGfyHLcnmjLcBqgOVUHF6-S3PNi8B_QdSgzp6xf5_87kNwoXmkpOrEhfI5dknRCutRQcDVdFdHOq3ZoH_1sgvwH98Om2y-RQkBx77f5wAaqvs69ZQ665oEgNUsaCUY_UiM_Ma8Td3TELh6U3HjjrrGNUmuYYU92tayrHY-i5kjGfXkwF6vKwKC2c1-kzJ34JUNXmBG4ARDQwdhsapziWn9RWF_Csj2ug3slS2IW0H_fILe-eNwPF0JDvJ_gGhlq61fQ-Tm_P6C4naAwml9OtbwKzbXOcp-_88lEIEecC5n1HeUz4xe6QtsZeoPrKWINYWLhppsE5t9bPyg3rIgqnRHqciQojxSMaRquA_GQYenH0YHrXvI1FQOqLU99iTpYE4Tkh7ZLqD80rT4KJX3gWH8YuSFBcWTdceSUvUd-Bg8FLZgBHPbJ5kINylZ65GTE11bSMfNJkgBB40Zz6vjnf91BdVEjQT3bJ1bQIA8XTdsizyu1aRdN5Fg"

cleaning the cookies or restarting the browser will solve this

sysadmin1139 commented 8 years ago

Here is the funny thing, since I noticed that too. I asked this affected user to check their cookies in Chrome's development console, and they did not see an overly large cookie.

What I will try, because I can do that, is set up this server on a new URL and have them try it. If it works, it's proof that there is some lingering state that needs to be purged with more fire than usual.

zandbelt commented 8 years ago

since it is a session cookie, restarting the browser would fix it

sysadmin1139 commented 8 years ago

Browser restarts still gave us re-direct loops in some things, so I went ahead and made a new alias for it to try and kick those cookies out of scope. On that brand new FQDN, that no one had ever hit before, we got redirect-loops on Chrome+OSX.

zandbelt commented 8 years ago

can you paste a similar log for that new fqdn to exclude the cookie size issue?

sysadmin1139 commented 8 years ago

I captured a bad session.

bad_session.zip

The user claims he got a redirect-loop, but I'm not certain the logs show that. They look much different than it did with the cookie issue. Besides, these show up now:

 [Tue Feb 09 01:27:09 2016] [debug] src/authz.c(290): [client 10.165.20.178] oidc_authz_worker: require claim 'hd:example.com' matched, referer: https://kib.example.com/

And show up for a lot of the assets that are loaded on that page. To me, this reads like it's working.

zandbelt commented 8 years ago

The log shows successful completion of the module indeed for many Javascript resources that get loaded. It may be that the application is actually putting the browser in a loop, that is something you could check (I see / gets loaded twice in that log).

Anyway, I'm closing this ticket since the original (decryption) problem was due to cookie size limits that were exceeded when using client side storage; I've added a warning about that in the config file https://github.com/pingidentity/mod_auth_openidc/commit/f1a66edddc21ac40dbc181c5c3ec6453bbd23eaa

Feel free to open a different issue if you believe you've encountered a different problem, or re-open this ticket if that problem somehow would still be related to this one.

zandbelt commented 8 years ago

more information here: https://github.com/pingidentity/mod_auth_openidc/wiki/Cookies

sysadmin1139 commented 8 years ago

Beautiful. I'm glad our pain is helping.