goharbor / harbor

An open source trusted cloud native registry project that stores, signs, and scans content.
https://goharbor.io
Apache License 2.0

State mismatch when login via oidc #12982

Open Leo-ljr opened 4 years ago

Leo-ljr commented 4 years ago

Hi,

I get an error when I try to log in to Harbor with an OIDC provider (Google).

It seems that there is no state value in the session, and I don't know why. Logs:

[ERROR] [/core/controllers/oidc.go:76]: State mismatch, in session: %!s(<nil>), in url: 2pG0sDHtGx32UgFPRlDCag8CVdR17Qkf

I was able to connect for a few hours; now, for no apparent reason, I can't anymore. There was no restart or change of configuration.

Versions:

reasonerjt commented 4 years ago

@Leo-ljr Could you let me know how you installed Harbor? Are you using the internal Redis?

The state was written into the session, but based on the issues reported I do think this approach has some problems.
I have never reproduced this issue in my dev environment.

By "I can't anymore", do you mean you can't log in via OIDC even after you refresh the login page?

Leo-ljr commented 4 years ago

@reasonerjt

I use Helm for installation and yes it's the internal Redis.

I can reload or use incognito mode; it's the same error. I can only log in once per 24h (approximately).

{"errors":[{"code":"BAD_REQUEST","message":"State mismatch"}]}

reasonerjt commented 4 years ago

@Leo-ljr I'll do more testing later this week. However, if you do NOT use incognito mode, does it work?

Leo-ljr commented 4 years ago

@reasonerjt Nope, doesn't work

reasonerjt commented 4 years ago

Have you had a chance to try the rc1 offline installer? And how do you expose Harbor via the helm chart? Is there an extra load balancer? I'm thinking some headers may be filtered as the request is redirected back and forth, so the session ID is lost.

If you switch the log level to debug by updating the env var of harbor-core, can you see the message about the state being written to the session? @Leo-ljr

lz006 commented 4 years ago

same here, it stopped working without any change after a while.

our setup:

Leo-ljr commented 4 years ago

@reasonerjt Using the offline installer would be a lot of work. I use the ClusterIP service with HAProxy in front. But, for example, GitLab's OIDC works perfectly in the same cluster.

In the debug mode, this is what I see:

2020-09-07T15:54:41Z [DEBUG] [/server/middleware/log/log.go:30]: attach request id 1bd6f14d-0c3a-451c-8ea8-64caf22f50c1 to the logger for the request GET /c/oidc/callback
2020-09-07T15:54:41Z [DEBUG] [/server/middleware/artifactinfo/artifact_info.go:52]: In artifact info middleware, url: /c/oidc/callback?state=2DzzDsrSPmcghah3gaucei1dzrghy&code=4/3wFzAN8nIpVjfjXkn1xShHOUnOGMRhM_ziebahxae6ku9shahx0IiFc7uqQtdqukNluaZTevIfS4HV_5eyahMbs&scope=email%20profile%20openid%20https://www.googleapis.com/auth/userinfo.profile%20https://www.googleapis.com/auth/userinfo.email&authuser=0&hj=onche.com&prompt=consent
2020-09-07T15:54:41Z [DEBUG] [/server/middleware/security/unauthorized.go:29][requestID="1bd6f14d-aef9-451c-5eed-6reff22f50c1"]: an unauthorized security context generated for request GET /c/oidc/callback
2020-09-07T15:54:41Z [ERROR] [/core/controllers/oidc.go:76]: State mismatch, in session: %!s(<nil>), in url: 2DzzDswSPmcjq3gOGa4gC0rdQ2YIXYbJ
2020-09-07T15:54:41Z [DEBUG] [/lib/http/error.go:59]: {"errors":[{"code":"BAD_REQUEST","message":"State mismatch"}]}
2020/09/07 15:54:41.214 [D] [middleware.go:52]  |   10.244.112.0| 400 |  10.787665ms|   match| GET      /c/oidc/callback   r:/c/oidc/callback

reasonerjt commented 4 years ago

@Leo-ljr I don't think the issue with state has anything to do with which OIDC provider you use. When you click the button and are redirected to the external login page, there should be a log message like

State dumped to session: ...

Could you compare this output with different OIDC providers?

Leo-ljr commented 4 years ago

@reasonerjt Sorry, I do have this message earlier in the log:

2020-09-08T07:40:33Z [DEBUG] [/core/controllers/oidc.go:67]: State dumped to session: sESnqz7IIZnQ5YasOvPB5S5rxTnMVFDu

lz006 commented 4 years ago

It seems that the method "oc.SetSession(stateKey, state)" or "oc.GetSession(stateKey)" does not "hold" the state correctly. As we can see, the states are the same, but the previously stored state cannot be retrieved for comparison. Maybe a bug in "github.com/astaxie/beego/session"?

[DEBUG] [/core/controllers/oidc.go:67]: State dumped to session: PeghznlegvWZ7N7OkkwR9tBwBjp0XJDf
[ERROR] [/core/controllers/oidc.go:76]: State mismatch, in session: %!s(<nil>), in url: PeghznlegvWZ7N7OkkwR9tBwBjp0XJDf
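
One way to check this across a longer log is to pair each "State dumped to session" line with the "State mismatch" lines and compare the values. The sketch below is a hypothetical helper, not part of Harbor: the function name and regular expressions are assumptions based only on the log format quoted in this thread.

```python
import re

# Patterns assume the harbor-core log format quoted in this thread.
DUMPED = re.compile(r"State dumped to session: (\S+)")
MISMATCH = re.compile(r"State mismatch, in session: (.*?), in url: (\S*)")


def summarize_state_logs(lines):
    """Return (dumped_states, mismatches) found in harbor-core log lines.

    mismatches is a list of (session_value, url_value) tuples.
    """
    dumped, mismatches = [], []
    for line in lines:
        m = DUMPED.search(line)
        if m:
            dumped.append(m.group(1))
            continue
        m = MISMATCH.search(line)
        if m:
            mismatches.append((m.group(1), m.group(2)))
    return dumped, mismatches


logs = [
    "[DEBUG] [/core/controllers/oidc.go:67]: State dumped to session: PeghznlegvWZ7N7OkkwR9tBwBjp0XJDf",
    "[ERROR] [/core/controllers/oidc.go:76]: State mismatch, in session: %!s(<nil>), in url: PeghznlegvWZ7N7OkkwR9tBwBjp0XJDf",
]
dumped, mismatches = summarize_state_logs(logs)
for session_value, url_value in mismatches:
    # True means Harbor issued this state itself, yet the session lookup came back nil.
    print(url_value in dumped, session_value)
```

If the first value printed is True, the state in the URL is one Harbor generated, so the failure is on the session lookup side rather than in the provider's redirect.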

reasonerjt commented 4 years ago

@lz006 @Leo-ljr Which browser are you using? Is it Chrome?

Could you use the developer tools to check the requests to /c/oidc/login and /c/oidc/callback? Has the sid cookie changed?

Are you using the same hostname to access the login page and in the redirect URL of the OIDC provider?

lz006 commented 4 years ago

I've tested it on both firefox and chrome (win10 / debian10 desktop)

Looks like sid isn't the root cause. sids:
/c/oidc/login 27822614fe34cb46cc356aa8373333b9
/c/oidc/callback 27822614fe34cb46cc356aa8373333b9

reasonerjt commented 4 years ago

@lz006 Are you saying that you can never log in via OIDC, i.e. after being redirected to the /c/oidc/callback URI you always see the state mismatch?

To be honest, since I cannot reproduce the issue, it seems a mystery to me. Under the hood, the state is written to the session before the redirect, and the user is then redirected from the OIDC endpoint back to Harbor after successful authentication. If the sid has not changed, the state written to the session should be loaded.
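
As a rough illustration of the round trip described here, the sketch below simulates the two endpoints with an in-memory dictionary standing in for the Redis-backed session store. It is not Harbor's code: the function names and the store are assumptions made for illustration, and only the "oidc_state" session key is taken from the Redis dump quoted later in this thread.

```python
import secrets

SESSIONS = {}  # hypothetical stand-in for the session store: sid -> session dict


def login(sid=None):
    """Simulate /c/oidc/login: create a session if needed and store the state."""
    if sid is None or sid not in SESSIONS:
        sid = secrets.token_hex(16)
        SESSIONS[sid] = {}
    state = secrets.token_urlsafe(24)
    SESSIONS[sid]["oidc_state"] = state
    return sid, state  # sid travels in a cookie, state in the redirect URL


def callback(sid, state_in_url):
    """Simulate /c/oidc/callback: compare the URL state with the session state."""
    state_in_session = SESSIONS.get(sid, {}).get("oidc_state")
    if state_in_session != state_in_url:
        return 400, f"State mismatch, in session: {state_in_session}, in url: {state_in_url}"
    return 200, "ok"


sid, state = login()
print(callback(sid, state))   # cookie came back with the same sid: states match
print(callback(None, state))  # cookie missing or session gone: session value is None
```

In this model a nil session value can only come from a missing or different sid on the callback request, or from a session that can no longer be read from the store, which is why both the cookie and the store are worth checking.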

lz006 commented 4 years ago

@reasonerjt unfortunately yes. But like I said, immediately after rolling out Harbor via Helm it was working. This behavior occurred after a while (several hours / days... cannot be more precise). And it is 100% the same environment as when we were running Harbor 1.x. Other applications using our OIDC provider (Keycloak) are still behaving as usual. This problem came up after upgrading Harbor.

Leo-ljr commented 4 years ago

@reasonerjt For me, I only see the sid cookie in /c/oidc/login. The /c/oidc/callback request returns a 400 error, like in my previous logs.

But in my tests, I was able to connect once. After disconnecting, I can't connect again. While I was logged in, I saw the sid cookie in /c/oidc/callback.

To be clear, I did not change anything between the two connections.

Leo-ljr commented 4 years ago

@reasonerjt I know how I was able to get connected.

I don't know why, but this is not normal

lz006 commented 4 years ago

@reasonerjt

@Leo-ljr was right.

I can reproduce this strange behavior as well! (It's important to open 2 tabs with the Harbor login page without having a valid session beforehand.)

reasonerjt commented 4 years ago

@Leo-ljr

For me, I only see the sid cookie in /c/oidc/login.
The /c/oidc/callback request returns a 400 error, like in my previous logs.

Do you mean the request to /c/oidc/callback does not carry the cookie? It seems to me this may be related to a setting in haproxy that filters the cookie.

@lz006 how did you install Harbor, and have you set up a proxy in front?

lz006 commented 4 years ago

@reasonerjt installed via helm, yes there is a haproxy sitting in front of our k8s cluster. But as mentioned same env worked when using 1.x harbor.

reasonerjt commented 4 years ago

@lz006 Could you please see my conversation with @Leo-ljr and try to debug if the cookie is dropped by haproxy?

Leo-ljr commented 4 years ago

@reasonerjt After more tests, I didn't see the sid cookie in /c/oidc/login and /c/oidc/callback. But when I log in like this:

Open 2 tabs with the Harbor login page
Log in in one tab with the admin user
Then in the other tab I can now connect via OIDC without error

All is well: I see the sid in /c/oidc/login and /c/oidc/callback.

I don't think the problem is HAProxy, otherwise I wouldn't see the cookie.

reasonerjt commented 4 years ago

@Leo-ljr You only see the cookie when the Set-Cookie header is passed to the browser.

I think you can check why the Set-Cookie header is not passed to the browser. For example, was it passed to haproxy? Did haproxy drop this header from the response for some reason?

I may be wrong in thinking the root cause is haproxy, but I can't reproduce your problem and the only difference in your env is haproxy. So I suggest we start by investigating at the haproxy level.
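
For that kind of check, one option is to copy the raw Set-Cookie header values from the response at each hop (for example from the browser developer tools, or from a request sent directly to the Harbor pod) and compare them. A minimal sketch, using Python's standard http.cookies module; the helper name and the sample header value are made up for illustration:

```python
from http.cookies import SimpleCookie


def find_cookie(set_cookie_headers, name="sid"):
    """Return attributes of cookie `name` from raw Set-Cookie values, or None."""
    for raw in set_cookie_headers:
        jar = SimpleCookie()
        jar.load(raw)
        if name in jar:
            morsel = jar[name]
            return {
                "value": morsel.value,
                "path": morsel["path"],
                "secure": bool(morsel["secure"]),
                "httponly": bool(morsel["httponly"]),
            }
    return None


# Illustrative header value only; not captured from a real Harbor response.
behind_proxy = ["sid=27822614fe34cb46cc356aa8373333b9; Path=/; HttpOnly"]
print(find_cookie(behind_proxy))
print(find_cookie([]))  # None would mean the header never reached this hop
```

If the cookie is present when talking to the pod directly but missing behind the proxy, the proxy is the place to look; if it is missing in both cases, the proxy is unlikely to be the cause.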

dioguerra commented 4 years ago

In the debug mode, this is what I see:

I have the exact same problem. I'm using the expose.type = ingress.

I know this is a 2.1.0 change because if I use chart version 1.4.2 this does not happen.

dioguerra commented 4 years ago

@lz006 Are you saying that you can never log in via OIDC, i.e. after being redirected to the /c/oidc/callback URI you always see the state mismatch?

To be honest, since I cannot reproduce the issue, it seems a mystery to me. Under the hood, the state is written to the session before the redirect, and the user is then redirected from the OIDC endpoint back to Harbor after successful authentication. If the sid has not changed, the state written to the session should be loaded.

I never managed to log in...

reasonerjt commented 4 years ago

@MrD2 please see the comment above and do the debugging.

dioguerra commented 4 years ago

I traced my problem back to "I forgot to update the container tags when I bumped version"

So, silly mistake. Thanks anyway

Leo-ljr commented 4 years ago

@reasonerjt I get the same error with a kubectl port-forward to the harbor-nginx pod, so HAProxy is definitely not the problem.

mrlioncub commented 4 years ago

Same problem in docker-compose.

With HTTPS Reverse Proxy (nginx):

server {
  listen 443 ssl;
  server_name reg.domain.com;

  client_max_body_size 0;
  chunked_transfer_encoding on;

  location / {
    proxy_set_header Host                $http_host;
    proxy_set_header X-Real-IP           $remote_addr;
    proxy_set_header X-Forwarded-Ssl     on;
    proxy_set_header X-Forwarded-For     $proxy_add_x_forwarded_for;
    proxy_set_header X-Forwarded-Proto   $scheme;

    proxy_pass http://harbor-nginx:8080;

    proxy_buffering off;
    proxy_request_buffering off;
    proxy_http_version 1.1;
  }

  ssl_protocols SSLv3 TLSv1.2 TLSv1.3;
  ssl_certificate   /ssl/site.crt;
  ssl_certificate_key   /ssl/site.key;
  ssl_dhparam       /ssl/dhparam.pem;
}

mdekoster commented 4 years ago

We have experienced the exact same behaviour with version 2.1.0 of Harbor with Keycloak as the OIDC provider. Our users use various browsers (Chrome, Firefox, Edge, ...). The ingress is via HAProxy and an NGINX ingress controller. We use the Redis that is installed with the Harbor Helm chart.

lz006 commented 4 years ago

For me it seems this must be somehow related to nginx. Because our problems disappear when we bypass nginx ingress controller using "nodePort" option of kubernetes. It's not how it's meant to be, at least we can use our keycloak based sso again.

Our working setup: reverse proxy (haproxy) -> k8s nodePort -> harbor pod

mdekoster commented 4 years ago

For me it seems this must be somehow related to nginx. Because our problems disappear when we bypass nginx ingress controller using "nodePort" option of kubernetes. It's not how it's meant to be, at least we can use our keycloak based sso again.

Our working setup: reverse proxy (haproxy) -> k8s nodePort -> harbor pod

Thanks for the suggestion, but we did not have this behaviour with the previous version (2.0.1) of Harbor.

timricese commented 4 years ago

Also have this problem after upgrading to 2.1.0 from 2.0.1, using Dex.

Any potential workaround?

mdekoster commented 4 years ago

When using the goharbor helm chart 1.5.0, this issue is resolved.

ravens commented 4 years ago

We also had this very same problem on a cluster following an upgrade. It turned out that the generated Redis config was incorrect (the _REDIS_URL_CORE env variable was missing).

stale[bot] commented 3 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

ianseyer commented 4 hours ago

I am seeing this exact behavior.

My Redis Cluster is behaving normally, other cache/quota operations work and I see state tokens being written at the sid cookie id.

I am using redis+sentinel. I have also tried replacing redis with dragonflydb and got the same behavior.

I have wiped the database and redis and restarted fresh. My global secret key has not changed.

With debug logging enabled, I see:

[DEBUG] [/server/middleware/artifactinfo/artifact_info.go:55]: In artifact info middleware, url: /c/oidc/callback?error=invalid_client&error_description=Invalid%20client%20secret
[ERROR] [/core/controllers/oidc.go:91]: State mismatch, in session: %!s(<nil>), in url:
[ERROR] [/core/controllers/oidc.go:118]: Failed to exchange token, error: oauth2: cannot fetch token: 400 Bad Request

I am not sure why this would be the case, as: https://github.com/goharbor/harbor/blob/66c98c81f1196b52e7e143a61c40779e8cff6505/src/core/controllers/oidc.go#L89-L95

implies that I should, at a minimum, be seeing the state queryParam from the original URL of:

https://[...]/c/oidc/callback?code=TpY[...]SN&state=dUpI000[...]S8mJVra5o

rather than %!s(<nil>), in url: \n

If I take the sid cookie and run redis-cli GET <sid id>, I see the state queryParam value output:

redis@redis-0:/data$ redis-cli GET 7cde7a[...]6ba885
"\x10\xfe\x01\x11\x04\x01\x02\xfe\x01\x12\x00\x01\x10\x01\x10\x00\x00E\xfe\x01\x12\x00\x01\x06string\x0c\x0c\x00\noidc_state\x06string\x0c\"\x00 dUpI000[...]mS8mJVra5o"

What I'm wondering is: what is that encoding?
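
The layout of that value (type names such as "string" followed by length-prefixed values) looks like Go's encoding/gob serialization of a map; that reading is an assumption here, not something confirmed in this thread. Without decoding it properly, the readable parts can at least be pulled out. A small sketch, with a hypothetical helper name:

```python
import re

# Value copied from the redis-cli output above; "[...]" is the poster's own elision.
raw = (
    b"\x10\xfe\x01\x11\x04\x01\x02\xfe\x01\x12\x00\x01\x10\x01\x10\x00\x00E"
    b"\xfe\x01\x12\x00\x01\x06string\x0c\x0c\x00\noidc_state"
    b"\x06string\x0c\"\x00 dUpI000[...]mS8mJVra5o"
)

# Runs of at least four printable ASCII bytes.
PRINTABLE = re.compile(rb"[\x20-\x7e]{4,}")


def printable_runs(data):
    """Return the printable ASCII runs found in a binary blob."""
    return [m.decode("ascii") for m in PRINTABLE.findall(data)]


print(printable_runs(raw))
# ['string', 'oidc_state', 'string', ' dUpI000[...]mS8mJVra5o']
```

The byte just before oidc_state is \n (10), which matches the key's length, and the last run begins with a space (byte 32) that may likewise be a length prefix rather than part of the state. So the session does appear to hold an oidc_state entry with the expected value.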

Additionally, I am seeing a successful dump to redis session:

State dumped to session: dUpI000yJ[...]S8mJVra5o

I have scaled all replicas to 1 to make sure there was no issue with sticky sessions.

Please advise, and please re-open @wy65701436

At a minimum, this logging is inaccurate as it indicates that the oidc_client_secret is invalid. This leads me to believe it is related to the secret being unable to decode that value; however, it still does not explain why the state mismatch debug log would show nil.
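
On the empty "in url:" value: the callback URL in the debug line above carries error and error_description but no state parameter. If that is the request that produced the error line, an empty URL state would be expected, although it would not explain the nil session value. A quick way to inspect a logged callback URL, as a sketch using only the standard library (the helper name is hypothetical):

```python
from urllib.parse import urlsplit, parse_qs


def callback_params(url):
    """Return the query parameters of a callback URL as a flat dict."""
    query = parse_qs(urlsplit(url).query)
    return {key: values[0] for key, values in query.items()}


logged = "/c/oidc/callback?error=invalid_client&error_description=Invalid%20client%20secret"
params = callback_params(logged)
print(params.get("state"))  # None: this request has no state parameter
print(params["error"], "-", params["error_description"])
```

Comparing this against the callback URL seen in the browser would show whether the request Harbor logs is the same one that carried the code and state parameters.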

ianseyer commented 4 hours ago

https://github.com/goharbor/harbor-helm/issues/325#issuecomment-523711775

Users on this issue seem to indicate that having a custom secretKey at all can break OIDC.

I do set a custom secret key, to "". This is a requirement for us, as we cannot have the key be regenerated and risk nuking our installation.

I had encountered this exact error before, where I had changed the secret without realizing its importance, and had to wipe the DB and redis to start over (which is not acceptable for a production deployment). However this time, the secret value has not changed.

Are there other secrets that are not allowed to change? For example, we do not prevent core-secret, admin-password, registry-secret, or jobservice-secret from changing (with a restart).