dCache / dcache

dCache - a system for storing and retrieving huge amounts of data, distributed among a large number of heterogenous server nodes, under a single virtual filesystem tree with a variety of standard access methods
https://dcache.org

`oidc` plugin - authentication against the OP #7553

Closed calestyo closed 3 weeks ago

calestyo commented 2 months ago

Hey.

Took me quite some time today, to find out why oidc authentication failed.

In a line like:

gplazma.oidc.provider!atlas = https://atlas-auth.web.cern.ch/ …

there needs to be some way for dCache to actually authenticate the OP, i.e. to verify its TLS certificate.

What people in dCache would probably expect is that /etc/grid-security/ CAs are used for that, which would IMO be the natural and best choice.
However, undocumented as it is (or at least I couldn’t find anything about it), it uses the system JKS instead.
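
To see which CAs the JVM would accept when talking to the OP, one can list the trust anchors of the JVM's default trust store. This is only a minimal sketch, assuming the plugin ends up using the JVM's default `TrustManagerFactory` configuration; the class name `DefaultTrustAnchors` is made up for illustration and is not part of dCache.

```java
import java.security.KeyStore;
import java.security.cert.X509Certificate;
import java.util.ArrayList;
import java.util.List;
import javax.net.ssl.TrustManager;
import javax.net.ssl.TrustManagerFactory;
import javax.net.ssl.X509TrustManager;

// Hypothetical helper, not dCache code.
public class DefaultTrustAnchors {

    // Returns the subject names of all CA certificates in the JVM default trust store.
    public static List<String> subjects() throws Exception {
        TrustManagerFactory tmf =
                TrustManagerFactory.getInstance(TrustManagerFactory.getDefaultAlgorithm());
        tmf.init((KeyStore) null); // null selects the JVM's default trust store

        List<String> names = new ArrayList<>();
        for (TrustManager tm : tmf.getTrustManagers()) {
            if (tm instanceof X509TrustManager) {
                for (X509Certificate ca : ((X509TrustManager) tm).getAcceptedIssuers()) {
                    names.add(ca.getSubjectX500Principal().getName());
                }
            }
        }
        return names;
    }

    public static void main(String[] args) throws Exception {
        List<String> names = subjects();
        System.out.println(names.size() + " trusted CA certificates");
        for (String name : names) {
            System.out.println(name);
        }
    }
}
```

If the CA that signed the OP's certificate is not in that list, a PKIX path building failure like the one in the log further down is what I'd expect.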

And the error message is pretty cryptic:

Unknown kid "rsa2"

And even with DEBUG it doesn’t get much better:

2024-04-18T19:46:48.418110+02:00 lcg-lrz-dc50 dcache@gplazma1[742038]: 2024-04-18 19:46:48+02:00 (gPlazma) [webdav.tls_lcg-lrz-dc35 Login AUTH oidc] Failed to fetch discovery document for atlas: javax.net.ssl.SSLHandshakeException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
2024-04-18T19:46:48.435721+02:00 lcg-lrz-dc50 dcache@gplazma1[742038]: 2024-04-18 19:46:48+02:00 (gPlazma) [webdav.tls_lcg-lrz-dc35 Login] Login attempt failed; detailed explanation follows:
2024-04-18T19:46:48.435958+02:00 lcg-lrz-dc50 dcache@gplazma1[742038]: LOGIN FAIL
2024-04-18T19:46:48.436091+02:00 lcg-lrz-dc50 dcache@gplazma1[742038]:  |    in: Origin[2003:cd:df19:3800:8f7e:27cc:ac69:49fc]
2024-04-18T19:46:48.436262+02:00 lcg-lrz-dc50 dcache@gplazma1[742038]:  |        JWT bearer token:
2024-04-18T19:46:48.436449+02:00 lcg-lrz-dc50 dcache@gplazma1[742038]:  |          |
2024-04-18T19:46:48.436617+02:00 lcg-lrz-dc50 dcache@gplazma1[742038]:  |          +- iss: "https://atlas-auth.web.cern.ch/"
2024-04-18T19:46:48.436760+02:00 lcg-lrz-dc50 dcache@gplazma1[742038]:  |          +- jti: "19e67d0c-f4ce-45cf-81e5-6d8460ff710d"
2024-04-18T19:46:48.436926+02:00 lcg-lrz-dc50 dcache@gplazma1[742038]:  |          +- sub: "b41bd224-951e-47b9-8f86-c234e491d8b4"
2024-04-18T19:46:48.437036+02:00 lcg-lrz-dc50 dcache@gplazma1[742038]:  |          +- scope: "storage.read:/atlasdatadisk/SAM/ storage.create:/atlasdatadisk/SAM/ storage.modify:/atlasdatadisk/SAM/"
2024-04-18T19:46:48.437208+02:00 lcg-lrz-dc50 dcache@gplazma1[742038]:  |          +- iat: 1713461104 --> 2024-04-18 19:25:04.000 (21 min ago)
2024-04-18T19:46:48.437377+02:00 lcg-lrz-dc50 dcache@gplazma1[742038]:  |          +- nbf: 1713461104 --> 2024-04-18 19:25:04.000 (21 min ago)
2024-04-18T19:46:48.437537+02:00 lcg-lrz-dc50 dcache@gplazma1[742038]:  |          +- exp: 1713464704 --> 2024-04-18 20:25:04.000 (38 min in the future)
2024-04-18T19:46:48.437696+02:00 lcg-lrz-dc50 dcache@gplazma1[742038]:  |          +- aud: "lcg-lrz-http.grid.lrz.de"
2024-04-18T19:46:48.437857+02:00 lcg-lrz-dc50 dcache@gplazma1[742038]:  |          +- client_id: "5710f419-1bd2-4b1b-afd2-954f7b1f0005"
2024-04-18T19:46:48.438000+02:00 lcg-lrz-dc50 dcache@gplazma1[742038]:  |
2024-04-18T19:46:48.438162+02:00 lcg-lrz-dc50 dcache@gplazma1[742038]:  |
2024-04-18T19:46:48.438290+02:00 lcg-lrz-dc50 dcache@gplazma1[742038]:  +--AUTH OK
2024-04-18T19:46:48.438451+02:00 lcg-lrz-dc50 dcache@gplazma1[742038]:  |   |
2024-04-18T19:46:48.438616+02:00 lcg-lrz-dc50 dcache@gplazma1[742038]:  |   +--x509 OPTIONAL:FAIL (no X.509 certificate chain) => OK
2024-04-18T19:46:48.438736+02:00 lcg-lrz-dc50 dcache@gplazma1[742038]:  |   |
2024-04-18T19:46:48.438876+02:00 lcg-lrz-dc50 dcache@gplazma1[742038]:  |   +--voms OPTIONAL:FAIL (no X509 certificate chain) => OK
2024-04-18T19:46:48.438997+02:00 lcg-lrz-dc50 dcache@gplazma1[742038]:  |   |
2024-04-18T19:46:48.439137+02:00 lcg-lrz-dc50 dcache@gplazma1[742038]:  |   +--oidc OPTIONAL:FAIL (Unknown kid "rsa2") => OK
2024-04-18T19:46:48.439287+02:00 lcg-lrz-dc50 dcache@gplazma1[742038]:  |
2024-04-18T19:46:48.439484+02:00 lcg-lrz-dc50 dcache@gplazma1[742038]:  +--MAP FAIL
2024-04-18T19:46:48.439606+02:00 lcg-lrz-dc50 dcache@gplazma1[742038]:  |   |
2024-04-18T19:46:48.439749+02:00 lcg-lrz-dc50 dcache@gplazma1[742038]:  |   +--vorolemap OPTIONAL:FAIL (no record) => OK
2024-04-18T19:46:48.439880+02:00 lcg-lrz-dc50 dcache@gplazma1[742038]:  |   |
2024-04-18T19:46:48.440022+02:00 lcg-lrz-dc50 dcache@gplazma1[742038]:  |   +--gridmap OPTIONAL:FAIL (no mapping) => OK
2024-04-18T19:46:48.440173+02:00 lcg-lrz-dc50 dcache@gplazma1[742038]:  |   |
2024-04-18T19:46:48.440310+02:00 lcg-lrz-dc50 dcache@gplazma1[742038]:  |   +--authzdb REQUISITE:FAIL (no mappable principal) => FAIL (ends the phase)
2024-04-18T19:46:48.440449+02:00 lcg-lrz-dc50 dcache@gplazma1[742038]:  |
2024-04-18T19:46:48.440585+02:00 lcg-lrz-dc50 dcache@gplazma1[742038]:  +--(ACCOUNT) skipped
2024-04-18T19:46:48.440720+02:00 lcg-lrz-dc50 dcache@gplazma1[742038]:  |
2024-04-18T19:46:48.440881+02:00 lcg-lrz-dc50 dcache@gplazma1[742038]:  +--(SESSION) skipped
2024-04-18T19:46:48.441013+02:00 lcg-lrz-dc50 dcache@gplazma1[742038]:  |
2024-04-18T19:46:48.441152+02:00 lcg-lrz-dc50 dcache@gplazma1[742038]:  +--(VALIDATION) skipped
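
The `kid` in that message is the key ID from the token's header. Presumably it is reported as unknown because the OP's key set was never fetched after the TLS failure, so no key with that ID was available. A small sketch of how to read the `kid` from a compact JWT, using only the JDK; the class `JwtKid` and the regex-based JSON handling are simplifications for illustration, not what the oidc plugin does.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not dCache code.
public class JwtKid {

    // Simplified: matches a string-valued "kid" member; a real parser should use a JSON library.
    private static final Pattern KID = Pattern.compile("\"kid\"\\s*:\\s*\"([^\"]*)\"");

    // Decodes the first (header) segment of a compact JWT into its JSON text.
    public static String headerJson(String jwt) {
        int dot = jwt.indexOf('.');
        if (dot < 0) {
            throw new IllegalArgumentException("not a compact JWT");
        }
        byte[] raw = Base64.getUrlDecoder().decode(jwt.substring(0, dot));
        return new String(raw, StandardCharsets.UTF_8);
    }

    // Returns the key ID from the header, or null if the header has none.
    public static String kid(String jwt) {
        Matcher m = KID.matcher(headerJson(jwt));
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Demo token with a made-up header; payload and signature are dummies.
        String header = "{\"kid\":\"rsa2\",\"alg\":\"RS256\"}";
        String jwt = Base64.getUrlEncoder().withoutPadding()
                .encodeToString(header.getBytes(StandardCharsets.UTF_8)) + ".e30.sig";
        System.out.println(kid(jwt)); // prints "rsa2"
    }
}
```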

For security reasons I generally have no certs enabled on our systems.
Knowing that, I of course already had a vague feeling and did actually enable the certs used by ATLAS’ OP, but then I got bitten by https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1069251, so until I realised that Debian also gets this wrong, I thought it couldn’t be the certs.

So, in order from the least favourable to the best, I'd say the following would be good to have:

Cheers, Chris.

paulmillar commented 2 months ago

The documentation has been updated on master, with a pull request targeting 9.2: https://github.com/dCache/dcache/pull/7554

There is a patch that updates the oidc plugin so that it provides a more "reasonable" error message if there was a problem fetching or parsing either the discovery document or the JWKS document: https://rb.dcache.org/r/14248/

Having a configurable trust store is something that makes sense once OPs start using IGTF certificates. So far, OPs (including atlas-auth.web.cern.ch) use CA/B certificates, which are available (by default) on most Java distributions.

calestyo commented 2 months ago

> The documentation has been updated on master, with a pull request targeting 9.2: #7554

looks good :-)

> Having a configurable trust store is something that makes sense once OPs start using IGTF certificates. So far, OPs (including atlas-auth.web.cern.ch) use CA/B certificates, which are available (by default) on most Java distributions.

Hmm I personally would have argued the other way round:

With the IGTF bundle we have a strictly limited set of CAs from "our community" (science), which apart from perhaps one or two we can probably trust.
With CA/B (assuming that most people just leave them all enabled) we have ~150 root CAs and probably some thousands of intermediate CAs (which can also do more or less anything), quite a few of which have been known in the past for repeatedly ~forging certificates~ "accidentally" releasing test certificates for completely unrelated domains ... and which are from and thus effectively under the control of totalitarian countries.
That's not only the reason why I disable them all ^^ by default... and stumbled over this in the first place, but also why I think it would make some sense to allow at least specifically configuring the CA (which I guess in most cases will be USERTrust Network via Géant).
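
Just to illustrate what I mean by configuring the CA specifically: in plain Java, a TLS context that trusts only an explicitly given set of CA certificates can be built roughly like this. It is a sketch under my own assumptions, not an existing dCache option; the class `PinnedTrust` and its method names are made up, and the certificate file path is whatever one passes on the command line.

```java
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.security.KeyStore;
import java.security.cert.CertificateFactory;
import java.security.cert.X509Certificate;
import java.util.Collections;
import java.util.List;
import javax.net.ssl.SSLContext;
import javax.net.ssl.TrustManager;
import javax.net.ssl.TrustManagerFactory;
import javax.net.ssl.X509TrustManager;

// Hypothetical helper, not dCache code.
public class PinnedTrust {

    // Builds trust managers that accept only the given CA certificates.
    public static TrustManager[] trustManagersFor(List<X509Certificate> cas) throws Exception {
        KeyStore ks = KeyStore.getInstance(KeyStore.getDefaultType());
        ks.load(null, null); // start from an empty in-memory key store
        int i = 0;
        for (X509Certificate ca : cas) {
            ks.setCertificateEntry("ca-" + i++, ca);
        }
        TrustManagerFactory tmf =
                TrustManagerFactory.getInstance(TrustManagerFactory.getDefaultAlgorithm());
        tmf.init(ks);
        return tmf.getTrustManagers();
    }

    // Builds a TLS context whose server verification uses only the given CAs.
    public static SSLContext contextFor(List<X509Certificate> cas) throws Exception {
        SSLContext ctx = SSLContext.getInstance("TLS");
        ctx.init(null, trustManagersFor(cas), null);
        return ctx;
    }

    // Counts the CA certificates the given trust managers accept.
    public static int acceptedIssuerCount(TrustManager[] tms) {
        int n = 0;
        for (TrustManager tm : tms) {
            if (tm instanceof X509TrustManager) {
                n += ((X509TrustManager) tm).getAcceptedIssuers().length;
            }
        }
        return n;
    }

    public static void main(String[] args) throws Exception {
        // args[0]: path to a PEM or DER file holding the single CA certificate to trust
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            X509Certificate ca = (X509Certificate)
                    CertificateFactory.getInstance("X.509").generateCertificate(in);
            contextFor(Collections.singletonList(ca));
            System.out.println("TLS context trusting only: " + ca.getSubjectX500Principal());
        }
    }
}
```

An HTTP client configured with such a context would then reject any OP certificate that does not chain to the configured CA, regardless of what the system trust store contains.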

Cheers, Chris :-)