WLCG-AuthZ-WG / common-jwt-profile

A repo for the WLCG Common JWT profile document

Submitting pilots to computing resources: `wlcg.groups` claim and `client_credentials` flow #24

Open aldbr opened 1 year ago

aldbr commented 1 year ago

Submitting pilots to different sites does not require any user interaction: it is basically an access request from one service (a pilot factory) to another, external service. Therefore, the client_credentials flow seems to fit this use case perfectly. The main advantage of the client access token over the user access token, in this context, is that it can be generated without any user interaction: an expired or lost user refresh token would force operators of the pilot factory to manually generate a new pair of tokens by interacting with an OIDC provider (e.g. IAM).
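For illustration, here is a minimal sketch of what such a request could look like on the pilot factory side. It only builds the HTTP request for the client_credentials grant (RFC 6749, section 4.4) and does not send it. The endpoint URL, client ID, secret and scope names are placeholders assumed for the example, not values taken from this thread or from a particular IAM instance.

```python
import base64
from urllib.parse import urlencode


def build_client_credentials_request(token_endpoint, client_id, client_secret, scopes):
    """Return (url, headers, body) for a client_credentials token request.

    Assumes plain ASCII credentials sent via HTTP Basic authentication.
    """
    credentials = f"{client_id}:{client_secret}".encode()
    headers = {
        "Authorization": "Basic " + base64.b64encode(credentials).decode(),
        "Content-Type": "application/x-www-form-urlencoded",
    }
    # No user is involved: the grant type and the requested scopes are all we send.
    body = urlencode({"grant_type": "client_credentials", "scope": " ".join(scopes)})
    return token_endpoint, headers, body


# Hypothetical values, for illustration only.
url, headers, body = build_client_credentials_request(
    "https://iam.example.org/token",
    "pilot-factory",
    "s3cret",
    ["compute.create", "compute.read"],
)
print(body)
```

The request can be repeated by the service whenever the token expires, which is the property that makes this flow attractive for unattended pilot submission.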

WLCG sites may accept/reject pilots according to scope-based and/or group-based capabilities (source: https://github.com/nordugrid/arctestsite-hackathon-slurm-el7-arc7/blob/main/arc.conf#LL19-L26C132).

Thus, in the current WLCG profile, the client_credentials flow is not considered a "valid" way of getting an access token to submit pilots - since it would not be accepted by every site -, is it? Is there any reason for that?

What if we could associate a client with a group (by simply adding wlcg.groups=/<group>/<subgroup> in the default scope of a given client)? Currently, IAM and the WLCG profile do not seem to support such a use case.

Thanks in advance

maarten-litmaath commented 1 year ago

Hi, thanks for the elaborate description of that matter. I do not know of a good reason to prevent a client_credentials client from having wlcg.groups defined. After all, it is just an alternative way for expressing what powers to bestow upon clients: if they can obtain tokens with scopes, then why not allow groups?

msalle commented 1 year ago

I also think there is no real reason for requiring it to be an end-user. I think your suggestions make perfect sense.

norealroots commented 1 year ago

I think the issue here is that, in the existing IAM/OAuth logic, it is not possible for a client to belong to a group - this is a user attribute, as you've identified. Giving the groups claim to a client would not return any information, as it's not possible for the client to belong to a group itself.

Indeed, to my understanding, should the client credentials flow be used with OIDC, no ID token is returned, as there is no associated identity within the flow. As group membership is an attribute of a user's identity, it will not be present if there is no user. Within OIDC, the Client Credentials flow would typically be used for machine-to-machine communication, such as to allow a client to authenticate itself for API access.

Because of this, including some specific group info as a client scope wouldn't work without extra legwork, as the scopes are not where the service would be looking for group authorisation information - it would look for an ID token, find nothing, and then return unauthorised.

Thus, in the current WLCG profile, the client_credentials flow is not considered a "valid" way of getting an access token to submit pilots - since it would not be accepted by every site -, is it? Is there any reason for that?

Ultimately, this boils down to the fact that the Client Credential flow, as it's defined in specifications, would not work for any site utilizing solely group-based authorisation, as this requires a user present about whom to make authorisation decisions.

I do however think this means it is fine to use the client credential flow when utilizing scope-based authorisation (assuming your client has a sufficiently secure way of keeping its ClientID/Secret safe), as here the required scopes can be defined as part of the token request.

What if we could associate a client with a group (by simply adding wlcg.groups=/<group>/<subgroup> in the default scope of a given client)?

There may be a way to associate a client with a group, but - for the reasons described above - I don't think this would be something to be described as a "simple" task, and would likely need development effort to make it possible for a client to belong to a group.

Personally, I think we would be best off making sure that services have methods and logic in place to handle both groups and scopes, so that the client credential flow can be used with scopes for authorisation and groups used for user-related decisions.
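As a rough sketch of what "handling both" could mean on the service side, the function below accepts a token if it carries either a required capability scope or a membership in an allowed group. It assumes the claims have already been validated and decoded into a dict, that `scope` is a space-separated string, and that `wlcg.groups` is a list of group paths. The function name and the group path are hypothetical, and scopes are matched exactly, without any path handling.

```python
def is_authorized(claims, required_scope, allowed_groups):
    """Accept a token that has the required scope OR belongs to an allowed group."""
    # Scope-based check: works for client tokens, which carry no user.
    scopes = set(claims.get("scope", "").split())
    if required_scope in scopes:
        return True
    # Group-based check: only meaningful when the token describes a user.
    groups = claims.get("wlcg.groups", [])
    return any(group in allowed_groups for group in groups)


# Hypothetical tokens, for illustration only.
client_token = {"sub": "pilot-factory-client-id", "scope": "compute.create compute.read"}
user_token = {"sub": "some-user", "wlcg.groups": ["/dirac/pilot"]}

print(is_authorized(client_token, "compute.create", {"/dirac/pilot"}))
print(is_authorized(user_token, "compute.create", {"/dirac/pilot"}))
```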

atsareg commented 1 year ago

As far as I can see, there is common agreement that client tokens can be used by pilot factories. The question is then whether sites will be ready to accept those tokens or not. If sites accepting jobs by scopes are already OK, should it then be made a general mandatory requirement for the WLCG sites, by GDB for example? On the other hand, sites should have some identity information as well, at least the VO from which the pilots are coming, for accounting, blacklisting or whatever. So, there should be a way to provide this information to sites in the tokens.

msalle commented 1 year ago

I think the issue here is that, in the existing IAM/OAuth logic, it is not possible for a client to belong to a group - this is a user attribute, as you've identified. Giving the groups claim to a client would not return any information, as it's not possible for the client to belong to a group itself.

The argument that it takes development effort and therefore is not a good idea sounds like a valid reason not to implement it. I'm not sure about the rest of your argument though, see below.

Indeed, to my understanding, should the client credentials flow be used with OIDC, no ID token is returned, as there is no associated identity within the flow. As group membership is an attribute of a user's identity, it will not be present if there is no user. Within OIDC, the Client Credentials flow would typically be used for machine-to-machine communication, such as to allow a client to authenticate itself for API access.

But the OAuth2 part just authorizes the client. The client still has an identity, even if that's a robot identity. I think there is nothing wrong with returning an id_token also for the client_credentials grant.

Because of this, including some specific group info as a client scope wouldn't work without extra legwork, as the scopes are not where the service would be looking for group authorisation information - it would look for an ID token, find nothing, and then return unauthorised.

I'm a bit confused about this. We say in https://github.com/WLCG-AuthZ-WG/common-jwt-profile/blob/master/profile.md#common-claims that wlcg.groups when used MUST be present in both tokens. Otherwise resources services (that do not get the id_token) would not have access to it.

Thus, in the current WLCG profile, the client_credentials flow is not considered a "valid" way of getting an access token to submit pilots - since it would not be accepted by every site -, is it? Is there any reason for that?

Ultimately, this boils down to the fact that the Client Credential flow, as it's defined in specifications, would not work for any site utilizing solely group-based authorisation, as this requires a user present about whom to make authorisation decisions.

see above, or a robot identity about whom to make decisions.

I do however think this means it is fine to use the client credential flow when utilizing scope-based authorisation (assuming your client has a sufficiently secure way of keeping its ClientID/Secret safe), as here the required scopes can be defined as part of the token request.

What if we could associate a client with a group (by simply adding wlcg.groups=/<group>/<subgroup> in the default scope of a given client)?

There may be a way to associate a client with a group, but - for the reasons described above - I don't think this would be something to be described as a "simple" task, and would likely need development effort to make it possible for a client to belong to a group.

Personally, I think we would be best off making sure that services have methods and logic in place to handle both groups and scopes, so that the client credential flow can be used with scopes for authorisation and groups used for user-related decisions.

I agree it's probably the easier solution, but just because it's easier with the current implementation.

DrDaveD commented 1 year ago

I note that supporting the client_credentials flow, although simple, has implications on operations responsibilities and on security. I don't see much of a problem with it for a small set of robots such as for pilots, but I think it may become a management problem if expected to be used for every robot.

For operations responsibilities, it requires that the people who manage the token issuer get involved in associating client credentials with completely specified access token contents. On the other hand, the way that we do it at Fermilab with the CILogon token issuer, different people maintain a database showing which group of individuals are authorized to get a token for a specific robot. Then the shared oauth client can be used, without needing to create another set of client credentials for every robot.

For security, client credentials have zero additional protection beyond the minimum of login & password; they are very basic. Since we have HashiCorp Vault in the architecture, we can do more sophisticated protection. The long-lived unprotected credentials we use are Kerberos keytabs, which we already have procedures in place to create and manage, and which can be tracked to individuals and hosts and can be easily revoked.

norealroots commented 1 year ago

@msalle:

But the OAuth2 part just authorizes the client. The client still has an identity, even if that's a robot identity. I think there is nothing wrong with returning an id_token also for the client_credentials grant.

To my understanding of the protocols, I don't think it does? It has an "identity" in that it is registered and known to the OIDC Provider, but it does not have an "identity" in the sense it has user attributes? The issue with returning an id_token, is that there is no id to return.

We could register service accounts to represent clients, but that's another credential set for your client to manage and doesn't strike me as a good solution to follow down.

I'm a bit confused about this. We say in https://github.com/WLCG-AuthZ-WG/common-jwt-profile/blob/master/profile.md#common-claims that wlcg.groups when used MUST be present in both tokens. Otherwise resources services (that do not get the id_token) would not have access to it.

Apologies, I think there is a difference here between me quoting the spec and our profile. But, either way, the groups claim for a client credential flow would not return values without involving a user (which could be a service account). But then, with a service account, you wouldn't be doing a client credential flow anyway, as that authorization grant does not involve any form of user account.

I think what you're saying, with the use of service/robot accounts, would be possible. However, that's - to my understanding - no longer the as-spec client credential flow, as you have a "user" involved.

Happy to be proved wrong on this! Just sharing my thoughts based on how I understand the OAuth and OIDC protocols work - I may be overlooking something!

@DrDaveD

Indeed - the added security concerns of putting more weight on the client's access was why I mentioned:

(assuming your client has a sufficiently secure way of keeping its ClientID/Secret safe)

In my initial response.

msalle commented 1 year ago

@msalle:

But the OAuth2 part just authorizes the client. The client still has an identity, even if that's a robot identity. I think there is nothing wrong with returning an id_token also for the client_credentials grant.

To my understanding of the protocols, I don't think it does? It has an "identity" in that it is registered and known to the OIDC Provider, but it does not have an "identity" in the sense it has user attributes? The issue with returning an id_token, is that there is no id to return.

I think that essentially the OIDC spec has the same assumption as we originally had in our profile, namely that it's always limited to human end users, see abstract and terminology in the OIDC core spec. But a service account also has an identity and can authenticate, not just be authorized, so it's not clear how you could express that using OIDC? Also, the OIDC core spec doesn't mention the client_credentials grant since it's assumed to be just OAuth2 i.e. authorization only. So yes if we would stick to the pure specs, I think we couldn't do any OIDC in combination with client_credentials grants but also not really do service accounts themselves (with any flow) ?

We could register service accounts to represent clients, but that's another credential set for your client to manage and doesn't strike me as a good solution to follow down.

I think the idea is to have a fixed set of claims returned for a client, so without the need of an extra set of credentials, maybe I don't fully follow what you mean...

I'm a bit confused about this. We say in https://github.com/WLCG-AuthZ-WG/common-jwt-profile/blob/master/profile.md#common-claims that wlcg.groups when used MUST be present in both tokens. Otherwise resources services (that do not get the id_token) would not have access to it.

Apologies, I think there is a difference here between me quoting the spec and our profile. But, either way, the groups claim for a client credential flow would not return values without involving a user (which could be a service account). But then, with a service account, you wouldn't be doing a client credential flow anyway, as that authorization grant does not involve any form of user account.

I think what you're saying, with the use of service/robot accounts, would be possible. However, that's - to my understanding - no longer the as-spec client credential flow, as you have a "user" involved.

Not sure we're talking about the exact same thing, but my point is that in doing a client_credentials flow, there is still an access token returned to the client, and it contains a set of claims. I think that could include a wlcg.groups claim just as it can contain other claims (such as capability scopes). Whether this is a practical solution is a completely different question, and from looking at the discussions I don't think it is, but this is more about whether there is a good reason not to allow it in the profile.

Happy to be proved wrong on this! Just sharing my thoughts based on how I understand the OAuth and OIDC protocols work - I may be overlooking something!

Same here, I'm just sharing my thoughts and trying to figure out whether we're forbidding something that is actually quite acceptable.

norealroots commented 1 year ago

But a service account also has an identity and can authenticate, not just be authorized, so it's not clear how you could express that using OIDC? Also, the OIDC core spec doesn't mention the client_credentials grant since it's assumed to be just OAuth2 i.e. authorization only. So yes if we would stick to the pure specs, I think we couldn't do any OIDC in combination with client_credentials grants but also not really do service accounts themselves (with any flow) ?

This is where I think the "extra credentials" I was saying would be needed, if we were to use groups "properly" - you would need to register a user within the IAM to represent the service, to which you can assign group membership. Then you would use standard flows, such as authorization_code, with service account credentials and that account linking to a group.

Checking IAM, you can indeed get the AuthZ (groups) information through in the access token, but this would again require either:

Not sure we're talking about the exact same thing, but my point is that in doing a client_credentials flow, there is still an access token returned to the client, and it contains a set of claims. I think that could include a wlcg.groups claim just as it can contain other claims (such as capability scopes). Whether this is a practical solution is a completely different question, and from looking at the discussions I don't think it is, but this is more about whether there is a good reason not to allow it in the profile.

And this comes back to my point - you could indeed probably directly request the groups claim, but the groups claim doesn't mean anything in a flow where there is no user present, such as the client_credential one. Instead of the client a

norealroots commented 1 year ago

I do believe I have muddied things somewhat by commenting on whether a client could actually be a group member, rather than sending a groups claim through as a specific scope. Taking this right back to the original question:

What if we could associate a client with a group (by simply adding wlcg.groups=/<group>/<subgroup> in the default scope of a given client)? Currently, IAM and the WLCG profile do not seem to support such a use case.

I believe that, yes, we could write some scopes pertaining to specific groups that a client could then request. However, at that point, are we not better following our initial defined scopes schema for capabilities? And then ensure that endpoints know how to process a token containing either capability scopes or user groups? Because, to me, a scope of wlcg.groups=/<group>/<subgroup> would provide the same permissions as an equivalent scope with the existing schema.

I think allowing a robot flow a generic group authz is a can of worms, compared to restricting it to a defined set of capabilities. If we were to allow this, I believe we would need to spend some time updating the profile in some notable ways:

msalle commented 1 year ago

I think the reason was that some sites might not want to, or cannot, do capability-based access control but do group-based access control instead. Hence, instead of hard-coding the access token to return only a capability scope claim, (also) returning a wlcg.groups claim would solve that.

Concerning there being no sub in the client credential flow, that's not entirely obvious. There isn't a clear spec for that, since JWT access tokens are only defined in RFC 7523, which doesn't talk about the client credential flow. That RFC does talk about having sub claims in the tokens used for client authentication (which is not the same as the returned access token we're discussing), see section 3, item 2.B; this shows that subs can be things other than humans.

norealroots commented 1 year ago

I'm still not sold on hard-coding a wlcg.groups claim; it feels like a hack rather than a real solution. We are butchering the existing concept of a group, and my reluctance definitely comes from our schema defining group usage around an end-user. A client will never actually "belong" to the group, not without some form of service account, as I previously mentioned, because as per our schema:

The wlcg.groups claim conveys group membership about an authenticated end-user.

Hard coding a group into a client's scopes doesn't to me imply that it has followed the approval process other users would have done, and would not be subject to group management in the same way.

I think the reason was that some sites might not want to/cannot do capability based access control but instead group-based.

Tbh, I feel like the answer should be that if robot access is required, the site needs to do capability based control, as well as groups.

Allowing a client to assert its own group membership as a scope does not sit right with me.

WRT sub/client_credential, I believe you are right here. Sub in the CC flow would be the ClientID (though that won't have any group information attached to it with the current IAM setup, which was my initial point I was trying to make 😅)

aldbr commented 1 year ago

I can see that this issue has generated a lot of discussions. Let's try to summarize what has been said so far.

The original problem was: the client_credentials flow could be technically used to submit pilots to different sites, but how can we make sure that all the sites will accept a client access token?

Two main responses emerged from the discussions:

  1. Attach a group-based capability to a client.
    • client access tokens would be able to embed both group and scope-based capabilities, and sites could configure their authZ logic the way they want.
    • requires some non-trivial changes in the token profile definition and OIDC IdPs implementations.
  2. "Force" sites to accept scope-based capabilities (compute.read, compute.write).
    • the token profile remains correct, OIDC IdPs supporting it do not need to implement further logic.
    • but, how do we make sure that the sites will respect this implicit rule?

From what I can read and understand, I feel that simply attaching the wlcg.groups claim to the client is a kind of hack, as this notion of "Claims" is bound to OIDC, and claims are used "to communicate information about the End-User" according to the OIDC documentation. I would rather go with the second solution if we can "politically" enforce support for scope-based capabilities.

msalle commented 1 year ago

I think it's probably fine to go for solution two, but would like to point out that no matter what we do, also for robots and other non-human participants we'll probably ultimately need claims that originate from OIDC core. For example to have security or other contact information. The point is that OIDC is the authentication layer built on top of OAuth2 which just does authorization. That makes it all a bit messy and we'll need to use certain parts in areas that were not originally envisioned by the OIDC designers.

norealroots commented 1 year ago

but would like to point out that no matter what we do, also for robots and other non-human participants we'll probably ultimately need claims that originate from OIDC core.

Agreed, and we should adjust and add these to the schema as the use cases arrive :)

My main issue with 1) was that, with the most straightforward setup, we would effectively just be disguising a scope as the groups claim, rather than doing things "properly".

msalle commented 1 year ago

I also agree with that actually. One more point though: I think we should make a distinction between what we allow in the profile and what we want sites and IAM to do at this point in time. Forbidding something in the profile because it is not currently technically the easiest is mixing spec and implementation too much in my opinion and might bite us in the future. And I still think that the wording about a person is too restrictive in the profile, even when we don't actually need a change for the currently easiest implementation.

maarten-litmaath commented 1 year ago

Hi all, is there any technical obstacle for using a client_credentials client with scope-based capabilities today?

norealroots commented 1 year ago

Hi all, is there any technical obstacle for using a client_credentials client with scope-based capabilities today?

To my understanding, there are no limitations to using the client credentials flow with IAM and with the scope-based format we've defined in the token profile.

maarten-litmaath commented 1 year ago

Can DIRAC just go ahead then?

atsareg commented 1 year ago

We can try this out. I think Alexandre tried this out already. I will see with him.

Andrei

On 21/03/2023 13:48, Maarten Litmaath wrote:

Can DIRAC just go ahead then?

aldbr commented 1 year ago

is there any technical obstacle for using a client_credentials client with scope-based capabilities today?

Technically, there is no issue, we can get a client access token embedding scope-based capabilities (compute.*) to submit pilots to sites. It is mainly a "political" matter: sites should accept client access tokens with scope-based capabilities (right now, nothing prevents them from relying only on group-based capabilities).

As @msalle said, "we should make a distinction between what we allow in the profile and what we want sites and IAM to do at this point in time". I think we should make sure that they will accept such tokens for the moment. Then, if sites need further details to accept client access tokens in the future, as @norealroots said, "we should adjust and add these to the schema as the use cases arrive".

Can DIRAC just go ahead then?

We have already made some tests with a few sites and they accepted our client access token (likely because they support both group and scope-based capabilities) but we were wondering whether every site would accept a client access token. This is why I opened this issue.

maarten-litmaath commented 1 year ago

For the time being we rely just on the compute scopes and the token issuer + subject decide the mapping, while we might want to take advantage of wlcg.groups in the future.
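A minimal sketch of such a mapping, assuming validated claims in a dict: the token must carry the required compute scope, and the (issuer, subject) pair selects the local account. The issuer URL, subject, account name and the `ACCOUNT_MAP` table are hypothetical and only illustrate the idea; real sites configure this in their CE software.

```python
# Hypothetical mapping table: (issuer, subject) -> local account.
ACCOUNT_MAP = {
    ("https://iam.example.org/", "pilot-factory-client-id"): "dirac_pilot",
}


def map_to_local_account(claims, required_scope="compute.create"):
    """Return the local account for a token, or None if it is not accepted."""
    # Authorization relies only on the compute scope for now.
    if required_scope not in claims.get("scope", "").split():
        return None
    # The mapping is decided by issuer + subject; wlcg.groups is not consulted.
    return ACCOUNT_MAP.get((claims.get("iss"), claims.get("sub")))


token_claims = {
    "iss": "https://iam.example.org/",
    "sub": "pilot-factory-client-id",
    "scope": "compute.create compute.read",
}
print(map_to_local_account(token_claims))
```

A `wlcg.groups` lookup could later be added as a further key in the mapping without changing the scope check.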