aarc-community / architecture-guidelines

Inconsistent Subject Identifier Properties across Standards (voPersonID, OASIS subject-id, OIDC/OAuth sub) #17

Open NicolasLiampotis opened 5 months ago

NicolasLiampotis commented 5 months ago

Description

This issue highlights inconsistencies in subject identifier properties (multiplicity, case sensitivity, syntax) between voPersonID, OASIS subject-id, and OIDC/OAuth sub.

Subject Identifier Comparison

| Property | voPersonID | OASIS subject-id | OIDC sub |
| --- | --- | --- | --- |
| Multiplicity | Multi-valued | Single-valued | Single-valued |
| Case sensitivity | caseIgnore | caseIgnore | caseExact |
| Syntax | No syntax | 127 ASCII for uid + @ + 64 ASCII for scope | 255 ASCII chars |
| Type | public | public (subject-id) & pairwise (pairwise-id) attribute | public and pairwise sub claim |
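
To illustrate how the case-sensitivity row affects matching, here is a minimal Python sketch. The `CASE_EXACT` mapping and the `same_subject` helper are made-up names for illustration, and the rules are taken from the comparison above rather than re-checked against the specifications.

```python
# Matching rule per identifier type, as listed in the comparison above.
CASE_EXACT = {"voPersonID": False, "subject-id": False, "sub": True}


def same_subject(kind: str, a: str, b: str) -> bool:
    """Return True if two values of the given identifier type match."""
    if CASE_EXACT[kind]:
        return a == b  # caseExact: compare the strings as-is
    return a.lower() == b.lower()  # caseIgnore: compare case-insensitively


upper = "ABC123@example.org"
lower = "abc123@example.org"
print(same_subject("subject-id", upper, lower))
print(same_subject("sub", upper, lower))
```

The same pair of values is one user under subject-id rules and two users under sub rules, so a value carried in both would probably need to be normalised (for example lower-cased) before release.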

We need to investigate if we can use an existing attribute/claim or if we need to define a new one. Using something standard like sub would be easy.

See also RANDE proposal for introducing gsub claim: https://docs.google.com/document/d/1XH3pX4zU62S7VQ3JGTLDgSr4tb9vt6sDW0sgxD2Xi64/edit

Related Issues

NicolasLiampotis commented 2 months ago

To change from voPersonId to sub has the following implications:

marcvs commented 2 months ago

AFAIR we introduced the use of voPersonID because sub might not be specified by each OP implementation, plus it will lead to problems since it's not unique. Imagine one infraproxy serving multiple communities. We don't want to (and can't) suggest scoping the sub. It is probably safer to assume that nobody uses multiple values in voPersonID.

c00kiemon5ter commented 2 months ago

sub is defined as "locally" unique, meaning that it is unique within the network managed by the OP.

Its value (pairwise/public) depends on the subject_type attribute of the client/RP. subject_type is defined for sub. Using voperson_id therefore means that subject_type has an indirect side effect on the value (and the definition) of voperson_id.

Unless we agree that there will be no pairwise identifiers transmitted by voperson_id. But that would get us in a problematic situation for the pseudonymous and anonymous entity categories.

We have already seen problems with services that cannot understand other claims, and with others that, even though they do understand other claims as identifiers, cannot be configured to extract part of them (the first element).
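
For RPs that can be adapted, picking the identifier could look roughly like the sketch below. It assumes the claims arrive as an already-decoded dict, that voperson_id may be a string or a list, and that falling back to sub is acceptable. `subject_from_claims` is a hypothetical helper name, not something defined by any of the specs.

```python
from typing import Any, Dict, Optional


def subject_from_claims(claims: Dict[str, Any]) -> Optional[str]:
    """Prefer voperson_id (first element if multi-valued), else use sub."""
    value = claims.get("voperson_id")
    if isinstance(value, list):
        # Multi-valued claim: take the first element, if there is one.
        value = value[0] if value else None
    # Fall back to sub when voperson_id is absent or empty.
    return value or claims.get("sub")


multi = {"voperson_id": ["id1@example.org", "id2@example.org"], "sub": "xyz"}
only_sub = {"sub": "xyz"}
print(subject_from_claims(multi))
print(subject_from_claims(only_sub))
```

Services that cannot run this kind of logic are exactly the ones that end up relying on sub alone.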

IMO using sub is the way to go. We do need to profile it. But it means that we are automatically compatible with all services that use sub as the identifier of the user.

msalle commented 1 month ago

I still see several issues with sub. If I understand correctly the plan is to convey the sub claim as issued by the Community AAI, not the home IdP.

That sub claim is then transparently passed on by infrastructure proxies to the end services. That either means 1) infra proxy uses the same iss claim as the community AAI (i.e. infra proxy is essentially the same as community AAI) or 2) the sub claim in itself is globally unique and could hence be issued by multiple issuers.

In case 1) the infra proxy is just passing all the information unchanged onwards, it cannot change anything in the token (since it's not the issuer) so logically there is no infrastructure proxy (but see below under verification too).

Concerning pairwise: in case 1) I can see how a pairwise could still be working (although it would be difficult to determine for the Community AAI which probably still sees all services as a single SP), in case 2) pairwise doesn't seem to make any sense: it's pairwise between community and infra, not beyond: whatever the infra proxy is passing on cannot be called a pairwise identifier. In short, I think pairwise makes little sense in any case.

Verification: case 1) implies that end services that want to do verification via introspection & well-known metadata endpoints (standard pattern) will end up at the community AAI unless the infra proxy implements proxied token introspection. If they want to do offline verification, all end services must trust all community AAIs directly. Since the latter again implies that there is not really a separate infra proxy, I think case 1) requires support for proxied token introspection.

Additionally, I'm still thinking that changing the content of the sub claim is going to cause a lot of problems:

msalle commented 1 month ago

From the (just-finished) call: a problem with using sub is that sub is considered unique only in combination with iss. I would say that implies that when the same sub arrives from different iss values, the RP needs to interpret them as different users? This is a bit unclear: https://www.rfc-editor.org/rfc/rfc7519.html#section-4.1.2 says it can be globally unique, whereas OIDC core https://openid.net/specs/openid-connect-core-1_0.html#SubjectIDTypes says locally unique.
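
As a rough illustration of the "locally unique" reading, here is a Python sketch of an RP that keys accounts on the (iss, sub) pair. The issuer URLs and the `account_key` helper are invented for the example.

```python
from typing import Dict, Tuple


def account_key(claims: Dict[str, str]) -> Tuple[str, str]:
    """Key a local account on the issuer plus the subject, not sub alone."""
    return (claims["iss"], claims["sub"])


# The same sub value arriving via two (hypothetical) infrastructure proxies.
via_proxy_a = {"iss": "https://proxy-a.example.org", "sub": "user-1@community.example.org"}
via_proxy_b = {"iss": "https://proxy-b.example.org", "sub": "user-1@community.example.org"}

print(account_key(via_proxy_a) == account_key(via_proxy_b))
```

An RP written this way treats the two tokens as two different users, even though the sub value is identical. Treating them as one user would require the RP to ignore iss, i.e. to assume sub is globally unique.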

marcvs commented 1 month ago

> sub is defined as "locally" unique; meaning that is unique within the network managed by the OP.

This is IMO the reason why we've introduced using vo_person_id. Didn't we?

NicolasLiampotis commented 1 month ago

> sub is defined as "locally" unique; meaning that is unique within the network managed by the OP. This is IMO the reason why we've introduced using vo_person_id. Didn't we?

Note that the JWT RFC allows for either local or globally unique:

> The subject value MUST either be scoped to be locally unique in the context of the issuer or be globally unique.

We also need to consider the syntax limitations in order to align OIDC/OAuth sub and OASIS subject-id values. Even though both standards expect ASCII characters, they have different requirements.
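
A value meant to be usable as both sub and subject-id would have to satisfy both sets of limits. The Python sketch below checks only length and ASCII-ness, using the limits quoted in the comparison in the issue description (255 for sub, 127 + @ + 64 for subject-id). Those numbers are assumptions taken from this issue, so they are parameters, and the allowed character set of subject-id is deliberately not checked. All function names are hypothetical.

```python
def fits_sub(value: str, max_len: int = 255) -> bool:
    """Non-empty ASCII string within the sub length limit."""
    return 0 < len(value) <= max_len and value.isascii()


def fits_subject_id(value: str, max_uid: int = 127, max_scope: int = 64) -> bool:
    """ASCII 'uid@scope' with both parts non-empty and within their limits."""
    if not value.isascii():
        return False
    uid, sep, scope = value.partition("@")
    if not sep or "@" in scope:
        return False  # exactly one '@' separator is expected
    return 0 < len(uid) <= max_uid and 0 < len(scope) <= max_scope


def fits_both(value: str) -> bool:
    """True if the value respects the limits of sub and subject-id."""
    return fits_sub(value) and fits_subject_id(value)


print(fits_both("1234abcd@community.example.org"))
print(fits_both("x" * 200))
```

The second value is an acceptable sub by length but has no scope part, so it could not be released as a subject-id.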

NicolasLiampotis commented 3 weeks ago

Current Proposals:

I've tried to summarise the four different approaches discussed during the last architecture working group meetings:

1. Stick to voPersonID (AARC-G026 & voPerson schema)

Pros:

Cons:

Migration Implications:


2. Move to sub (OIDC core) and subject-id (OASIS standard)

Pros:

Cons:

Migration Implications:


3. Express Subject Identifier in both sub and voPersonID

Pros:

Cons:


4. Use voPersonID and Fallback to sub for Limited RPs

Pros:

Cons:

Migration Implications:

Feedback

Thoughts on these four approaches? Please share your feedback.

msalle commented 3 weeks ago

Comments on each proposal:

  1. for this one I don't really understand what the point of a globally unique and persistent but targeted identifier from a community AAI is? If the same user comes from the same community AAI to the same RP but via different infrastructure proxies, it would get different identifiers since the combination RP<>infraproxy is different.
  2. do we mean putting a globally scoped value in just sub? But then, when that sub comes via different infraproxies, the RPs will see different iss meaning they must interpret them as different users, or break the standards. Alternatively the infraproxies would also reuse the same iss but that means they either need the signing keys or they cannot adjust any of the claims in the tokens. Plus, not all software will be able to put "random" strings inside the sub claim.
  3. see under 2 for issues with sub.
  4. I still think only services that are directly coupled to a single community AAI would be the "dummy" services that cannot process voPersonId and are essentially out of scope of AARC. In any case, since they essentially hang directly behind the community AAI, that community AAI will know what they need to put in, plus making a claim that is unique within their own community for a single iss is much easier since it's the standard OAuth2/OIDC scenario. Plus why would we here need to put the same in voPersonId and sub? If we include pairwise, different RPs in any case will get different identifiers.

apw1388 commented 5 days ago

I think everything besides 2 would be fine, since we do not "break" previous guidelines. I also assume that we will always have the problem of users appearing as two different accounts when using different proxies in the chain, because it is always possible that anyone in the chain releases pairwise identifiers.

marcvs commented 1 day ago

Have we ever thought about connecting CAAI and IP via SAML? In this case there wouldn't be any sub claim available.