Inconsistent Subject Identifier Properties across Standards (voPersonID, OASIS subject-id, OIDC/OAuth sub)

aarc-community / architecture-guidelines


Inconsistent Subject Identifier Properties across Standards (voPersonID, OASIS subject-id, OIDC/OAuth sub) #17

Open NicolasLiampotis opened 7 months ago

NicolasLiampotis commented 7 months ago

Description

This issue highlights inconsistencies in subject identifier properties (multiplicity, case sensitivity, syntax) between voPersonID, OASIS subject-id, and OIDC/OAuth sub.

Subject Identifier Comparison

Property	voPersonID	OASIS subject-id	OIDC sub
Multiplicity	Multi-valued	Single-valued	Single-valued
Case sensitivity	caseIgnore	caseIgnore	caseExact
Syntax	No syntax	Up to 127 ASCII chars for the unique ID + `@` + up to 127 ASCII chars for the scope	Up to 255 ASCII chars
Type	public	public (`subject-id`) & pairwise (`pairwise-id`) attribute	public and pairwise `sub` claim
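
To make the syntax and case-sensitivity rows concrete, here is a minimal Python sketch. It assumes the grammar of the OASIS SAML Subject Identifier Attributes profile (unique ID and scope of 1 to 127 characters each, both starting with an alphanumeric character) and the OIDC Core limit of 255 ASCII characters for `sub`. The function names are illustrative only and not part of any AARC guideline.

```python
import re

# Assumed OASIS subject-id grammar: <uniqueID>@<scope>, each part 1-127 chars.
_UNIQUE_ID = r"[A-Za-z0-9][A-Za-z0-9=\-]{0,126}"
_SCOPE = r"[A-Za-z0-9][A-Za-z0-9.\-]{0,126}"
SUBJECT_ID_RE = re.compile(_UNIQUE_ID + "@" + _SCOPE)


def is_valid_subject_id(value: str) -> bool:
    """True if the value matches the assumed OASIS subject-id syntax."""
    return SUBJECT_ID_RE.fullmatch(value) is not None


def is_valid_oidc_sub(value: str) -> bool:
    """True if the value is non-empty ASCII of at most 255 characters."""
    return 0 < len(value) <= 255 and value.isascii()


def same_subject_id(a: str, b: str) -> bool:
    """caseIgnore comparison, as used for subject-id."""
    return a.lower() == b.lower()


def same_sub(a: str, b: str) -> bool:
    """caseExact comparison, as used for the OIDC sub claim."""
    return a == b
```

Under these assumptions every syntactically valid subject-id also fits into `sub`, but two values that are equal as subject-ids (differing only in case) are distinct as `sub` values. A profile that uses one value for both would probably need to fix the case, for example by lower-casing.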

We need to investigate if we can use an existing attribute/claim or if we need to define a new one. Using something standard like sub would be easy.

See also RANDE proposal for introducing gsub claim: https://docs.google.com/document/d/1XH3pX4zU62S7VQ3JGTLDgSr4tb9vt6sDW0sgxD2Xi64/edit

Related Issues

NicolasLiampotis commented 4 months ago

Changing from voPersonID to sub has the following implications:

For proxy implementations:
- Community AAIs: Change the sub value to contain the same value as the voPersonID
- Infra Proxies: Forward the subject identifier originating from the upstream proxy (e.g. Community AAI) via the sub claim
For relying parties:
- If they are already using the voPersonID as a unique key for the user, they will need to change to the sub claim
- If they use the combination of sub + iss claim as the unique key for the user, they will need to change to sub only
It will be necessary to introduce a version claim for signalling the supported AARC profile version to relying parties. At the same time, relying parties can use the version scope to signal their support for the AARC profile.
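
A rough sketch of what this could mean for a relying party is given below. The claim name `aarc_profile_version` and the value `"2"` are hypothetical placeholders, since the thread only states that some version claim would be needed. The function picks the key under which the user is stored.

```python
from typing import Any, Mapping, Tuple, Union

# Hypothetical claim name: the thread only says a version claim is needed.
VERSION_CLAIM = "aarc_profile_version"
# Hypothetical value signalling that sub carries the globally unique identifier.
GLOBAL_SUB_VERSION = "2"

UserKey = Union[str, Tuple[str, str]]


def user_key(claims: Mapping[str, Any]) -> UserKey:
    """Return the key an RP would use to look up the user."""
    if claims.get(VERSION_CLAIM) == GLOBAL_SUB_VERSION:
        # New behaviour: sub alone identifies the user, whatever the issuer.
        return claims["sub"]
    if "voperson_id" in claims:
        # AARC-G026 style deployments key on voperson_id.
        return claims["voperson_id"]
    # Protocol default: sub is only unique together with iss.
    return (claims["iss"], claims["sub"])
```

With such a signal, the same user arriving via two different infrastructure proxies would map to one key, whereas the protocol-default `(iss, sub)` key would yield two distinct accounts.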

marcvs commented 4 months ago

AFAIR we introduced the use of voPersonID because sub might not be specified by every OP implementation, and because it leads to problems since it is not unique: imagine one infra proxy serving multiple communities. We don't want to (and can't) suggest scoping the sub. It is probably safer to assume that nobody uses multiple values in voPersonID.

c00kiemon5ter commented 4 months ago

sub is defined as "locally" unique, meaning that it is unique within the network managed by the OP.

Its value (pairwise/public) depends on the subject_type attribute of the client/RP. subject_type is defined for sub. By using voperson_id it means that subject_type causes an indirect side-effect to the value (and the definition) of voperson_id.

Unless we agree that there will be no pairwise identifiers transmitted by voperson_id. But that would get us in a problematic situation for the pseudonymous and anonymous entity categories.
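
For illustration, the dependency of the released value on subject_type can be sketched as follows. The pairwise calculation follows the example method in OpenID Connect Core section 8.1 (SHA-256 over sector identifier, local account ID and a secret salt). The function names and the idea of passing a public identifier explicitly are assumptions made for this sketch.

```python
import hashlib


def pairwise_sub(sector_identifier: str, local_account_id: str, salt: bytes) -> str:
    """sub = SHA-256(sector_identifier || local_account_id || salt), hex-encoded."""
    material = sector_identifier.encode() + local_account_id.encode() + salt
    return hashlib.sha256(material).hexdigest()


def subject_for_client(subject_type: str, public_id: str,
                       sector_identifier: str, salt: bytes) -> str:
    """Return the identifier released to a client with the given subject_type."""
    if subject_type == "pairwise":
        # Different sector identifiers yield unlinkable values for one user.
        return pairwise_sub(sector_identifier, public_id, salt)
    if subject_type == "public":
        # Every client sees the same value.
        return public_id
    raise ValueError(f"unsupported subject_type: {subject_type}")
```

If voperson_id simply mirrored whatever this function returns, its value would change per client as soon as one client registered with `subject_type` set to `pairwise`, which is the side effect described above.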

We have already seen problems with services that cannot understand other claims, and with others that, even though they do understand other claims as identifiers, cannot be configured to extract part of them (the first element).

IMO using sub is the way to go. We do need to profile it. But it means that we are automatically compatible with all services that use sub as the identifier of the user.

msalle commented 3 months ago

I still see several issues with sub. If I understand correctly the plan is to convey the sub claim as issued by the Community AAI, not the home IdP.

That sub claim is then transparently passed on by infrastructure proxies to the end services. That either means 1) infra proxy uses the same iss claim as the community AAI (i.e. infra proxy is essentially the same as community AAI) or 2) the sub claim in itself is globally unique and could hence be issued by multiple issuers.

In case 1) the infra proxy is just passing all the information unchanged onwards, it cannot change anything in the token (since it's not the issuer) so logically there is no infrastructure proxy (but see below under verification too).

Concerning pairwise: in case 1) I can see how a pairwise identifier could still work (although it would be difficult for the Community AAI to determine, since it probably still sees all services as a single SP). In case 2) pairwise doesn't seem to make any sense: it is pairwise between community and infra, not beyond, so whatever the infra proxy passes on cannot be called a pairwise identifier. In short, I think pairwise makes little sense in either case.

Verification: case 1) implies that end services that want to do verification via introspection and well-known metadata endpoints (the standard pattern) will end up at the community AAI, unless the infra proxy implements proxied token introspection. If they want to do offline verification, all end services must trust all community AAIs directly. Since the latter again implies that there is not really a separate infra proxy, I think case 1) requires support for proxied token introspection.

Additionally, I'm still thinking that changing the content of the sub claim is going to cause a lot of problems:

- The community AAIs might need to create different sub claims, potentially in a different format (globally unique).
- The infra proxies need to reuse an incoming claim as their sub claim.
- The relying parties might see new sub claims (which they could interpret as different users if they identified their users by sub).

For new communities/infrastructures these last points are less of an issue, of course.

msalle commented 3 months ago

From the (just-finished) call: a problem with using sub is that sub is considered unique in combination with iss. I would say this implies that an RP receiving the same sub from different iss values needs to interpret them as different users? It's a bit unclear: https://www.rfc-editor.org/rfc/rfc7519.html#section-4.1.2 says it can be globally unique, whereas OIDC Core https://openid.net/specs/openid-connect-core-1_0.html#SubjectIDTypes says locally unique.

marcvs commented 3 months ago

> sub is defined as "locally" unique; meaning that is unique within the network managed by the OP.

This is IMO the reason why we've introduced using voperson_id. Didn't we?

NicolasLiampotis commented 3 months ago

> sub is defined as "locally" unique; meaning that is unique within the network managed by the OP. This is IMO the reason why we've introduced using voperson_id. Didn't we?

Note that the JWT RFC allows for either local or globally unique:

The subject value MUST either be scoped to be locally unique in the context of the issuer or be globally unique.

We also need to consider the syntax limitations to align between OIDC/OAuth sub and OASIS subject-id values. Even though both standards expect ASCII characters they have different requirements.

NicolasLiampotis commented 2 months ago

Current Proposals:

I've tried to summarise the four different approaches discussed during the last architecture working group meetings:

1. Stick to `voPersonID` (AARC-G026 & voPerson schema)

Pros:

Already implemented in several AAI deployments.

Cons:

voPersonID is defined as multi-valued in the voPerson schema, making it less compatible with single-valued standards like OAuth’s sub or OASIS’ subject-id.
No support for targeted identifiers → RPs need to resort to protocol-specific identifiers (e.g., sub (pairwise) in OIDC/OAuth or pairwise-id in SAML).

Migration Implications:

No changes for existing deployments which have already adopted AARC-G026.
But challenges with RPs that cannot be adjusted to support AARC-G026, expecting the protocol-default identifiers.

2. Move to `sub` (OIDC core) and `subject-id` (OASIS standard)

Pros:

Single-valued, compatible with both OAuth and SAML.
Supports both public and pairwise identifiers.
Aligns with REFEDS Personalized Access (for SAML).

Cons:

Changes required for systems currently relying on voPersonID / voperson_id.
Risk of identifier confusion due to combination of sub with iss.
AAIs acting as Infrastructure Proxies will need to support forwarding the sub value released from the upstream Identity Proxy/Community AAI.

Migration Implications:

RPs need to adjust to the new identifier format, which could break existing user identification methods (based on either voPersonID or sub+iss combination).

3. Express Subject Identifier in both `sub` and `voPersonID`

Pros:

Ensures compatibility with services expecting voPersonID while adopting standard sub.
No immediate migration necessary for RPs.

Cons:

Adds complexity for Infrastructure Proxies, which need to handle multiple identifiers.
AAIs acting as Infrastructure Proxies will need to support forwarding the sub value released from the upstream Identity Proxy/Community AAI.
Fragmented approach, leading to inconsistency across services.
Increases the burden on security teams tracking different subject identifier claims.

4. Use `voPersonID` and Fallback to `sub` for Limited RPs

Pros:

No impact on existing RPs relying on voPersonID.
Allows services that require sub to function correctly without major disruption.

Cons:

Fragmented approach, leading to inconsistency across services.

Migration Implications:

Requires RPs to selectively use sub or voPersonID, complicating support and security policies.
- Increases the burden on security teams tracking different subject identifier claims.
AAIs acting as Infrastructure Proxies will need to support forwarding the sub value released from their upstream Identity Proxy/Community AAI.

Feedback

Thoughts on these four approaches? Please share your feedback.

msalle commented 2 months ago

Comments on the four approaches:

1. For this one I don't really understand what the point is of a globally unique and persistent but targeted identifier from a community AAI. If the same user comes from the same community AAI to the same RP but via different infrastructure proxies, they would get different identifiers, since the combination RP<>infraproxy is different.
2. Do we mean putting a globally scoped value in just sub? But then, when that sub comes via different infraproxies, the RPs will see different iss values, meaning they must interpret them as different users or break the standards. Alternatively, the infraproxies would also reuse the same iss, but that means they either need the signing keys or cannot adjust any of the claims in the tokens. Plus, not all software will be able to put "random" strings inside the sub claim.
3. See under 2 for issues with sub.
4. I still think only services that are directly coupled to a single community AAI would be the "dummy" services that cannot process voPersonID, and those are essentially out of scope of AARC. In any case, since they essentially hang directly behind the community AAI, that community AAI will know what they need to put in. Making a claim that is unique within their own community for a single iss is also much easier, since it's the standard OAuth2/OIDC scenario. Plus, why would we need to put the same value in voPersonID and sub here? If we include pairwise, different RPs will get different identifiers in any case.

apw1388 commented 2 months ago

I think everything besides 2 would be fine, since we do not "break" previous guidelines. I also assume that we will always have the problem of users appearing under two different accounts when using different proxies in the chain, because it is always possible that anyone in the chain releases pairwise identifiers.

marcvs commented 2 months ago

Have we ever thought about connecting CAAI and IP via SAML? In this case there wouldn't be any sub claim available.

NicolasLiampotis commented 1 month ago

The different approaches for expressing subject identifiers are depicted below:

[Diagram: AARC-G056 - Subject Identifiers]

Key Points identified so far:

Approaches 1, 3, and 4: These approaches require profiling of voPerson to mandate single-valued voPersonID attributes in SAML and the voperson_id claim in OIDC (see also #8).
Requirements on Proxies:
- Approaches 2, 3, and 4 require proxies to convey a globally unique representation of the subject through the sub claim, rather than using a locally unique identifier to the issuer. There was consensus that proxies ~do not~ need to modify the issuer (iss) value, even when forwarding the subject identifier from their upstream proxy.
- Some approaches require proxies to express subject identifiers differently depending on whether they’re interfacing with other proxies or with end-services. The diagram will be updated to reflect this distinction for each of the four approaches.
Requirements on OIDC End-Services: All approaches assume that OIDC-based end-services can identify the user through a single claim, either voperson_id or sub. In doing so, they would need to override the sub + iss combination defined in the OpenID Core specification.

msalle commented 1 month ago

So that we don't forget, from our last meeting: I think we might want/need to make a difference between the community AAI and the infrastructure proxy (whether both are separately present is dependent on the details of the communities/infrastructures).

for the Community AAI -> Infra Proxy: I think that would be much better in a voPersonID or other non-sub claim:
- infraproxy typically talks to multiple communities
- community AAIs changing existing sub claims might be a nightmare
for the InfraProxy -> relying services: that is essentially an infrastructure internal decision and could be put in the sub when the Infrastructure knows that that is necessary.

I think these two can exist perfectly fine in one AARC BPA?

NicolasLiampotis commented 1 month ago

Here is an updated diagram that visualises the attribute/claim release requirements for Identity Proxies/Community AAIs/Infra Proxies depending on the type of relying party they need to interact with.

[Diagram: AARC-G056 - Subject Identifiers v2]

Based on this, AARC-compliant proxies need to be flexible, and I don't see how they can avoid modifying sub claims depending on the type of relying party they interact with. Proxies must adapt to various scenarios, including:

End-services: Some services may have limited capabilities to customize protocol-specific subject identifiers. In such cases, the proxy might need to provide a sub claim that the service can handle easily.
End-services with Migration Needs: For services transitioning from legacy systems or older subject identifiers, proxies will need to manage the subject identifier carefully to ensure compatibility and smooth migration, potentially altering the sub claim to meet service-specific requirements.
AARC-Compliant Proxies: When interacting with another AARC-compliant proxy, the subject identifier strategy may need to differ, as these proxies might require specific claims or identifiers to maintain federation consistency.
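
The per-relying-party flexibility described in these scenarios could look roughly like the sketch below. The RP type labels (`end_service`, `migrating_service`, `aarc_proxy`) and the `legacy_sub` parameter are hypothetical names chosen for illustration, and the sketch assumes the subject identifier is released in both sub and voperson_id wherever no legacy value has to be preserved.

```python
from typing import Dict, Optional

# Hypothetical labels for the three scenarios listed above.
RP_TYPES = {"end_service", "migrating_service", "aarc_proxy"}


def release_claims(issuer: str, subject_identifier: str, rp_type: str,
                   legacy_sub: Optional[str] = None) -> Dict[str, str]:
    """Build the identifier claims a proxy releases to one relying party."""
    if rp_type not in RP_TYPES:
        raise ValueError(f"unknown relying party type: {rp_type}")
    claims = {"iss": issuer, "voperson_id": subject_identifier}
    if rp_type == "migrating_service" and legacy_sub is not None:
        # Keep the sub value the service already knows during migration.
        claims["sub"] = legacy_sub
    else:
        # End-services and downstream proxies get the subject identifier in sub.
        claims["sub"] = subject_identifier
    return claims
```

The point of the sketch is only that the value of sub becomes a per-client release decision at the proxy, rather than a single fixed value per user.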

msalle commented 1 month ago

In addition to your bullet points, two remarks. There might also be services that want yet another attribute (such as email), so in any case the proxy connecting to that service (whether it is a community AAI, infra proxy or ...) will need flexibility.

Fortunately the combination sub/iss should not be an issue for these situations since they are connected to a single proxy.

NicolasLiampotis commented 4 weeks ago

To ensure flexibility and minimise migration efforts for deployments that have adopted either AARC-G026 or protocol-specific identifiers, we propose Approach 3 as shown below:

[Diagram: AARC-G056 - Subject Identifiers v3]

The following notes have been added to the attribute specification in AARC-G056:

An AARC-compliant AAI SHOULD release the Subject Identifier through the subject-id SAML attribute only if the subject’s identity provider, which is authoritative for the Subject Identifier scope, has released the subject-id value within the same authentication session. This ensures that the Subject Identifier scope has been validated, for example by checking attribute values against the Issuer's published shibmd:Scope elements in SAML metadata.
SAML Relying Parties receiving the Subject Identifier from AARC-compliant AAIs through the subject-id SAML attribute are not required to check the attribute value against the Issuer's published shibmd:Scope elements in SAML metadata, and may accept the value, regardless of whether it matches the shibmd:Scope elements.
The voPersonID attribute in the [voPerson-v2] schema is defined as a multi-valued attribute. However, an AARC BPA-compliant AAI MUST release a single value in both the voPersonID attribute and the voperson_id claim. The voperson_id claim can be represented either as a single string or as an array containing a single value. This approach aligns with flexibility seen in other claims, such as the aud claim in [RFC7519], and allows for compatibility across different implementations.
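
On the receiving side, the last note implies that a relying party should accept both representations of voperson_id. A minimal sketch, assuming the claims have already been decoded into a Python dict (the function name is illustrative):

```python
from typing import Any, Mapping


def single_voperson_id(claims: Mapping[str, Any]) -> str:
    """Return the voperson_id value, given as a string or a one-element array."""
    value = claims.get("voperson_id")
    if isinstance(value, str):
        return value
    if isinstance(value, list) and len(value) == 1 and isinstance(value[0], str):
        return value[0]
    # Missing, empty, multi-valued or non-string values are rejected.
    raise ValueError("voperson_id must be a single string value")
```

Rejecting multi-valued arrays outright, rather than silently taking the first element, keeps the relying party from guessing which value the proxy intended.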
