KantaraInitiative / wg-uma

This is the repository of all specifications related to the User Managed Access Group
http://kantarainitiative.org/confluence/display/uma/

Consider the privacy implications of exposing an RO’s AS #107

Closed xmlgrrl closed 7 years ago

xmlgrrl commented 10 years ago

When a (100% untrusted) client first approaches a resource, if it's UMA-protected, the first thing an RS will do is return the as_uri where the client can engage in flows to try and gain access. Does revealing the AS location compromise privacy in any way? For example, if the RO (let's assume it's a human) runs their own AS, this information could uniquely identify that person. Is that a problem? Is it simply inherent in the nature of having an AS mediate protection?

tsitkov commented 10 years ago

There is a relevant section at http://tools.ietf.org/html/rfc6819#section-4.6.7

(For the Audit spec, per https://docs.google.com/document/d/1h8Hq6bJVW3_l7DE4U45uipIKsFBYWf6sf8HfH0UjHwI, it is mentioned in Section 3, Audit Log Parameters, under endpoint_uri.)


xmlgrrl commented 7 years ago

Let's invite comment on this question before we close the 2.0 spec.

agropper commented 7 years ago

This relates directly to the reason FIDO U2F is in play. From a privacy engineering perspective, as with U2F, it's sub-optimal to presume that a longitudinal relationship with an RS domain implies the ability to correlate with other domains. In theory, an RO could specify a separate AS domain for each RS domain, just as they specify a separate public key for each U2F domain.

In practice, domains are costly to set up and maintain. Some cloud services will step in as intermediaries (I've heard Microsoft express an interest in playing this role as a function of Azure) and solve this problem outside of UMA but I'm not sure that's the optimal solution.

I think this issue needs to be noted in any privacy analysis of our upcoming spec.

Adrian


--

Adrian Gropper MD

PROTECT YOUR FUTURE - RESTORE Health Privacy! HELP us fight for the right to control personal health data. DONATE: http://patientprivacyrights.org/donate-2/

xmlgrrl commented 7 years ago

Justin came up with some nice wording for this concern, for cases where a personal AS is used, in the HEART UMA+FHIR text (https://openid.bitbucket.io/HEART/) dated 27 Feb 2017:

"Since the initial request for a resource is made in an unauthorized and unauthenticated context, such requests are by definition open to all users. The response of that request includes a pointer to the authorization server to query for an access token and present claims. If it is known out of band that authorization server is owned and controlled by a single user, or visiting the authorization server contains other identifying information, then an unauthenticated and unauthorized client would be able to tell which resource owner is associated with a given resource. In the FHIR API, this means that a client would be able to discern which patient a given record is for without being authorized for that information and without the resource server giving that information explicitly."

All this except for the last sentence would perhaps be a good idea for us to add in explaining the concern.

jricher commented 7 years ago

Submitted in #286

agropper commented 7 years ago

As I read this paragraph, the assumption seems to be that an RS could / should register patient-level resources using a pseudonym for the patient rather than the patient's ID number, SSN, email, or other long-term or external identifier. Nonetheless, in many cases, the (FHIR) resource will include a PatientID.

The patient specifies her AS (de-referenced from the patient's email address using WebFinger perhaps) to the RS and Alice's AS might or might not be personal. The unauthorized and unauthenticated client now has a resource and an AS endpoint. If the AS is a shared AS, then relatively little information is leaked. If the AS is a personal AS, then that address might or might not be used by Alice across multiple RSs.

A privacy problem occurs if, for example, Alice uses her email address to allow convenient discovery of her personal AS, because a resource server approached by anyone then effectively confirms that Alice is a patient at the RS. To avoid this, Alice would need the option of pointing to her personal AS directly, without a WebFinger lookup, and she would be responsible for making sure that particular way of reaching her AS was not reused at any other RS.

Does this cover the issue?

Adrian


jricher commented 7 years ago

@agropper Your proposal doesn't help the issue at all. It has nothing to do with how the RO specifies the AS to the RS. The URL doesn't have to have long term external identifiers (it shouldn't anyway!) but it needs to be unambiguous to the RS. Even if it's not FHIR, there could be correlation of data.

Let's say I have a client that makes a call to https://resource.example.org/foo/bar, which then points my client to https://authserver.example.net/ as part of its response -- what we've done here is an essential part of the UMA protocol, and that's to establish a link between the RS at that particular resource URL and the AS. It tells us that this is a valid resource, that it has been protected, and that the protection is managed at https://authserver.example.net/. This alone doesn't tell me anything, but what if I knew that https://authserver.example.net/ was a personal AS used by exactly one person? Then I'd know who https://resource.example.org/foo/bar was associated with. Or what if the AS is run by a group that has a particular membership, such as employees of a company, students at a school, or customers of an insurance agency? Any of these leaks affiliation info, such as that https://resource.example.org/foo/bar represents someone who works at the company that runs https://authserver.example.net/. For sensitive information this may be undesirable, but it's a fundamental issue with how the discovery portion of the UMA protocol works and it ought to be called out.
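
To make the single-call scenario above concrete, here is a minimal sketch of what an unauthenticated client could extract from the RS's first response. It assumes the RS answers with an UMA 2.0-style `WWW-Authenticate: UMA ...` challenge carrying `as_uri` and `ticket` parameters; the header value, the ticket string, and the helper name `parse_uma_challenge` are illustrative only and not taken from any real deployment.

```python
import re


def parse_uma_challenge(header_value: str) -> dict:
    """Split an 'UMA key="value", ...' challenge into a dict of parameters."""
    scheme, _, params = header_value.partition(" ")
    if scheme.lower() != "uma":
        raise ValueError("not an UMA challenge")
    # Collect every key="value" pair; values are assumed to be quoted strings.
    return dict(re.findall(r'(\w+)="([^"]*)"', params))


# Hypothetical header returned for an unauthenticated request to the resource.
challenge = (
    'UMA realm="example", '
    'as_uri="https://authserver.example.net/", '
    'ticket="hypothetical-ticket"'
)

info = parse_uma_challenge(challenge)

# The client never redeems the ticket; the RS-to-AS link is already known.
observed_link = ("https://resource.example.org/foo/bar", info["as_uri"])
```

Nothing beyond this first response is needed to record the pair in `observed_link`, which is the link the comment above describes.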

agropper commented 7 years ago

@jricher I don't see any difference between what I said and what you just wrote above.

If Alice never uses https://authserver.example.net/ at any other RS, what information has been leaked to the client?

jricher commented 7 years ago

Alice doesn't need to use https://authserver.example.net/ at any other RS in order for the client to know that there's a correlation.

To repeat what I said on the call: I am not talking about an RS correlation attack. I am talking about leaking information to the client in the context of a single call to an RS as I described above. I don't even have to follow the UMA protocol after that first call -- I just need to get that "this RS URL is tied to this AS" in order to get the information that leaks here.

What you're proposing tries to solve an RS correlation attack where the same AS is used at multiple RS's and those RS's know and/or do something about that. I'm not talking about that at all (and your solution doesn't necessarily help that either, but that's another story).

agropper commented 7 years ago

You are making a distinction without any difference. If the AS in "this RS URL is tied to this AS" is never used anywhere else then what can the client do with that information? It's as if the AS URI was a globally unique nonce or a public key fingerprint for a keypair that was never reused outside of the particular RS relationship.
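
A rough sketch of the "globally unique nonce" idea, assuming a hypothetical shared hosting domain (`as-host.example`) under which one AS can be reached through a fresh random label for each RS relationship. The function name and domain are made up for illustration, and the sketch only shows the non-reuse property; whether that property addresses the single-call leak is exactly what is disputed in this thread.

```python
import secrets


def per_rs_as_uri(base_domain: str) -> str:
    """Return an AS URI with a random, never-reused label under base_domain."""
    label = secrets.token_hex(16)  # 128 bits of randomness, hex-encoded
    return f"https://{label}.{base_domain}/"


# One distinct AS address per RS relationship (both RS names are hypothetical).
as_for_rs_one = per_rs_as_uri("as-host.example")
as_for_rs_two = per_rs_as_uri("as-host.example")
```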

jricher commented 7 years ago

There's a huge difference -- you're the one conflating two different situations.

If the attacker (and client-user) can know which resource URLs map to which AS's, and the attacker knows something about the people who use a particular AS, then the attacker can deduce something about the people who are represented by the given resource URLs. If the AS has a single user, then the attacker knows exactly whose resource URL that is. If the AS has a small population of users, then the attacker knows the resource owner fits some aspect of that group -- employment, customer status, etc.

The attacker here learns all of this information by querying a single resource URL and making that map. Can they learn more if there are also other resource servers using the same thing? Sure. But we're not talking about that. We're talking about the fundamental privacy issue of learning the RS->AS connection with no authentication. This is a fundamental part of UMA (both v1 and v2).

None of this requires more than one resource server. It doesn't even require more than one resource. You're talking about a different problem.
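
The inference step described above can be sketched as follows. This is only an illustration under stated assumptions: `known_populations` stands in for whatever out-of-band knowledge an attacker might hold about who uses a given AS, and every name and URL in it is hypothetical.

```python
def infer_owner(as_uri: str, known_populations: dict) -> str:
    """Describe what a single observed RS->AS link reveals about the owner."""
    population = known_populations.get(as_uri)
    if population is None:
        return "nothing learned"
    if len(population) == 1:
        # A personal AS: the one known user must be the resource owner.
        return f"resource owner is {next(iter(population))}"
    # A group AS: the owner is narrowed down to the group's membership.
    return f"resource owner is one of {len(population)} known users"


# Hypothetical out-of-band knowledge about two authorization servers.
known_populations = {
    "https://authserver.example.net/": {"alice"},
    "https://as.company.example/": {"bob", "carol", "dave"},
}

personal = infer_owner("https://authserver.example.net/", known_populations)
group = infer_owner("https://as.company.example/", known_populations)
unknown = infer_owner("https://big-saas.example/", known_populations)
```

Only one resource URL and one AS URI go into each call, which matches the point that neither a second resource server nor a second resource is required.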

agropper commented 7 years ago

In "and the attacker knows something about the people who use a particular AS," you are implying that "people" is Alice and a collection of other patients that all use the same https://as.plannedparenthood.org AS so that the client can deduce that a young woman is a patient at the RS. That is more of a problem when Alice is not using a personal AS and indeed it is a problem with UMA v1 and v2 different from correlation.

So maybe we've converged. There are two different privacy issues:

xmlgrrl commented 7 years ago

I think it's rather the opposite. The fact that the RS hands over as_loc to the client for AS discovery, in the case of a personal AS being used, may possibly leak that the resource belongs to Alice through other out-of-band clues (such as alice-smith-authz-server.com or whatever?) even if the resource URL didn't give this fact away. In the case of a big SaaS provider serving lots of users, it wouldn't necessarily give away this information.

jricher commented 7 years ago

@xmlgrrl Opposite of what, exactly? Adrian and I are discussing two very separate attack vectors.

agropper commented 7 years ago

Indeed, we are discussing two separate attack vectors, and I think Justin and I are in agreement on both. As Justin said, the worst case is when Alice uses as.plannedparenthood.org as her AS. That leaks gobs of information about Alice without any need for explicit correlation.

As for the SaaS mitigation Eve mentions, this is being considered for Microsoft Azure in the future. They would give Alice's AS virtual machine an opaque identifier that only Azure could link to a particular machine. I don't know how much havoc this would wreak with TLS, but I can imagine Alice having a single AS and multiple (Azure) virtual domains as a way of avoiding both of the attack vectors Justin and I are talking about.

We can't fix all of these problems today. My only point is that HEART MUST allow Alice to specify her AS.

Adrian


xmlgrrl commented 7 years ago

I'm considering the new wording in #286 to have done the job by spelling out the vulnerabilities. It's the readers' job at this point to mitigate... If anyone wants to dispute, they can reopen the issue. :-)