Cache poisoning bugs with field aliasing and @client

Issue Description

The Bug

If an attacker can execute arbitrary GraphQL queries via an Apollo Client with a cache, and the schema being queried contains objects on which the attacker can control fields, then they can cause the cache to store attacker-controlled data. In the worst case, this can lead to code execution if the content served in the GraphQL API is used in some unsafe way (e.g. inserted as raw HTML into a page).

There are two separate bugs that I know of:

If a query aliases __typename and/or id, then the cache will use the aliased field values instead of the actual __typename or id. If an attacker can control two fields on any queryable GQL object, then they can use this to effectively replace any object in the cache with a different one, though the fields on the stored object will be normalized correctly.
If a query queries a field and later references the same field with an @client directive as an alias of a different field, then the unnormalized second field will be stored as the value of the first field. (Credit to @luker983 for discovering this bug.)

I haven't looked through the code that closely, but I have educated guesses as to the root causes for both of these:

The cache key is computed before the result object is normalized. If it were normalized first, it would receive the correct id and __typename, assuming they were queried.*
If the attack uses a: b @client, then the contents of a aren't normalized when they are set to b because the query doesn't contain a subselection for b. This seems like correct behavior to me; I think the actual fault is that the query should get rejected at validation time since it defines the same field multiple times, but I suspect that it isn't because the @client directive is causing the field to get stripped before validation.

*As an aside, I think there's another caching issue (even with proper normalization) with schemas that have both id and _id defined for a type. It's a bit contrived, but I'm imagining a scenario where id is system-defined, while _id is set by the user. By loading only __typename and _id, an attacker could overwrite the cached object defined by a __typename/id pair by setting their controlled _id to the id of the target object.

How serious is this?

Triggering the bug in the first place requires a GQL query injection. Unless there's some common case for user-supplied GQL fragments that I'm not thinking of, you'd have to be pretty far off the beaten path to introduce this kind of vulnerability in the first place. This renders this vulnerability extremely unlikely to appear in the wild.

Once triggered, this bug can in theory lead all the way to XSS if some content served via the GraphQL API is being inserted as raw HTML into a page. Even if there isn't any such content, this bug could still be used to do content injection in order to convince a victim to take a particular action on the attacked site.

Notes

This bug was featured in Plaid CTF 2023 in the problem Davy Jones' Putlocker. In this case, the injection was introduced via a custom implementation of the gql template tag that attempted to allow inline variable interpolation, but which handled it unsafely in some cases. This allowed the attacker to overwrite results from the GraphQL API, which included markdown being rendered server-side and inserted into the page as raw HTML.

Link to Reproduction

https://codesandbox.io/s/apollo-client-cache-poisoning-5zhuxb

Reproduction Steps

Note that when the attacker doesn't run a query, the output is John Smith as expected.
Click on the link for Issue 1. Note that the output changes to pwnd 1! because the aliased __typename and id from attackerControlledObject are used to compute the cache key.
Click on the link for Issue 2. Note that the output changes to pwnd 2! because the @client directive caused the cache to store a completely unnormalized attackerControlledObject in the cache object for ROOT_QUERY at the key person({"id":"1"}).

Triggering the bug in the first place requires a GQL query injection. Unless there's some common case for user-supplied GQL fragments that I'm not thinking of, you'd have to be pretty far off the beaten path to introduce this kind of vulnerability in the first place. This renders this vulnerability extremely unlikely to appear in the wild.

I don't want to downplay the work that went into this, and I will look into this further and discuss it internally. At the same time, I am trying to understand the impact of this and am trying to find a situation where this could actually happen.

So, let's look at some situations.

Prerequisite scenarios:

The attacker can execute JavaScript and execute queries that way. At that point, the attacker can already execute JavaScript, and this attack vector is meaningless.
A website has some kind of user input that allows for users to enter their own queries. The user had to essentially "attack themselves", so social engineering would be involved here. At that point, the attacker could also get the user to copy-paste JavaScript into the browser console with similar effort, but much higher reward.
A website accepts URL parameters that will directly be injected into a GraphQL query. This is the only case that makes sense to me, but honestly I've never heard of any website doing something like this. It's not impossible that a site like this exists though.

Follow-up requirements:

Data from the server is immediately injected into HTML. This in itself is highly unlikely, assuming the site is using some kind of framework and not actively circumventing security features of said framework (meaning, it's using something like dangerouslySetInnerHTML={{__html: dataFromGraph}}
In addition to all that, the attacker needs some kind of "write access" to the dataset that is being queried again. (This could happen.)

So, we have two requirements that in itself are extremely unlikely that could lead to some kind of XSS - although Apollo Client's role in this essentially boils down to "an attacker can replace one string with another".

Am I getting this right so far?

Follow-up-thought:

Would you consider this attack a very complicated version of a simpler attack injecting a query in this form?

query myQuery {
  expectedFieldName: attackerControlledFieldName
}

It seems to me that any website that would allow user-controlled queries to be injected would have to guard against an attack like this in the first place - by making extra sure that no Graph data ever ends up in unescaped HTML, and probably by disallowing aliasing in user-provided fragments/queries.
Assuming such guards would be in place, the attack you describe would be impossible as well, right?

The prerequisite scenarios you described under which this attack could be carried out are correct. That's the main reason I felt comfortable using this for a CTF and then reporting it as a public issue — you'd have to be doing something pretty wild in order for this to even be a possibility in the first place. I agree that the third scenario is really the only situation in which this could actually be usable by an attacker without the presence of an easier vector and/or higher reward.

The follow-up requirements are also mostly accurate, but I will note that in the setup showcased in the problem, the server was building the HTML string on the server and returning it as an HtmlString scalar (i.e. returning an object of the form { __html: string }), and that the attacker only had the ability to write fields that were not rendered as HTML (e.g. the username of their account). At a glance this seems like a reasonably safe way to do this since it requires the server to explicitly opt in to displaying data from the API as raw HTML.

I would say that "an attacker can replace one string with another" isn't quite an accurate representation of the impact, because it's not just strings, but rather entire objects that can be replaced. With Issue 1 you're limited by the set of fields that exist on the other types in the schema and your control over them, but with Issue 2 you can use aliasing to construct almost any structure you want (assuming the attacker has sufficient control over some queryable data). This could be used to replace strings, but could also in theory be used to cause a type confusion issue somewhere down the line. But yes, unless the data from the GraphQL API is eventually being used in some unsafe way or otherwise influences some unsafe operation, the worst that could happen is content replacement.

Regarding the follow-up thought: I'm not totally certain what the intended attack scenario is with that query. If the idea is that the attacker has injected the aliased name in and that the results of this query are going to be directly displayed to the user, then I'd call that a different vulnerability since it's just injecting into the query directly. The key component of the cache poisoning exploit is that the results of the attacker-controlled query need not actually be used by the application in any meaningful way at the time of the query. In fact, in the CTF problem, the only query that the attacker could inject into didn't actually render any HTML content on the page; instead the attacker could use it to poison the cache for a query that would be executed later that did have this unsafe operation. The codesandbox example also assumes this constraint by simply discarding attackerData, and only displaying the results from the GET_PERSON query, which the attacker has no control over.

apollographql / apollo-client