Different federated types with the same id error: Cannot redefine property: __typename

apollographql / federation

🌐 Build and scale a single data graph across multiple services with Apollo's federation gateway.

https://apollographql.com/docs/federation/

Other

664 stars 249 forks source link

Different federated types with the same id error: Cannot redefine property: __typename #2510

Closed brettski closed 1 year ago

brettski commented 1 year ago

Issue Description

While updating our federated GraphQL graph, after updating one of the sub-graphs to Apollo Server v4, an error started to occur on federated types.

We have two "user" types which allow access to a different subset of fields. Thinks PublicUser view and InternalUser view. After updating to Apollo Server 4 if user id's are requested as type PublicUser, subsequent request of those id's as InternalUser result in the error Cannot redefine property: __typename.

This change in functionality has been determined to happen with the subgraph service supporting those types are running Apollo Server v4. If the subgraph service is rolled back to v3, queries run as expected. This confirmation with codesandbox doesn't seem possible, easily.

This seems like a operational bug to us, but look to the Apollo team for confirmation. We tried to simplify this to make it easier to reproduce. We hope what is presented here makes sense.

Link to Reproduction

https://codesandbox.io/p/sandbox/brettski-resolvetype-issue-z98clc

Reproduction Steps

Using the codesandbox link above setup the following two queries:

query products {
  products {
    __typename
    id
    name
    reviews {
      __typename
      id
      content
      user {
        __typename
        id 
        name
      }

    }
  }
}

query products2 {
  products {
    __typename
    id
    name
    adminReviews {
      __typename
      id
      content
      user {
        __typename
        id 
        name
      }

    }
  }
}

Run products query
- notice successful query results
Run products2 query
- notice query runs but errors on each instance of adminReviews.user, returning null (errors shown at bottom of result)
Reload the gateway
run products2 query
- notice successful query results
run products query
- notice query runs but errors on each instance of reviews.user, returning null (errors shown at bottom of result)
If you remove the extended fields (leaving the user.id value only) from the query, the query obviously succeeds as there was no request made from the other subgraph
```
query products {
products {
__typename
id
name
reviews {
  __typename
  id
  content
  user {
    __typename
    id 
    # name
  }

}
}
}
```

trevor-scheer commented 1 year ago

Thanks for providing a reproduction @brettski, this seems pretty quirky so that's should be pretty helpful for tracking it down.

I forked your sandbox and added a second gateway + a users service running ASv3 and didn't see a difference in the behavior. https://codesandbox.io/p/sandbox/brettski-resolvetype-issue-forked-zw8xv1

Am I approaching the reproduction correctly or did I misunderstand something?

FWIW, the stacktrace is pointing me here as the origin of the actual error, which indicates to me this might be an @apollo/subgraph problem and not an @apollo/server problem? Did you upgrade those versions at the same time by chance? https://github.com/apollographql/federation/blob/972580b412dbfc68fcf275aadbbcd871539100ca/subgraph-js/src/types.ts#L65-L67

brettski commented 1 year ago

Thank you for looking into this @trevor-scheer

FWIW, the stacktrace is pointing me here as the origin of the actual error, which indicates to me this might be an @apollo/subgraph problem and not an @apollo/server problem? Did you upgrade those versions at the same time by chance?

@apollo/subgraph was a minor version upgrade, 2.2.1->2.4.0. I tried rolling this back only and the issue still persisted. Prior to updating to @apollo/server v4, our supergraph was running @apollo/gateway v2, @apollo/subgraph v2, and @apollo/server v3 across the services.

Between trying to rollback the @apollo/subgraph version and switching between our master and v4 branch on our "members" service only, is how I figured it was with @apollo/server. Unless it's a combination of the two, which seems unlikely. I can't think of what else in play may be affecting this. Granted my knowledge doesn't go very deep here. Link to the package.json diff in our member service between master and v4 branch: https://github.com/ThatConference/that-api-members/pull/149/files#diff-7ae45ad102eab3b6d7e7896acd08c427a9b25b346470d7bc6507b6481575d519

I forked your sandbox and added a second gateway + a users service running ASv3 and didn't see a difference in the behavior.

I tried it too, yeah, strange it works the same. Only [small] difference is we're using apollo-server-express in v3.

brettski commented 1 year ago

Any additional thoughts or findings on this issue?

trevor-scheer commented 1 year ago

Nothing yet @brettski, I'll put together a test case and bring it up with the team.

I'm still not confident about the part where this issue is dynamic (restarting and order of queries affects which query exhibits the bug). I think that would imply some statefulness to the query planner which would be surprising to me.

I'm also unsure this is related to ASv3/4 based on my fork above, and that would be a pretty surprising / nuanced bug. I should be able to narrow this down to a @apollo/query-planner + @apollo/subgraph issue and go from there.

If you can provide me with any additional evidence that this is specific to an Apollo Server version upgrade or that it is actually a dynamic problem, either of those pieces of information will be helpful.

brettski commented 1 year ago

If you can provide me with any additional evidence that this is specific to an Apollo Server version upgrade or that it is actually a dynamic problem, either of those pieces of information will be helpful.

@trevor-scheer I'd love to, but honestly I am not sure how to go about it. There are a good number of changes needed to get the services from v3 to v4, so doing only some of the upgrade I don't think is really possible.

trevor-scheer commented 1 year ago

Ok! @brettski got some progress on this one. I have narrowed it down to @apollo/subgraph in a test case, I used your schema and the resulting entities fetches from the queries you provided. I put up a PR just for demo purposes: https://github.com/apollographql/federation/pull/2522

The two separate commits should be helpful in "explaining" the problem a bit. Right now, there are "long-lived" objects that exist across multiple requests in the codesandbox (and the first commit of my reproduction), which explains one of my hypotheses:

I think that would imply some statefulness to the query planner which would be surprising to me.

These objects are being written to, and then re-written to. In the second commit, I ensure there are "fresh" objects whenever they're accessed and the issue disappears. The error is being thrown here, and I guess based on our usage of Object.defineProperty it throws an error when we try to rewrite the __typename property with a different value.

Generally speaking, I think this shared state across requests (particularly w.r.t. values obtained during execution) is a bit of a smell, though I'd be curious to hear if and why that's necessary for you (or if it's, say, an unintentional accident). That being said, I'm not sure we necessarily need to throw here, we could just overwrite the __typename field and call it a day I think.

Let me know what you think. I'm open to the latter idea if you've got a legitimate use case where this is a useful pattern, but otherwise maybe it's surfaced something that should be resolved on your end (maybe we can warn here or provide some better DX)?

brettski commented 1 year ago

At @trevor-scheer exciting progress here for sure!

Generally speaking, I think this shared state across requests (particularly w.r.t. values obtained during execution) is a bit of a smell, though I'd be curious to hear if and why that's necessary for you (or if it's, say, an unintentional accident). That being said, I'm not sure we necessarily need to throw here, we could just overwrite the __typename field and call it a day I think.

Im not quite sure if I'm following this, it is the shared state across requests that we don't want, so, it isn't necessary to maintain. What are w.r.t values you refer to?

If I have a query say like this containing both of the different types with same underlying "user":

query products {
  products {
    __typename
    id
    name
    reviews {
      __typename
      id
      content
      user {
        __typename
        id 
        name
      }     
    }
    adminReviews {
    __typename
    id
    content
    user {
      __typename
      id 
      name
    }
  }
}

This too errors when run saying that __typename cannot be redefined (for adminReview.user). So not necessarily on a subsequent execution.

trevor-scheer commented 1 year ago

@brettski sorry if I was unclear. The shared state I'm referring to is not within @apollo/subgraph or any library code, they are the objects being returned by the resolvers i.e. userData in the sandbox

const userData = [
  { id: "1", name: "George", number: "1234", email: "g@example.com" },
  { id: "2", name: "Tina", number: "2345", email: "t@example.com" },
  { id: "3", name: "Mandy", number: "3456", email: "m@example.com" },
];

These objects are having __typename written to them (immutably), and then having a different __typename written to them (which triggers the error).

For now I'd consider reusing these objects to be a smell, or rather something that should maybe be avoided in production code, though if there's a good explanation or reason for doing so then I'd be more inclined to consider this a bug on our end. For example, one solution to this problem (via the test case or sandbox reproduction) would be to have two separate arrays: one for Users and a separate one for PublicUsers, even if they contain the same data.

Does that make sense? So I guess next I'm interested to hear if our contrived example which currently reuses identities to be actually applicable to a real use case or if that should be avoided to begin with.

brettski commented 1 year ago

@trevor-scheer So in our production case this data comes from the same DataLoader source in the resolver, are you saying this is a smell? We've run this pattern for the last three years, what is different about the libraries which is causing this an issue now (if you know)?

I did try this in the code sandbox and see that with the two different data sources the error does not occur. It does seem to be a smell though that a new/different datasource is needed for each type which could occur across an entire supergraph (still thinking through this). Isn't that the basis around having a __resolveReference function for federation, to assign types. Why should backing data source be a factor in it?

Playing around a bit with the sandbox this only seems to be an issue when running the queries through federation. If a query is done solely on a sub-graph with the two different types using the same data source, the error doesn't occur.

To pull two different data sources for each type (e.g. Users, PublicUsers) would be a pretty major rewrite and may additional fetches to the backing datasource. I am not sure what would be gained by a different data source for each similar Type. Isn't using different types one method of controlling access to your data in GraphQL? I would think it would pretty annoying for and end user of an API having to figure out which individual fields they are allowed to include in their query over access to an entire type.

@csell5, you have feedback or comments on this?

trevor-scheer commented 1 year ago

@brettski I think it's reasonable to use the same dataloader for resolving references, but I do wonder if having them return the same actual object in different resolvers is a little funky. So the User.__resolveReference and PublicUser.__resolveReference end up with handles to the same objects in memory.

All this being said, I do think we have a regression here since upgrading is surfacing the issue. My next step will be to take this test and bisect over the @apollo/subgraph versions and try to figure out what might've caused it. Just FYI, I'm out until Monday but I should be able to take a look then.

brettski commented 1 year ago

@trevor-scheer The reason each user object for the two may be the exact same is that we use Firestore on the backend so the data provided will be the same record shape. In addition if the the same Dataloader is used for the the two types __resolveReference, and those two types are in the same query, there is a really good chance the Dataloader itself would use the same object as the ID's would match.

And thank you. We do appreciate all your time reviewing this issue.

trevor-scheer commented 1 year ago

Hey @brettski. I think the error we're seeing / reproduction I've made are a bit of a red herring. My reproduction exhibited the same behavior all the way back to v2.0.0 and there weren't any obvious code changes in @apollo/subgraph that would've led me to believe we broke something there.

On the other hand, I took a much closer look at the diff you sent me and dropped a comment on the PR. When you get a chance, can you implement the change I've suggested there and let me know how it goes?

brettski commented 1 year ago

@trevor-scheer thank you for taking the time to review our PR. We truly appreciate your aded effort. You discovered a large oversight on my part where I was thinking that the object were being recreated with each request, which they weren't.

This finding is two-fold for me though. For one, I'm grateful the error happened as not having the dataloader instantiate on each request could have had some messed up errors down the line. The second part though, is that I don't understand why Apollo federation cares if an [the same] object is used across multiple types. As in the code sandbox example used, if the same object in memory is used for two different types (e.g. private and public view types) why should Apollo federation care, why doesn't it like to set the different __typename on the other object? This is something @csell5 and myself were discussing and trying to figure out why it matters.

trevor-scheer commented 1 year ago

@brettski glad to hear it worked out and thanks for reporting back.

Unfortunately I don't have a good answer for you. That constraint dates all the way back to the initial commit of this file (which is transposed from a private repo) so I can't lean on any helpful 4+ year old git history to offer an explanation.

There's also 0 test failures introduced if I change that code to the snippet below which doesn't throw / use Object.defineProperty, so there aren't any obvious test cases to show why it became or always was that way in the first place.

function addTypeNameToPossibleReturn<T>(
  maybeObject: null | T,
  typename: string,
): null | T & { __typename: string } {
  if (maybeObject !== null && typeof maybeObject === 'object') {
    // @ts-ignore
    maybeObject.__typename = typename;
  }
  return maybeObject as null | T & { __typename: string };
}

It doesn't seem unreasonable to me that objects with the same contents being used as different types should have their own identity. In this case, the object is being mutated to add __typename to it. It seems possible to contrive an example where the code above is problematic, so the use of Object.defineProperty may have just been defensive. If this does resurface as problematic for you I'd be willing to revisit.