DataDog / dd-trace-js

JavaScript APM Tracer
https://docs.datadoghq.com/tracing/

Feature Request: Apollo Gateway / Federation #1057

Open patrick91 opened 4 years ago

patrick91 commented 4 years ago

Hi there! We are using datadog and apollo server and we get some tracing information already, which is super cool!

We are also using Apollo Gateway for Apollo Federation, but don't really get much information on that. I wonder if it would be possible to integrate support for the gateway as well.

It seems that it might be possible by customising the base datasources, as explained in this issue: https://github.com/DanielMSchmidt/apollo-opentracing/issues/293

I'm not sure where to start with this, but if someone could guide me I'd be happy to help! I'm trying something custom in our service, so maybe I can bring that to dd-trace-js :)

rochdev commented 4 years ago

I'm not super familiar with either Apollo Gateway or Apollo Federation. Could you describe what would be the expected trace structure for this integration?

patrick91 commented 4 years ago

@rochdev sure! I might come up with some actual examples later (as I'm adding some traces manually right now), but on a high level I'd say we could log the requests made to the downstream services.

Let's say that we have 3 services:

We could send a GraphQL query to the Gateway that will result in multiple GraphQL queries to these 3 services (they might be in parallel or sequential or both depending on the query).

It would be great to trace these additional queries (the content of the query and execution time). Similar to what we do now for http requests, but with more metadata :)
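A minimal sketch of how such per-service spans could be produced by customising the data source, as suggested earlier in the thread. It assumes `BaseDataSource` is `RemoteGraphQLDataSource` from `@apollo/gateway` (whose `process()` method performs the request to a downstream service) and `tracer` is the dd-trace tracer. Both are passed in as arguments so the sketch has no hard dependencies. The span name `apollo.gateway.fetch`, the tag names, and the `subgraphName` field are hypothetical, not an established convention.

```javascript
// Sketch only: BaseDataSource is expected to be RemoteGraphQLDataSource and
// tracer the dd-trace tracer. Span and tag names are illustrative assumptions.
function createTracedDataSource(BaseDataSource, tracer) {
  return class TracedDataSource extends BaseDataSource {
    constructor({ name, ...config }) {
      super(config);
      // Hypothetical field: remember which downstream service this is.
      this.subgraphName = name;
    }

    // Called by the gateway for every request sent to the downstream service.
    process(options) {
      const { request } = options;
      const spanOptions = {
        resource: request.operationName || 'anonymous',
        tags: {
          'apollo.subgraph.name': this.subgraphName,
          'graphql.source': request.query,
        },
      };
      // tracer.trace() runs the callback inside a new span; when the callback
      // returns a promise, the span is finished once the promise settles.
      return tracer.trace('apollo.gateway.fetch', spanOptions, () =>
        super.process(options)
      );
    }
  };
}
```

The resulting class could then be returned from the gateway's `buildService({ name, url })` option, so that each downstream query shows up as its own span with the query text and execution time.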

This is an example of a gateway that's sending metrics to datadog:

image

currently this is only running one query through a local datasource[1], so that's why we have that http request running.

I'm going to do more tests on how it looks with more downstream services running, right now my env is broken :D

[1] a datasource is an abstraction on top of a service; normally they are always remote datasources, but in this case I needed a local one

rochdev commented 4 years ago

So say you have service A and service B, am I correct that A would call B by going through the gateway and the gateway generates a new query to B based on the information provided in the query from A? If that's the case, it will probably be necessary to instrument the gateway to get the desired result. Is the gateway written in Node?

patrick91 commented 4 years ago

it's more that the gateway calls service A and B :) like this:

image

the gateway is written in Node, yes, and we already get some traces. Would it be helpful if I write a complete repo with a small-ish example? I can set up a couple of basic services and a gateway that send data to datadog 😊

rochdev commented 4 years ago

> Would it be helpful if I write a complete repo with a small-ish example?

That would definitely be helpful, along with a description of the behaviour you are currently seeing and the behaviour you would expect from an Apollo Gateway integration.

patrick91 commented 3 years ago

Sorry for the radio silence on this, I need to make a bit of time to make a full example. Might take a few weeks!

mwinstanley commented 3 years ago

Have been running into this, and I have an example of what's not working, if it helps!

Let's say your federated schema in implementing service A looks like this:

extend type Query {
  helloWorld: String
}
type FederatedType @key(fields: "id") {
  id: ID!
  foo: String
}

And in a different service (service B), you have a different schema that contributes to the federated graph:

extend type FederatedType @key(fields: "id") {
  id: ID! @external
}
extend type Query {
  federate: FederatedType
}

If you issue this query to the gateway:

query {
  federate {
    id
    foo
  }
}

You'll see a query that looks like this hit Service A:

query($representations: [_Any!]!) {
  _entities(representations: $representations) {
    ... on FederatedType {
      foo
    }
  }
}

Observed behavior is that the only graphql.resolve span created is for the _entities resolver. Expected behavior is that all resolver spans are created (namely, the resolver for FederatedType.foo).

From looking at the code, I'd guess that this is because the datadog-plugin-graphql wraps fields starting at the query and mutation roots. However, union types (such as the _Entity union that _entities returns) don't have a _fields member, and since there's no other access point to the FederatedType type within Service A, it isn't wrapped by the datadog plugin.
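To illustrate that guess, here is a small sketch of a schema walk that starts at the query root. It does not use the real plugin or graphql-js; the types are plain-object stand-ins with a `_fields` member as mentioned above, `_types` is a hypothetical stand-in for a union's member list, and list/non-null wrappers are left out. It only shows why a walk that follows `_fields` alone would never reach `FederatedType.foo` in Service A, while one that also descends into union members would.

```javascript
// Plain-object stand-ins for the schema types in Service A (hypothetical
// shapes; the real graphql-js classes are not used here).
const StringType = { name: 'String' };
const IdType = { name: 'ID' };
const FederatedType = {
  name: 'FederatedType',
  _fields: { id: { type: IdType }, foo: { type: StringType } },
};
const EntityUnion = { name: '_Entity', _types: [FederatedType] };
const Query = {
  name: 'Query',
  _fields: {
    helloWorld: { type: StringType },
    _entities: { type: EntityUnion },
  },
};

// Collect "Type.field" names reachable from a root type. When followUnions is
// false, the walk stops at union types because they have no _fields.
function reachableFields(root, followUnions) {
  const seen = new Set();
  const found = [];
  const visit = (type) => {
    if (!type || seen.has(type)) return;
    seen.add(type);
    if (followUnions && type._types) type._types.forEach(visit);
    if (!type._fields) return;
    for (const [fieldName, field] of Object.entries(type._fields)) {
      found.push(`${type.name}.${fieldName}`);
      visit(field.type);
    }
  };
  visit(root);
  return found;
}

console.log(reachableFields(Query, false)); // fields found via _fields only
console.log(reachableFields(Query, true)); // fields found when unions are followed
```

If this is indeed what happens, a fix in the graphql instrumentation might be as simple as also visiting the member types of unions during the walk.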

kbariotis commented 2 years ago

Hi all, is this issue still active? Are there plans to work on it or is there another solution? Thanks

rochdev commented 2 years ago

Hey @kbariotis, we still need a complete example with the expected resulting trace so that we can support this properly. Unfortunately we have no Apollo expert on the team and it's unclear how an Apollo instrumentation should behave. If you can provide that information then we should be able to implement something to support it. In the meantime, in theory our graphql integration should get you 80% there assuming Apollo Federation still uses the graphql module.

kbariotis commented 2 years ago

thanks @rochdev, we (Grover) will do some investigation next week. Will make sure to report back any useful info we may get. :) 👍

kbariotis commented 2 years ago

@rochdev once again thank you for your response. Just wanted to update on some of my findings on this topic in case anyone else is still looking for this.

The Apollo federation package is built on top of the graphql package (as a router basically for federated GraphQL services) so that's why dd-trace-js is not able to pick those up.

As the OP suggested, the Apollo team has embedded OpenTelemetry into their packages, so since DD supports that, it would probably be an option, although I haven't tested it myself.

Maybe worth noting here that the GraphQL foundation has started working on something that could end up being similar to Apollo Federation.

I guess the question now is, would it be possible that the dd-trace-js could wrap that Apollo Federation mechanism or should we rely on OpenTelemetry for now? cc @rochdev

Thank you all!

rochdev commented 2 years ago

@kbariotis We do instrument the graphql module, with the theory that it should mean we also support Apollo since it uses that module under the hood as you pointed out. We also have a few tests for Apollo that are passing. If that's not working, it would basically be considered a bug, but we'd need a complete example that reproduces the issue you are seeing along with an explanation of the expected behaviour. If this can be fixed in the graphql instrumentation we'd simply patch it, and if an explicit Apollo plugin is needed then we'd have to schedule work to build it, but nobody on the team has any experience with Apollo so we need some guidance to know how the trace should look and why how it currently looks is incorrect.

kbariotis commented 2 years ago

Thanks @rochdev and apologies for the delay. I guess, as the OP stated, it would be nice to have more information about the Apollo Federation execution with regard to all the different services it's reaching out to. For example, the main trace graphql.execute could state all the different services it reached out to, or have a different trace for each separate service including its response.

It would be very useful to understand the timings of each request plus their responses that ended up forming the final request/response.

cgn-ca commented 2 years ago

@rochdev I'm using the graphql plugin and it's working well for me. Something that does cause a lot of noise, however, is all the Federated healthcheck calls to the subgraphs.

query __ApolloServiceHealthCheck__ { __typename }

It would be great if there was an option to have a blocklist, similar to that of the express and http plugins.
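A rough sketch of what such a blocklist check could look like. This is not an existing option of the graphql plugin; the helper names `operationNameOf` and `isBlocked` are hypothetical, and the operation name is extracted with a simple regex where a real implementation would more likely rely on the parsed document.

```javascript
// Hypothetical helper: extract the operation name from a query string.
// A regex is enough for a sketch; it returns undefined for anonymous queries.
function operationNameOf(query) {
  const match =
    /^\s*(?:query|mutation|subscription)\s+([_A-Za-z][_0-9A-Za-z]*)/.exec(query);
  return match ? match[1] : undefined;
}

// Returns true when the operation name matches any blocklist entry
// (an exact string or a RegExp), meaning no span should be created for it.
function isBlocked(query, blocklist) {
  const name = operationNameOf(query);
  if (name === undefined) return false;
  return blocklist.some((entry) =>
    entry instanceof RegExp ? entry.test(name) : entry === name
  );
}

const blocklist = ['__ApolloServiceHealthCheck__'];
console.log(isBlocked('query __ApolloServiceHealthCheck__ { __typename }', blocklist));
console.log(isBlocked('query { federate { id foo } }', blocklist));
```

Accepting both strings and regular expressions is a design choice made here for illustration, so that one entry can cover a family of operation names.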

rochdev commented 2 years ago

@cgn-ca Can you open a separate issue for that? It's not really unique to Apollo and there might be other queries where it would make sense to ignore them.

jmvtrinidad commented 1 year ago

Hello @rochdev, Apollo just integrated a new trace specific to Apollo Gateway, and I opened a new issue for that: https://github.com/DataDog/dd-trace-js/issues/3058. Thank you!