GraphQL: Sanitizing GQL document

ameyagholkar commented 3 years ago

Describe the bug

Disclaimer: I'm not sure if this is a bug or something that's missing. Ramping up on dd-trace, so if I'm missing something please don't hesitate to let me know 😄

We have straightforward setup for dd-trace where it's initialized in our Apollo server entrypoint. We're using apollo-server-express. Tracing worked out the box with graphQL spans being initialized and showing up correctly in the DataDog webUI.

In our testing, we did observe that there might be cases where we'd potentially need to sanitize the graphQL document by scrubbing PII data. These are only applicable to cases where we'd make one-off queries/mutations from an browser based apollo client.

For example; we'd like to scrub out the clear text password in this mutation.

mutation AuthenticateUser {
  authenticate(input: { username: 'foo', password: 'bar' } ) {
    userId
    name
  }
}

I've been trying to figure out what's the best place to intercept the trace and sanitize the document but I couldn't find one. Adding an apollo-server plugin to hook into request lifecycle doesn't let me get access to the underlying graphql plugin's spans. I'm assuming this is by design since I'm dealing with the server context in the plugin and not the underlying graphql's context.

I know of and have used the execute hook from the graphql plugin and it seems like a good place to do such modifications but there seem to be no hooks for parse and validate.

So, is there a way around this OR are implementing parse and validate hooks the best bet? If I'm missing something here, do let me know 👍🏼

Environment

graphQL

Operation system:
- macOS
Node version:
- v12.18.3
Tracer version:
- v0.30.3
Agent version:
- v7.25.1

rochdev commented 3 years ago

By default, the GraphQL plugin is supposed to compute a signature for queries that should be removing parameters to group similar queries together. If that's not the case, it's possible that there's a bug in the plugin. Can you share what the resource name looks like in the UI for this query? Be sure to remove any sensitive data if it went through.

In general however, and regardless of this issue and whether you are using APM or not, I would recommend not passing data (especially sensitive data) directly in queries, and instead pass them as variables.

For example:

mutation AuthenticateUser($input: AuthenticateUserInput!) {
  authenticate(input: $input ) {
    userId
    name
  }
}

You can find more information about variables in the GraphQL documentation.

ameyagholkar commented 3 years ago

Thanks for responding @rochdev

I see that the resource name of the query is displayed as:

mutation AuthenticateUser {authenticate(input: { } ) ..

So it does look like the plugin removes the relevant input values. However, the graphql.source which contains the graphql document still contains the entire query source with the inputs. That is the part I'm interested in sanitizing.

Furthermore, let me clarify a couple of points:

This trace was generated by our test queries in Apollo's GraphQL browser-based playground for our deployed environment (similar to GraphiQL). Since it's a playground for queries, we were testing traces for the auth query by inlining the password without using variables. That's the reason we're seeing passwords in clear text in traces under graphql.source.
For all other clients, we always use variables -- which are easy to sanitize given the variables property in the graphql plugin config.

Given these points, is there any way to sanitize traces from such queries made from such a GraphQL playground/explorer where the consumers might not necessarily use variables?

rochdev commented 3 years ago

Thanks for the clarification, I didn't consider the graphql.source tag and the ad-hoc queries use case. In order to alter the tag you would indeed be using span hooks generally speaking, but as you pointed out, there are no hooks at the moment for validate and parse so these would need to be implemented. I wonder if an option to also use the signature in the tag would make sense, because otherwise it would still end up in resolvers as well, so hooks would be needed for these too.

ameyagholkar commented 3 years ago

Thanks @rochdev

I experimented by looking to see if I could hook into the apollo server lifecycle to block queries/mutations not using variables. I could block the graphql.execute spans from collected.

But alas, it's much harder to block the graphql.parse and graphql.validate spans from being completed and collected because all the information we need to check if variables are used or not is generated after the parse and validation steps are completed.

So, two questions:

Is there any way for me to get access to any spans being created by the graphql plugins without the plugin's hooks?
Is implementing the validate and parse hooks into the graphql plugin on your roadmap already?

rochdev commented 3 years ago

Is there any way for me to get access to any spans being created by the graphql plugins without the plugin's hooks?

Not without using the tracer internals directly unfortunately.

Is implementing the validate and parse hooks into the graphql plugin on your roadmap already?

It's not explicitly on the roadmap but generally speaking we'd like to have hooks for every single span. The difficult one for graphql specifically will be resolve. parse and validate should be fairly straightforward to implement as they are very similar to execute.

thehideki81 commented 3 years ago

I have created a PR about adding parse and validate hooks.

https://github.com/DataDog/dd-trace-js/pull/1495

Would appreciate help on getting this forward.

EDIT: Fixed tests for PR.

DataDog / dd-trace-js

GraphQL: Sanitizing GQL document #1261