apollographql / apollo-server

🌍  Spec-compliant and production ready JavaScript GraphQL server that lets you develop in a schema-first way. Built for Express, Connect, Hapi, Koa, and more.
https://www.apollographql.com/docs/apollo-server/
MIT License

Memory leak in ApolloServerPluginUsageReporting plugin #7639

Open gabotechs opened 1 year ago

gabotechs commented 1 year ago

Issue Description

We've been running Apollo Server for a while in a couple of APIs, and we have consistently observed a memory leak in both, which appears to be linearly proportional to the number of requests handled by each API.

While investigating the memory leak, V8 heap snapshots were taken from the running servers at two timestamps 6 hours apart. The later heap snapshot was compared to the earlier one in order to track which objects were new in the JS heap, and it showed thousands of newly retained Request-like objects referencing the "usage-reporting.api.apollographql.com" host, and hundreds of new TLSSocket objects referencing that same host.
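For anyone who wants to repeat the comparison, here is a minimal sketch of one way to capture such snapshots from a running Node.js process; the SIGUSR2 trigger is just our convention, Node's built-in `v8.writeHeapSnapshot` does the actual work, and the two resulting files can be diffed in Chrome DevTools.

```ts
// Minimal sketch: write a heap snapshot on demand from a running server.
// Assumes Node.js >= 12 and the ability to send SIGUSR2 to the process.
import { writeHeapSnapshot } from "node:v8";

process.on("SIGUSR2", () => {
  // Writes a *.heapsnapshot file into the current working directory and
  // returns its name; load two of these into Chrome DevTools and compare.
  const file = writeHeapSnapshot();
  console.log(`heap snapshot written to ${file}`);
});
```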

Some of the objects that are leaking in JS memory:

Request-like object:

```
body::Object@13534193 cache::"default"@729 client::Object@13537293 credentials::"same-origin"@54437
cryptoGraphicsNonceMetadata::""@77 destination::""@77 done::system / Oddball@73 headersList::HeadersList@13537317
historyNavigation::system / Oddball@75 initiator::""@77 integrity::""@77 keepalive::system / Oddball@75
localURLsOnly::system / Oddball@75 map::system / Map@130579 method::"POST"@49427 mode::"cors"@84517
origin::system / Oddball@67 parserMetadata::""@77 policyContainer::Object@13537295
preventNoCacheCacheControlHeaderModification::system / Oddball@75 priority::system / Oddball@71
properties::system / PropertyArray@13537319 redirect::"follow"@53093 referrer::"no-referrer"@85507
referrerPolicy::system / Oddball@67 reloadNavigation::system / Oddball@75 replacesClientId::""@77
reservedClient::system / Oddball@71 responseTainting::"basic"@102749 serviceWorkers::"none"@519
taintedOrigin::system / Oddball@75 timingAllowFailed::system / Oddball@75 unsafeRequest::system / Oddball@75
url::URL@13537301 ::URLContext@13538143 fragment::system / Oddball@71
host::"usage-reporting.api.apollographql.com"@13538145 map::system / Map@135759 path::Array@13538147
port::system / Oddball@71 query::system / Oddball@71 scheme::"https:"@6945 username::""@77
__proto__::Object@135757 ::URLSearchParams@13538149 map::system / Map@135741 __proto__::URL@135739
urlList::Array@13537299 useCORSPreflightFlag::system / Oddball@75 useCredentials::system / Oddball@75
userActivation::system / Oddball@75 window::"no-window"@87117 __proto__
```

TLSSocket object:

```
::system / Oddball@75 ::Client@131765 ::Object@13536139 ::InformationalError@13536143
::system / Oddball@71 ::system / Oddball@71 ::system / Oddball@71 ::system / Oddball@75
::system / Oddball@71 ::system / Oddball@75 ::system / Oddball@73 ::system / Oddball@67
::system / Oddball@73 ::system / Oddball@71 ::system / Oddball@71 ::system / Oddball@71
::system / Oddball@75 ::system / Oddball@71 ::system / Oddball@73 ::system / Oddball@75
_SNICallback::system / Oddball@71 _closeAfterHandlingError::system / Oddball@75 _controlReleased::system / Oddball@73
_events::Object@13536133 _hadError::system / Oddball@75 _host::"usage-reporting.api.apollographql.com"@131813
_maxListeners::system / Oddball@67 _newSessionPending::system / Oddball@75 _parent::system / Oddball@71
_peername::Object@13536141 _pendingData::system / Oddball@71 _pendingEncoding::""@77
_readableState::ReadableState@13536135 _rejectUnauthorized::system / Oddball@73 _requestCert::system / Oddball@73
_secureEstablished::system / Oddball@73 _securePending::system / Oddball@75 _server::system / Oddball@71
_sockname::system / Oddball@71 _tlsOptions::Object@13536129 _writableState::WritableState@13536137
allowHalfOpen::system / Oddball@75 alpnProtocol::system / Oddball@75 authorizationError::system / Oddball@71
authorized::system / Oddball@73 connecting::system / Oddball@75 domain::system / Oddball@71
encrypted::system / Oddball@73 map::system / Map@130053 properties::system / PropertyArray@13536145
secureConnecting::system / Oddball@75 server::system / Oddball@67
servername::"usage-reporting.api.apollographql.com"@13536131 ssl::system / Oddball@71 __proto__::Socket@147607
```

Here is a chart showing the memory usage of the last two days for one of the APIs:

[Screenshot 2023-07-12 at 09:17:21: memory usage chart]

During the left half of the chart (the first day), the Apollo server was running with ApolloServerPluginUsageReporting enabled and memory kept increasing linearly. During the right half (the second day), exactly the same code was running but with ApolloServerPluginUsageReportingDisabled passed in the plugins list, so that usage reporting was disabled. In this second case no memory was leaked.

We are using @apollo/server version 4.3.0.
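For context, here is a minimal sketch of the two configurations compared in the chart, assuming @apollo/server v4 with a placeholder schema (usage reporting is enabled implicitly on day 1 because APOLLO_KEY and APOLLO_GRAPH_REF are set; server startup via startStandaloneServer is omitted):

```ts
import { ApolloServer } from "@apollo/server";
import { ApolloServerPluginUsageReportingDisabled } from "@apollo/server/plugin/disabled";

// Placeholder schema; the real APIs are larger.
const typeDefs = `#graphql
  type Query {
    hello: String
  }
`;
const resolvers = { Query: { hello: () => "world" } };

// Day 1 (left half of the chart): APOLLO_KEY / APOLLO_GRAPH_REF are set,
// so the usage reporting plugin is installed implicitly and memory grows.
const serverWithReporting = new ApolloServer({ typeDefs, resolvers });

// Day 2 (right half): identical code, but usage reporting explicitly disabled.
const serverWithoutReporting = new ApolloServer({
  typeDefs,
  resolvers,
  plugins: [ApolloServerPluginUsageReportingDisabled()],
});
```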

Link to Reproduction

https://github.com/GabrielMusatMestre/apollo-server-memory-leak-repro

Reproduction Steps

Steps are described in the README.md of the reproduction repo.

This is not a reliable reproduction, as the memory leak may only become noticeable after running the server under heavy load for hours or days, and it needs a properly configured APOLLO_KEY and APOLLO_GRAPH_REF so that usage reports are actually published to Apollo. A sketch of a simple load generator is shown below.
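As a rough illustration of "heavy load", a loop along these lines can drive sustained traffic at the server from a second process; the endpoint URL and the query are assumptions about a local setup, not taken from the reproduction repo.

```ts
// Hypothetical load generator: hammer a locally running GraphQL endpoint so the
// usage-reporting leak becomes visible sooner. Requires Node.js 18+ for global fetch.
const ENDPOINT = "http://localhost:4000/graphql"; // assumed local server address

async function main(): Promise<void> {
  for (let i = 0; ; i++) {
    await fetch(ENDPOINT, {
      method: "POST",
      headers: { "content-type": "application/json" },
      body: JSON.stringify({ query: "{ __typename }" }),
    });
    if (i % 10_000 === 0) console.log(`sent ${i} requests`);
  }
}

main().catch(console.error);
```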

trevor-scheer commented 1 year ago

Thanks for the great report, sorry it's taken me a week to get to this. I'll have a closer look tomorrow!

trevor-scheer commented 1 year ago

@gabotechs have you had a chance to look at my PR and try out the built packages?

juancarlosjr97 commented 1 year ago

Any progress on this issue? We have a memory leak and we are trying to identify where it is coming from too, but we have disabled the reporting and still see a memory spike.

trevor-scheer commented 1 year ago

@juancarlosjr97 are you able to create a minimal reproduction? It's certainly interesting that you see the issue with reporting disabled; the two problems may be unrelated.

juancarlosjr97 commented 1 year ago

I will create one @trevor-scheer and share it. The issue has been narrowed down to many unique operations for the same query, for example:

```graphql
query GetBestSellers($category1: ProductCategory) {
  bestSellers(category: $category1) {
    title
  }
}
query GetBestSellers($category2: ProductCategory) {
  bestSellers(category: $category2) {
    title
  }
}
```

Even when $category1 and $category2 have the same value, the two documents are identified as two different operations. We reverted yesterday to using a single operation and the memory growth stopped. However, the memory already consumed has still not been released (nearly 12 hours after changing the operations), which might point to a separate garbage collection issue.
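To illustrate why the variable name alone is enough to make them distinct, here is a small check using plain graphql-js (nothing Apollo-specific is assumed):

```ts
import { parse, print } from "graphql";

const opA = `query GetBestSellers($category1: ProductCategory) {
  bestSellers(category: $category1) { title }
}`;
const opB = `query GetBestSellers($category2: ProductCategory) {
  bestSellers(category: $category2) { title }
}`;

// Variable names are part of the parsed document, so the normalized texts differ
// and anything keyed on the document text treats these as two separate operations.
console.log(print(parse(opA)) === print(parse(opB))); // false
```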

We have tested this behaviour on Apollo Server v3 and v4, with all plugins disabled, and see the same behaviour in both.

Also, profiling revealed that the memory that keeps growing is allocated by the graphql library, specifically in node_modules/graphql/language.

This might be another issue, and if it is, I will raise a separate issue for it if it has not already been raised internally with Apollo by then.

We have raised this internally with Apollo and the reference number is 9623.

trevor-scheer commented 1 year ago

@juancarlosjr97 given what you've narrowed it down to so far this does seem unrelated, so a separate issue would be preferred.

How sure are you that:

> for the same query

is relevant? Have you ruled out just "many unique operations" by itself? The "same query excluding variables" part would be a surprising twist to the issue.

The graphql/language lead points to parsing and printing in graphql-js, so it would be interesting to run the set of queries that make up your reproduction against just the graphql-js parser directly and see whether you still observe the same GC behaviour / memory usage increasing over time.
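For example, a direct parser loop along these lines would isolate graphql-js from Apollo Server entirely; the operation shape mirrors the queries above, and the iteration counts are arbitrary.

```ts
import { parse } from "graphql";

// Feed many unique operation strings straight into the graphql-js parser and
// watch heap usage, with no Apollo Server involved.
function uniqueOperation(i: number): string {
  return `query GetBestSellers($category${i}: ProductCategory) {
    bestSellers(category: $category${i}) { title }
  }`;
}

for (let i = 0; i < 1_000_000; i++) {
  parse(uniqueOperation(i));
  if (i % 100_000 === 0) {
    const heapMb = process.memoryUsage().heapUsed / 1024 / 1024;
    console.log(`parsed ${i} documents, heapUsed = ${heapMb.toFixed(1)} MB`);
  }
}
```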

glasser commented 1 year ago

@trevor-scheer If they're saying that they change the name of the variable ($category1 vs $category2) then I think most of our systems (including usage reporting) will consider them as distinct operations.

trevor-scheer commented 1 year ago

@glasser right, I'm asking for clarification on exactly that. Like I don't think that it matters that the operation is entirely the same minus the variable name. It sounds more like a "many different operations" problem, more generally. I would be surprised if the issue was limited to that specific nuance.

glasser commented 1 year ago

Right, I agree that the problem is likely "many different operations"; it just might not be obvious that we treat those operations as distinct.

juancarlosjr97 commented 1 year ago

Thank you for the replies @trevor-scheer and @glasser. We can confirm at this point that memory usage has been steady since the consumer changed the queries so that they are identified as the same operation.

I will work on reproducing the issue, and once I have done so I will raise another issue with all the details and a repository to clone and replicate the bug, since the memory not being released is the core problem.

juancarlosjr97 commented 1 year ago

@trevor-scheer and @glasser I created a project with instructions that reproduces the memory leak: https://github.com/juancarlosjr97/apollo-graphql-federation-memory-leak. I am going to raise a separate issue with all the details.