StarpTech / apollo-datasource-http

Optimized JSON HTTP Data Source for Apollo Server
MIT License

Request deduplication (memoization) does not occur across concurrently executing frames #29

Open bmowis opened 2 years ago

bmowis commented 2 years ago

This may be more of a feature request than a bug, as 'memoization' is admittedly currently 'working as designed'.

However, my understanding of memoization is that multiple independent tasks should be able to share a single method invocation and its corresponding promised result. For example, if six concurrent callers in an execution frame invoke a function with the same expected result (e.g. the same parameters), they should all receive the shared result without the function executing six times.

However, it looks as though the HTTPDataSource.performRequest() method does not memoize the Promise until after the result has already been received. This means that concurrent, memoized access to an HTTPDataSource API method still results in multiple simultaneous and redundant outbound calls to the HTTP endpoint. If one of the stated goals of memoization is request deduplication, then this should not be so.

Around line 313 of HTTPDataSource.ts, the request is issued:

      const responseData = await this.pool.request(requestOptions)

Then, around line 336, the response is memoized only after it has resolved:

      if (this.isRequestMemoizable(request)) {
        this.memoizedResults.set(cacheKey, response)
      }
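
For comparison, here is a minimal, generic sketch of the in-flight deduplication being described. This is not the library's code; the names inFlight and dedupedFetch are hypothetical. The key point is that the pending Promise itself is cached, so concurrent callers with the same key share a single outbound call:

// Generic in-flight deduplication sketch (hypothetical names, not library code).
const inFlight = new Map<string, Promise<unknown>>()

function dedupedFetch<T>(key: string, doRequest: () => Promise<T>): Promise<T> {
  const existing = inFlight.get(key)
  if (existing) {
    // A request with the same key is already running; share its Promise.
    return existing as Promise<T>
  }

  const promise = doRequest().finally(() => {
    // Evict after settling so only concurrent callers are deduplicated.
    inFlight.delete(key)
  })
  inFlight.set(key, promise)
  return promise
}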

In Apollo Server, this limitation is easily observable in the first query that returns a requested entity, whenever multiple fields on the same schema type use resolvers that call the same DataSource API method.

// Query
query getMovie {
  movie(id: 12) {
    name,
    director,
    year
  }
}

// Schema
type Query {
  movie(id: Int!): Movie
}

type Movie {
  name: String
  director: String
  year: Int
}

// Resolver
// For this resolver, assume that the getMovie(id, ctx) method uses an HTTPDataSource implementation
// to fetch a single Movie object
const resolver: Resolvers = {
  Query: {
    movie: (root, { id }, ctx) => ({ id })
  },
  Movie: {
    name: async ({ id }, args, ctx) => {
      const { name } = await getMovie(id, ctx);
      return name;
    },
    director: async ({ id }, args, ctx) => {
      const { director } = await getMovie(id, ctx);
      return director;
    },
    year: async ({ id }, args, ctx) => {
      const { year } = await getMovie(id, ctx);
      return year;
    }
  }
};

In the above circumstance, we have proper, independent field-level resolvers. However, each field resolver will in fact issue its own outbound HTTP call (with the same input parameters and expected result), missing both the memoized-results LRU map and the cache, even though the extra calls are duplicative and unnecessary.

If feasible, sharing function invocations - not just function results - would improve memoization 'hits' in concurrent-access scenarios. I believe a working approach would be to memoize the promised result before it actually resolves, rather than only the finished result itself. Like this:

    // Do not await here: keep the pending Promise so it can be memoized immediately.
    const responsePromise = this.pool.request(requestOptions)

    // Memoize the in-flight Promise so concurrent callers share this request.
    if (this.isRequestMemoizable(request)) {
        this.memoizedResults.set(cacheKey, responsePromise)
    }

    // Now we can await the result and finish the rest of performRequest() that requires an actual result.
    const response = await responsePromise;

If further explanation or an example is required, let me know.

bmowis commented 2 years ago

For the record, I am currently working around this issue in the above example by calling the getMovie() method in the parent/root resolver (child field resolvers only run after the parent resolver has returned, so the children have access to the parent's memoized result by the time they attempt their own requests); a rough sketch is included below.

However, this workaround feels bloated and may not be possible in more complex modelling/querying scenarios. For instance, two separate root-level queries may be executing concurrently and accessing the same HTTPDataSource API; they should be able to share a memoized promise so that only a single outbound HTTP call is made between the two queries.
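
For reference, a rough sketch of that parent-level workaround, based on the schema above. This is hypothetical code, and it assumes getMovie(id, ctx) returns the full Movie object:

// Hypothetical sketch of the parent-level workaround: the root resolver performs the
// single getMovie() call, and the field resolvers simply read from the parent object.
// (Apollo's default field resolvers would also cover these trivial property reads.)
const workaroundResolver: Resolvers = {
  Query: {
    movie: async (root, { id }, ctx) => getMovie(id, ctx)
  },
  Movie: {
    name: (movie) => movie.name,
    director: (movie) => movie.director,
    year: (movie) => movie.year
  }
};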

llc1123 commented 2 years ago

I think we should use DataLoader for batching and caching in a single request. I am writing something like this as a workaround. Maybe I'll make a PR for this later if I have time.

import DataLoader from 'dataloader'
import { HTTPDataSource } from 'apollo-datasource-http'

class Example extends HTTPDataSource {
  private memoizeLoaderCache = new Map<string, DataLoader<string, unknown>>()

  protected async memoizedGet<T = unknown>(
    path: string,
    query: unknown = {},
  ): Promise<T> {
    // generate a unique key for each request like onCacheKeyCalculation
    const cacheKey = path + ' ' + JSON.stringify(query)

    // check if the data loader exists
    let loader = this.memoizeLoaderCache.get(cacheKey) as DataLoader<string, T>
    if (!loader) {
      // create a data loader for the cache key
      loader = new DataLoader<string, T>(async (keys: readonly string[]) => {
        const { body } = await this.get<T>(path, { query })
        return keys.map((_) => body)
      })
      this.memoizeLoaderCache.set(cacheKey, loader)
    }

    // register your request
    const res = await loader.load(cacheKey)
    return res
  }
}
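
A hypothetical usage sketch of the above (MoviesAPI, Movie, and the /movies path are assumptions, not part of this repo): concurrent resolver calls with the same path and query then share one DataLoader and therefore a single outbound request.

interface Movie {
  name: string
  director: string
  year: number
}

class MoviesAPI extends Example {
  // Concurrent calls to getMovie(12) within one request share a single HTTP call,
  // because they resolve through the same DataLoader entry in memoizeLoaderCache.
  async getMovie(id: number): Promise<Movie> {
    return this.memoizedGet<Movie>(`/movies/${id}`)
  }
}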
bradleymorris commented 2 years ago

https://github.com/StarpTech/apollo-datasource-http/pull/37 addresses this.