apollographql / apollo-feature-requests

🧑‍🚀 Apollo Client Feature Requests | (no 🐛 please).

Minification of Query when useGETForQueries: true #386

Open DrewML opened 4 years ago

DrewML commented 4 years ago

Howdy!

Has any consideration been given to eliminating unnecessary symbols/whitespace from queries, prior to them being sent over the network? When useGETForQueries is enabled, this would help prevent hitting URL size limits.

I opened apollographql/apollo-client#6863 as a partial improvement, but the size savings would be much more significant if queries were stripped down prior to being encoded.
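
To illustrate why stripping before encoding matters: every space or newline in the printed query turns into three characters once it is percent-encoded (`%20`, `%0A`). Below is a minimal sketch of that arithmetic. The two strings are hand-written stand-ins (assumptions for illustration, not output captured from Apollo Client), and `encodedLength` and `whitespaceCount` are hypothetical helper names.

```typescript
// Hand-written stand-ins: `pretty` imitates the indented output of a printer,
// `stripped` is the same operation with ignored characters removed.
const pretty = [
  "query Product($id: ID!) {",
  "  product(id: $id) {",
  "    name",
  "    price",
  "  }",
  "}",
].join("\n");
const stripped = "query Product($id:ID!){product(id:$id){name price}}";

// Length of a string once it is percent-encoded for use in a URL.
function encodedLength(value: string): number {
  return encodeURIComponent(value).length;
}

// Number of whitespace characters; each one costs three characters when encoded.
function whitespaceCount(value: string): number {
  return (value.match(/\s/g) ?? []).length;
}

console.log("raw length:", pretty.length, "->", stripped.length);
console.log("encoded length:", encodedLength(pretty), "->", encodedLength(stripped));
console.log("whitespace:", whitespaceCount(pretty), "->", whitespaceCount(stripped));
```

The gap between the raw and encoded lengths grows with nesting depth, since deeper selections carry more indentation per line.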

I attempted to make this change in an application by using the strip option in babel-plugin-graphql-tag, which uses the stripIgnoredCharacters utility from graphql-js. Although the string inlined in the bundle shrinks, it seems that apollo-client always serializes from the AST and uses the standard formatting from graphql-js's printer.

Possible Solutions

  1. Implement a custom version of the graphql-js printer that omits unnecessary whitespace and tokens
  2. Use the stripIgnoredCharacters utility at runtime. This pulls the lexer into client bundles, and will add about 16 kB to the footprint of apollo-client

I recently helped a team implement solution 2, along with the work from apollographql/apollo-client#6863. With these changes, we were able to drop a large query URL from > 6k chars to ~1400.

mutoo commented 4 years ago

I was looking for an update on this PR, https://github.com/apollographql/apollo-link/pull/1241, but found that apollo-link is deprecated.

Jgfrausing commented 3 years ago

I have made a fork that uses option 2 and always strips the query string. The fork is available at https://github.com/Jgfrausing/apollo-client and its build can be installed using npm install git+https://github.com/jgfrausing/apollo-client-package.git.

The changes are the addition of

// src/link/utils/print.ts
import {ASTNode, print as gqlPrint, stripIgnoredCharacters} from 'graphql';

const print = (node: ASTNode): string => {
  return stripIgnoredCharacters(gqlPrint(node));
}

export default print;

and replacing all imports of print from graphql to use the above version of print instead.

mutoo commented 3 years ago

Our project is using Webpack and node v12+ (module entry support). Temporarily, we use the NormalModuleReplacementPlugin to patch this lib:

The webpack config:

  resolve: {
    mainFields: ['module', 'main'], // make sure the module entry is resolved first
  },
  plugins: [
    // swap Apollo's selectHttpOptionsAndBody.js for the patched copy
    new webpack.NormalModuleReplacementPlugin(
      /\/node_modules\/@apollo\/client\/link\/http\/selectHttpOptionsAndBody\.js/,
      path.resolve(process.cwd(), './patch/selectHttpOptionsAndBody.js'),
    ),
  ],

And the patch file:

import { print, stripIgnoredCharacters } from 'graphql';
// ...
  if (http.includeQuery)
    body.query = stripIgnoredCharacters(print(query));

Jgfrausing commented 3 years ago

@jglovier is this something that can be implemented/prioritized?

jglovier commented 3 years ago

Ahoy @Jgfrausing! 👋 Thanks for the ping. I'm not involved with work on Apollo Client, so I can't really speak to anything in this thread (both because I'm not familiar with the team's priorities or how this fits into it, and because this is beyond my technical scope of understanding 😄). I'll let someone from the core team respond as soon as they are able.

glasser commented 3 years ago

Automatic persisted queries are a reasonable thing to try here; the persisted query link even takes a useGETForHashedQueries option that makes it easy to use GET when the link is shrinking the query to a hash and use POST otherwise (eg when it is automatically filling the cache).
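
For readers unfamiliar with the approach, here is a minimal sketch of why a hashed GET request stays small: the URL carries a SHA-256 hash of the query in an `extensions` parameter instead of the query text, so its length does not depend on the size of the query. The endpoint, `sha256Hex` and `buildHashedGetUrl` are hypothetical names for illustration; in practice the persisted query link builds this request for you, and the server has to implement the matching side (answering unknown hashes with a "not found" error so the client can retry with the full query), none of which is shown here.

```typescript
import { createHash } from "node:crypto";

// Shape of the `extensions` value used by automatic persisted queries.
interface PersistedQueryExtension {
  persistedQuery: { version: 1; sha256Hash: string };
}

// SHA-256 hex digest of the query text; this stands in for the query in the URL.
function sha256Hex(text: string): string {
  return createHash("sha256").update(text, "utf8").digest("hex");
}

// Build a GET URL that carries only the hash, the operation name and the variables.
// The endpoint and the helper itself are assumptions made for illustration.
function buildHashedGetUrl(
  endpoint: string,
  query: string,
  operationName: string,
  variables: Record<string, unknown>,
): string {
  const extensions: PersistedQueryExtension = {
    persistedQuery: { version: 1, sha256Hash: sha256Hex(query) },
  };
  const url = new URL(endpoint);
  url.searchParams.set("operationName", operationName);
  url.searchParams.set("variables", JSON.stringify(variables));
  url.searchParams.set("extensions", JSON.stringify(extensions));
  return url.toString();
}

const exampleQuery = "query Product($id: ID!) { product(id: $id) { name price } }";
const exampleUrl = buildHashedGetUrl("https://example.com/graphql", exampleQuery, "Product", { id: "42" });
console.log(exampleUrl.length, exampleUrl);
```

Note that the variables still travel in the URL, so very large variable payloads can hit the same limits regardless of how the query itself is sent.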

mutoo commented 3 years ago

> Automatic persisted queries are a reasonable thing to try here; the persisted query link even takes a useGETForHashedQueries option that makes it easy to use GET when the link is shrinking the query to a hash and use POST otherwise (eg when it is automatically filling the cache).

The first query might be a big query that may exceed some request limitations (happened once with AppSync on AWS). So compression is still preferred.

thekevinbrown commented 3 years ago

Also, as a note, if you don't want to haul the lexer in, a good workaround would be to just do a

.replace(/\s\s+/g, ' ')

As the patch mentioned in option 2 above. There'll still be commas and such, but 95% of the character bulk would be cleaned up by just collapsing whitespace to single characters without adding a bunch to the bundle.

mutoo commented 3 years ago

> Also, as a note, if you don't want to haul the lexer in, a good workaround would be to just do a
>
> .replace(/\s\s+/g, ' ')
>
> As the patch mentioned in option 2 above. There'll still be commas and such, but 95% of the character bulk would be cleaned up by just collapsing whitespace to single characters without adding a bunch to the bundle.

Yeah, this trick also works well and can be easy to patch in with createHttpLink.

      const httpLink = createHttpLink({
        /* ... */

        // patch the original fetch
        fetch: (url, options) => {
          const compressedUrl = url.replace(/(%20)+/g, '%20');
          return fetch(compressedUrl, options)
        },
      })

BTW, only one \s would be required, since + means one or more.

Update: I had to change \s to %20, since the URL is already URL-encoded at this point.

glasser commented 3 years ago

> The first query might be a big query that may exceed some request limitations (happened once with AppSync on AWS). So compression is still preferred.

Sure, but note that useGETForHashedQueries will use POST for the non-APQ query which hopefully will be fine from a size perspective.

@thekevinbrown Note that your suggestion will cause problems if any string literals in your operation contain multiple consecutive whitespace characters! While it's generally best to put strings in variables rather than in the operation, it would be good not to corrupt operations that do contain string literals. Also, @mutoo's suggestion seems to apply to the entire URL, which would include the variables key too, and that may well contain significant spaces.
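
A minimal sketch of how both concerns could be avoided without pulling in the lexer: collapse whitespace only outside string literals, and rewrite only the `query` parameter of the URL. `collapseWhitespaceOutsideStrings` and `compressQueryParam` are hypothetical helpers, not part of Apollo Client or graphql-js. The sketch assumes an absolute URL and does not recognise `#` comments, so a quote inside a comment would be misread as the start of a string.

```typescript
// Collapse runs of whitespace to a single space, but copy string literals
// ("..." and """...""") through unchanged so their contents are not corrupted.
function collapseWhitespaceOutsideStrings(query: string): string {
  let out = "";
  let i = 0;
  while (i < query.length) {
    if (query.startsWith('"""', i)) {
      // Block string: scan to the closing """, skipping escaped \""" as a unit.
      let end = i + 3;
      while (end < query.length && !query.startsWith('"""', end)) {
        end += query.startsWith('\\"""', end) ? 4 : 1;
      }
      end = Math.min(end + 3, query.length);
      out += query.slice(i, end);
      i = end;
    } else if (query.charAt(i) === '"') {
      // Regular string: scan to the closing quote, skipping escapes such as \".
      let end = i + 1;
      while (end < query.length && query.charAt(end) !== '"') {
        end += query.charAt(end) === "\\" ? 2 : 1;
      }
      end = Math.min(end + 1, query.length);
      out += query.slice(i, end);
      i = end;
    } else if (/\s/.test(query.charAt(i))) {
      // Replace the whole run of whitespace with one space.
      while (i < query.length && /\s/.test(query.charAt(i))) i++;
      out += " ";
    } else {
      out += query.charAt(i);
      i++;
    }
  }
  return out.trim();
}

// Rewrite only the `query` parameter of an absolute GET URL. The decoded values
// of `variables` and other parameters are unchanged, although URLSearchParams
// re-serialises the query string (for example, spaces become `+`).
function compressQueryParam(rawUrl: string): string {
  const url = new URL(rawUrl);
  const query = url.searchParams.get("query");
  if (query !== null) {
    url.searchParams.set("query", collapseWhitespaceOutsideStrings(query));
  }
  return url.toString();
}
```

A helper like `compressQueryParam` could be called from a custom `fetch` passed to createHttpLink, in the same place as the `url.replace` call shown earlier in the thread. Commas and other ignored tokens are left in place, so the result is larger than what stripIgnoredCharacters would produce.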

mutoo commented 3 years ago

@glasser What if the server isn't Apollo Server but a more generic GraphQL server implementation? I don't think hashed queries would work in that case.

glasser commented 3 years ago

If your server doesn't implement a protocol that allows you to send fixed-size queries that is designed to work well with GET, then it will probably be a challenge to send queries that work well with GET, for sure.

runjak commented 3 years ago

Hi, I keep wondering about this issue from time to time, and felt compelled to write something.

I get that automatic persisted queries are a way around the size limitations of GET queries. I'm also working with a code base where adding support for APQ will require some work. We're currently using patch-package to get a bit more mileage out of GET, and it gets us through OK for the moment.

Apart from APQ being the alternative here there are some things that I'm wondering about in relation to GET queries:

  1. GET queries appear to be a supported feature in the sense that useGETForQueries exists, and shortening queries is an easy way to get more out of it. It seems that this would be a clear improvement to what can be done with them.
  2. If I understand correctly with APQ the server has to remember the hash sent by the client for the next upcoming request. Wouldn't this open up the server to clients spamming random hashes for queries possibly resulting in the server having to deal with high numbers of hashes that may never be used? In that case the switch to APQ is not just an implementation detail, but something that also implies that runtime memory requirements can easily be different. With GET queries there seems to be a method in place that can already be cached in a CDN without this overhead.
  3. I'm uncertain about the ways/places where a bigger GET query would cause more network traffic. I think that HTTP compression could take care of a lot of it, but I am uncertain whether that actually happens, or whether there is still a higher cost in network traffic associated with being more verbose on the network than necessary. If compression doesn't deal with all of the verbosity, my impression is that in both the GET and POST cases the verbosity places an additional burden on performance and costs.
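
On point 2, one way a server could keep memory bounded is to hold registered queries in a fixed-size store that evicts the least recently used entry. The following is a minimal sketch under that assumption; `BoundedQueryStore` and `maxEntries` are hypothetical names, and nothing in this thread confirms how any particular server stores hashes. In the APQ flow a hash is only registered when the client sends the full query text along with it, so random hashes alone would not need to be stored.

```typescript
// Fixed-size hash -> query store with least-recently-used eviction.
class BoundedQueryStore {
  private readonly entries = new Map<string, string>();
  private readonly maxEntries: number;

  constructor(maxEntries: number) {
    this.maxEntries = maxEntries;
  }

  // Look up a query by hash and mark it as recently used.
  get(hash: string): string | undefined {
    const query = this.entries.get(hash);
    if (query !== undefined) {
      // Re-insert so this hash becomes the most recently used entry.
      this.entries.delete(hash);
      this.entries.set(hash, query);
    }
    return query;
  }

  // Register a query, evicting the oldest entry once the limit is exceeded.
  set(hash: string, query: string): void {
    this.entries.delete(hash);
    this.entries.set(hash, query);
    if (this.entries.size > this.maxEntries) {
      // Map iterates in insertion order, so the first key is the least recently used.
      const oldest = this.entries.keys().next().value;
      if (oldest !== undefined) this.entries.delete(oldest);
    }
  }

  get size(): number {
    return this.entries.size;
  }
}
```

With a store like this, a flood of distinct registrations can push legitimate queries out of the cache, but it cannot grow memory beyond `maxEntries` entries; an evicted query is simply registered again on its next request.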

TL;DR: I see that APQ can be an alternative to GET queries in many cases, but am under the impression that the argument prevents a possible improvement that wouldn't hurt anyone.