LiUGraphQL / LinGBM

Linköping GraphQL Benchmark (LinGBM)
MIT License
31 stars 4 forks source link

Fix PostGraphile queries #131

Open benjie opened 1 year ago

benjie commented 1 year ago

Hi folks, I'm Benjie, the maintainer of PostGraphile, a significant contributor to the GraphQL spec and a member of the GraphQL Technical Steering Committee.

Someone brought your research to my attention and it seems to suffer from a number of flaws which draw its conclusions into question. If you would indulge me, I would like to help you address some of the easier ones of these; please take these suggestions under the constructive spirit in which they are intended.

Lists versus connections

I could not see where PostGraphile was configured, but you should add the --simple-collections both flag to the command line so that you may use simple lists to make your benchmarks a fairer comparison.

By default PostGraphile exposes connections because they're standard practice in the GraphQL space (see the GraphQL pagination docs), but for an apples-to-apples comparison you should either use connections on all tested servers, or lists on all tested servers.

Connections in PostGraphile have a significant overhead when compared to lists, the machinery to build them is much more complicated, so using connections with PostGraphile and lists with other servers unfairly penalises PostGraphile for following best practices out of the box.

Single-row accessors

To find a single row, you typically would use a single row accessor such as facultyByNr. This is more efficient for both the client and the server. Using a list field such as allFacultiesList(condition: {nr: ...}) or a connection field such as allFaculties(condition: {nr: ...}) is much more expensive for the server, and also returns data in the wrong shape for the client - where a list containing at most one item must be unwrapped.

I did not find the database schema in which faculties is defined, but perhaps you neglected to put a unique constraint on that column and thus that accessor was not defined automatically for you? Note that in PostgreSQL unique constraints automatically introduce unique indexes, but unique indexes do not create unique constraints. If you are dictating something about the data then you should use a constraint; an index is merely an optimization for lookup.

Note that if you are unwilling or unable to tweak your database schema to follow these best practices, PostGraphile gives you ways to work around this such as the @unique smart tag which can be used to "pretend" that a unique constraint exists: https://www.graphile.org/postgraphile/smart-tags/#unique

Invalid queries

It seems that you may be skipping validation on your queries, and perhaps you're not validating that the responses from the server are reasonable? A query such as PostGraphile/QT10 is not a valid GraphQL query:

query stringMatching($keyword:String) 
{
  allPublications(filter: {title: {includes: "$keyword"}}) {
    nodes {
      nr
      title
      abstract
    }
  }
}

Note that the $keyword variable is not used in this query at all. Instead, there is a fixed string with the content "$keyword" - the variable will not be interpolated into that string, GraphQL does not perform interpolation.

The result of sending a query such as this one to the server should be to receive an error in response, making the results of any performance benchmarking of a query like this meaningless.

Complex filtering

In a production GraphQL schema, complex filtering is recommended against. Don't take my word for it though, here's Lee Byron, one of the creators of GraphQL:

Strongly agree. One of the original tenets of GraphQL was to only expose what the server can fulfill efficiently. Generic filters imply lots of indexed columns that usually don’t exist. Too easy to write slow queries. -- https://twitter.com/leeb/status/1004655619431731200

In PostGraphile we have to support legacy decisions out of the box to maintain backwards compatibility, but options such as --no-ignore-indexes will ensure that only indexed columns are added to condition and order inputs, encouraging users to ensure these columns are indexed before using them in a query. There's a reason that the connection filter plugin is not included by default and that it's README starts with a warning.

Naming

By default PostGraphile uses long names for fields to avoid naming conflicts, but shorter names will give you both nicer queries and better performance. That's why we include the @graphile-contrib/pg-simplify-inflector plugin in the list of recommended options when running PostGraphile at the top of the usage page.

Disabling the query log

I cannot see if you are using the query log, but you should add --disable-query-log to your PostGraphile options as this is primarily useful for debugging and shouldn't be used in production (note how this flag is also in our recommended options for production).

Leveraging the capabilities of newer versions of Node

Where performance is critical (such as benchmarking!) the GRAPHILE_TURBO=1 environmental variable should be used. To be honest, it should always be used, but enabling it by default would be a breaking change. Without this, we have to compile down to support Node 8.6 which went end-of-life years ago. With this option, we can leverage the fact that async/await is now a core part of JavaScript, which can lead to a 10x improvement in performance in some code paths. PostGraphile V5 will have these optimisations built in from the start because it will no longer support long-unmaintained Node.js versions.

Customising the GraphQL schema PostGraphile generates

PostGraphile is extremely configurable and customizable. We could almost certainly have used a few options and written an inflector plugin in a few lines of code to make PostGraphile match the shape of your desired schema much closer out of the box.


Should you need any advice on improving the test suite further, understanding best practices in GraphQL, or figuring out which options to use to make PostGraphile closer to your template queries, please don't hesitate to reach out.

VladimirAlexiev commented 6 months ago

"Invalid queries" is https://github.com/LiUGraphQL/LinGBM/issues/82