improvements to lexer + parser performance

allancalix commented 2 years ago

Description

I noticed that parse times increase with query times more than I'd expect. I suspect that this has to do with the number of allocations happening during lexing/parsing as switching to jemalloc improved things ~15%. I'm not sure if it's something specific to this query or if any query of this size that sees the increase.

10 aliased fields

parser_peek_n           time:   [21.519 µs 21.535 µs 21.562 µs]                           
                        change: [-97.894% -97.891% -97.887%] (p = 0.00 < 0.05)
                        Performance has improved.

100 aliased fields

Gnuplot not found, using plotters backend
parser_peek_n           time:   [197.55 µs 197.71 µs 197.91 µs]                          
                        change: [+816.92% +818.31% +819.65%] (p = 0.00 < 0.05)
                        Performance has regressed.

1000 aliased fields

Gnuplot not found, using plotters backend
parser_peek_n           time:   [2.0601 ms 2.0618 ms 2.0636 ms]                           
                        change: [+940.25% +941.65% +943.08%] (p = 0.00 < 0.05)
                        Performance has regressed.

Steps to reproduce

Parse a large query see example

Expected result

Parse time for large queries should grow more slowly.

Actual result

Every 10x increase in number of selections increases parse times by > 800%.

Environment

Linux / Darwin
apollo-rs apollo-parser revision = HEAD

lrlna commented 2 years ago

Interesting! This seems to be an issue specific to the particular Document you're benchmarking. We use apollo-rs on Documents that are MBs in size (as suppose to 57KB in the example you provided), and we are yet to observe such a large spike in parse times.

I'll need to investigate this a bit more.

allancalix commented 2 years ago

It looks like this issue is common for graphql parsers gqlparser, async-graphql-parser, and graphql-parser. While I wasn't able to make the scaling of these queries better overall, reducing the number of allocations in the lexer does improve the performance overall and seems to make it the fastest parsing option of the bunch.

Maybe this could go further in certain cases with https://github.com/rust-analyzer/rowan/pull/119. I more or less hit a wall with all CPU time dedicated to parsing the selection set fields.

flamegraph

Here are some benchmarks with the available Rust parsers: https://github.com/allancalix/graphql-benchmark.

Baseline small query

async_graphql_parser    time:   [7.5938 µs 7.6117 µs 7.6288 µs]
                        change: [-0.5916% -0.1561% +0.2545%] (p = 0.46 > 0.05)
                        No change in performance detected.
Found 10 outliers among 100 measurements (10.00%)
  6 (6.00%) low severe
  3 (3.00%) low mild
  1 (1.00%) high mild

apollo_graphql_parser   time:   [6.7100 µs 6.7274 µs 6.7436 µs]
                        change: [-1.8937% -1.5113% -1.1105%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
  3 (3.00%) low mild
  1 (1.00%) high mild

apollo_fork_graphql_parser
                        time:   [4.7675 µs 4.7785 µs 4.7904 µs]
                        change: [-2.1926% -1.7055% -1.2349%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
  2 (2.00%) low mild
  2 (2.00%) high mild
  1 (1.00%) high severe

graphql_parser          time:   [5.6228 µs 5.6353 µs 5.6472 µs]
                        change: [-0.3519% +0.0893% +0.6192%] (p = 0.70 > 0.05)
                        No change in performance detected.
Found 10 outliers among 100 measurements (10.00%)
  1 (1.00%) low severe
  6 (6.00%) low mild
  1 (1.00%) high mild
  2 (2.00%) high severe

Pathological query

async_graphql_parser #2 time:   [2.3517 ms 2.3572 ms 2.3625 ms]
                        change: [+1.3489% +1.6262% +1.9296%] (p = 0.00 < 0.05)
                        Performance has regressed.

apollo_graphql_parser #2
                        time:   [3.0019 ms 3.0110 ms 3.0203 ms]
                        change: [+1.9770% +2.3294% +2.6782%] (p = 0.00 < 0.05)
                        Performance has regressed.

apollo_fork_graphql_parser #2
                        time:   [1.9258 ms 1.9290 ms 1.9322 ms]
                        change: [+0.6851% +0.9991% +1.3283%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 7 outliers among 100 measurements (7.00%)
  2 (2.00%) low mild
  3 (3.00%) high mild
  2 (2.00%) high severe

graphql_parser #2       time:   [2.2415 ms 2.2494 ms 2.2584 ms]
                        change: [+0.1431% +0.5157% +0.9513%] (p = 0.01 < 0.05)
                        Change within noise threshold.
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) low mild
  1 (1.00%) high severe

lrlna commented 2 years ago

whoaaa thank you so much for putting all this work in for benchmarking @allancalix! i really appreciate it!

There are definitely a whole bunch of improvements we can make when it comes to parser's performance. The work you did in #294 is one way to go about it, there is also an idea have a streaming interface (like in #115) for the lexer, or simply have the parser consume lexer's tokens without having that extra alloc at all.

I realise I won't have enough time to think of a direction for us to take this until after GraphQL Summit happening in the first week of October. This goes in my backlog for October though.

allancalix commented 2 years ago

I've actually done both of these things to solve some bottlenecks for my use case. It's fairly rough currently, but once I get a chance to clean it up I'd be happy to upstream the changes if you'd be willing to review the PR. I included the excellent work done in https://github.com/apollographql/apollo-rs/pull/115 in additional to removing all the allocations from the Token types themselves.

Can see the direction of the changes here: https://github.com/apollographql/apollo-rs/compare/main...allancalix:apollo-rs:finite-state-lexer. I've observed ~40-50% decrease in parse times for both simple and complex queries using the new lexer.

apollographql / apollo-rs