Chunky pieces! Aggregate multiple tokens into single CodePieces.

munificent commented 6 months ago

One of the big differences between the old and new formatter is that the new one creates a separate Piece for each token in the program where the old one will concatenate multiple tokens into a single Chunk (hence the name) if there is no way to split between them.

Having a separate Piece for each token makes the piece building process arguably easier to understand. The AstNodeVisitor can simply return a Piece from each visit method.

But there's a potential performance hit to having larger more deeply nested piece trees. At the time that we made that choice, we didn't have enough of the formatter working to measure that cost. Now that the whole language is supported, we can. And it turns out the cost is significant. :(

The new formatter takes about 2x as long to format the Flutter repo as the old formatter. There's a lot of reasons for that, but one of them is that the new formatter spends more time building the Piece tree than the old one does building the Chunk tree, and it then spends more time during formatting traversing that larger tree.

This PR moves back to a push-based model that lets us aggregate adjacent tokens into a single CodePiece when possible. It also adds a bunch of small-scale optimizations to do so in many places (empty collections, empty records, etc.).

This significantly reduces the total number of Pieces created. If I format flutter/packages/flutter/lib, before this change, it creates 3,514,535 pieces. With this change, it's 2,849,785, or about 81% of the pieces. That has a positive impact on performance.

Before this change, formatting the flutter lib directory (excluding IO) takes 4.024s. This change gets it down to 3.58s, around 12% faster.

Here's how it affects the benchmarks:

Benchmark (tall)                fastest   median  slowest  average  baseline
-----------------------------  --------  -------  -------  -------  --------
block                             0.077    0.081    0.144    0.085    116.1%
chain                             2.369    2.396    2.461    2.401    105.2%
collection                        0.191    0.203    0.348    0.208     96.4%
collection_large                  1.064    1.105    1.194    1.111    111.1%
flutter_popup_menu_test           0.533    0.555    0.851    0.565    102.8%
flutter_scrollbar_test            0.224    0.229    0.263    0.232    103.6%
function_call                     1.781    1.809    2.089    1.825    112.7%
infix_large                       1.182    1.198    1.259    1.202    142.2%
infix_small                       0.279    0.289    0.321    0.290    129.6%
interpolation                     0.108    0.115    0.251    0.122    108.6%
large                             5.028    5.085    5.489    5.120    111.7%
top_level                         0.252    0.262    0.341    0.265    105.5%

It's not a huge change across the board, but it's significant. Also, there are still places where we could aggregate tokens into pieces better and get further incremental improvement.

I'm sorry for how giant this change is. I couldn't figure out any way to break it into smaller commits. Fortunately, almost all of the change is mechanical. It's roughly:

buildPiece() -> pieces.build()
b.add()      -> pieces.add()
b.modifier() -> pieces.modifier()
b.token()    -> pieces.token()
b.visit()    -> pieces.visit()

PieceFactory.create___() => PieceFactory.write___()

AdjacentBuilder got merged into PieceWriter which now maintains the stack of in-progress builds. The AstNodeVisitor visit methods and most of the PieceFactory methods write their result instead of returning it. Unlike the old design, there isn't any explicit API to pushing, popping, and splitting. You just write stuff and when you need to capture the result as a Piece, you use pieces.build() (or nodePiece() and tokenPiece() which are mostly just conveniences for that).

munificent commented 6 months ago

Note this PR now also updates to the latest analyzer and fixes uses of some newly deprecated APIs. I'd rather fix that in a separate PR, but this one is failing CI.

Edit: I reverted those changes because they're failing too. 🙃

munificent commented 6 months ago

The CI will be fixed by: https://github.com/dart-lang/dart_style/pull/1456.

dart-lang / dart_style

Chunky pieces! Aggregate multiple tokens into single CodePieces. #1451