dart-lang / dart_style

An opinionated formatter/linter for Dart code
https://pub.dev/packages/dart_style
BSD 3-Clause "New" or "Revised" License
645 stars 117 forks source link

Simplify subtree merging. #1500

Closed munificent closed 1 month ago

munificent commented 1 month ago

A key optimization in the new formatting is formatting entire branches of the piece tree separately and then merging in the results. This lets us reuse the results of solving and formatting that subtree across multiple solutions.

When merging the results back, it's important that the resulting solution has the same cost and other metrics as if the subtree had not been formatted separately. Otherwise, the optimization can change the formatting behavior.

When I first implemented subtree merging, I ran into some issues where it seemed like the resulting cost wasn't right. I wasn't able to figure out exactly why. To fix it, I made sure we didn't double count the cost of any given bound piece by merging each piece state in the subtree back into the parent solution and only counting its cost if the parent didn't already have that piece bound.

After a lot of debugging, I figured out why bound pieces would sometimes get double counted otherwise. The problem goes like this:

  1. We create a solution. When creating it, we format it which traverses the piece tree and ends up separately formatting some subtrees and merging those into this solution.

  2. Later, we expand this solution into a series of child solutions. When we do, we copy the parent solution's bound pieces and its cost.

  3. When we go to format each of these expanded child solutions, we again traverse the piece tree which ends up separately formatting many of the same subtrees. For each one, we add in the cost of the subtree solution. But we've already added in that cost from the parent solution, which the expanded child inherited, so now we've double counted.

This PR fixes that: For each solution, we separately track the cost of the pieces directly bound by the solution and the cost it accumulated from merging subtrees. When expanding a solution, we inherit the former cost, but not the latter. That way, when we go to format the expanded solution and merge the subtrees in again, we only count those subtree costs once, when the child merges them.

This is a small but noticeable speed improvement:

Benchmark (tall)                fastest   median  slowest  average  baseline
-----------------------------  --------  -------  -------  -------  --------
block                             0.063    0.068    0.120    0.071    109.4%
chain                             0.627    0.641    0.674    0.643    103.1%
collection                        0.161    0.173    0.197    0.173    108.9%
collection_large                  0.856    0.908    0.975    0.908    107.7%
conditional                       0.064    0.066    0.076    0.067    103.6%
curry                             0.585    0.596    0.652    0.600    101.7%
flutter_popup_menu_test           0.271    0.280    0.294    0.279    108.2%
flutter_scrollbar_test            0.128    0.129    0.146    0.132    124.3%
function_call                     1.307    1.370    1.486    1.369    110.4%
infix_large                       0.659    0.696    0.727    0.694    106.3%
infix_small                       0.168    0.179    0.204    0.179    103.6%
interpolation                     0.091    0.095    0.115    0.097     99.5%
large                             3.431    3.598    3.761    3.610    102.4%
top_level                         0.140    0.142    0.169    0.144    104.0%

And when formatting flutter/packages/flutter:

Current formatter     8.624 ========================================
Optimized             8.233 ======================================
Old formatter         4.455 ====================
The current formatter is 48.34% slower than the old formatter.
The optimized is 4.75% faster than the current formatter.
The optimized is 45.89% slower than the old formatter.
The optimization gets the formatter 9.38% of the way to the old one.

But, maybe more importantly, my hope is that this is a step towards untangling cost calculation from formatting, so that I can move towards calculating cost eagerly and then only formatting a solution once we know it's the winner.