cdepillabout / pretty-simple

pretty-printer for Haskell data types that have a Show instance
https://hackage.haskell.org/package/pretty-simple
BSD 3-Clause "New" or "Revised" License
243 stars 29 forks source link

Exponential time complexity for nested multi-line expressions #30

Closed andrew-lei closed 6 years ago

andrew-lei commented 6 years ago
import Criterion.Main
import Text.Pretty.Simple

data Expr = A
          | B Expr
          | C [Expr]
  deriving Show

nest 0 = id
nest n = nest (n-1) . B

test n = bench (show n) $ nf test' n
  where
    test' = pShowNoColor . flip nest (C [A,A])

main = defaultMain [bgroup "nest" $ map test [1..25]]

Is O(2^n). Last five times (mean) on my machine:

21, 5.255 s 22, 9.401 s 23, 18.95 s 24, 41.77 s 25, 83.27 s

I think the problem occurs in putSurroundExpr which performs two traversals for each layer.

cdepillabout commented 6 years ago

Hi @andrew-lei, thanks for the bug report!

I would definitely accept a PR fixing this, if you're so inclined.

Or even just adding a bench mark demonstrating how much of a problem this is. That way, when someone tries to fix it, they will have a good test to demonstrate whether or not it is actually fixed!

andrew-lei commented 6 years ago

I'll look into it. I think the solution would be either to have the check for multi-line be a function that avoids the full traversal or pass the check up for the upper layers (since if one layer is multi-line, the one containing it will be as well).

andrew-lei commented 6 years ago

Is the presence of a comma necessary and sufficient to determine whether an expression is multi-line?

andrew-lei commented 6 years ago

Using

isMultiLine (Brackets commaSeparated) = isMultiLine' commaSeparated
isMultiLine (Braces commaSeparated) = isMultiLine' commaSeparated
isMultiLine (Parens commaSeparated) = isMultiLine' commaSeparated
isMultiLine _ = False

isMultiLine' (CommaSeparated []) = False
isMultiLine' (CommaSeparated [exprs]) = or $ map isMultiLine exprs
isMultiLine' _ = True

seems to solve the problem. See below for benchmarks (extended back to 25 to make sure). Should I submit a PR?

benchmarking recursive deeply-nested data structure/level 17
time                 377.9 μs   (346.3 μs .. 418.2 μs)
                     0.961 R²   (0.929 R² .. 0.992 R²)
mean                 372.0 μs   (350.6 μs .. 427.4 μs)
std dev              124.2 μs   (42.60 μs .. 225.7 μs)
variance introduced by outliers: 98% (severely inflated)

benchmarking recursive deeply-nested data structure/level 18
time                 351.5 μs   (342.3 μs .. 367.5 μs)
                     0.990 R²   (0.978 R² .. 0.997 R²)
mean                 355.1 μs   (347.2 μs .. 367.4 μs)
std dev              29.56 μs   (20.95 μs .. 46.86 μs)
variance introduced by outliers: 70% (severely inflated)

benchmarking recursive deeply-nested data structure/level 19
time                 417.6 μs   (401.6 μs .. 432.8 μs)
                     0.987 R²   (0.981 R² .. 0.993 R²)
mean                 405.1 μs   (394.1 μs .. 419.4 μs)
std dev              42.70 μs   (36.18 μs .. 53.17 μs)
variance introduced by outliers: 79% (severely inflated)

benchmarking recursive deeply-nested data structure/level 20
time                 534.1 μs   (400.8 μs .. 800.1 μs)
                     0.327 R²   (0.256 R² .. 0.467 R²)
mean                 712.4 μs   (561.7 μs .. 1.019 ms)
std dev              602.2 μs   (428.7 μs .. 798.8 μs)
variance introduced by outliers: 99% (severely inflated)

benchmarking recursive deeply-nested data structure/level 21
time                 360.9 μs   (341.0 μs .. 391.6 μs)
                     0.895 R²   (0.811 R² .. 0.955 R²)
mean                 443.4 μs   (403.5 μs .. 516.1 μs)
std dev              169.0 μs   (106.3 μs .. 275.7 μs)
variance introduced by outliers: 99% (severely inflated)

benchmarking recursive deeply-nested data structure/level 22
time                 388.8 μs   (382.1 μs .. 398.4 μs)
                     0.993 R²   (0.981 R² .. 0.998 R²)
mean                 377.0 μs   (371.6 μs .. 391.7 μs)
std dev              29.35 μs   (18.92 μs .. 48.91 μs)
variance introduced by outliers: 68% (severely inflated)

benchmarking recursive deeply-nested data structure/level 23
time                 412.8 μs   (405.9 μs .. 421.2 μs)
                     0.997 R²   (0.996 R² .. 0.999 R²)
mean                 407.4 μs   (402.2 μs .. 413.3 μs)
std dev              18.27 μs   (14.47 μs .. 24.16 μs)
variance introduced by outliers: 39% (moderately inflated)

benchmarking recursive deeply-nested data structure/level 24
time                 462.3 μs   (450.1 μs .. 477.0 μs)
                     0.994 R²   (0.990 R² .. 0.997 R²)
mean                 447.6 μs   (438.5 μs .. 457.8 μs)
std dev              30.39 μs   (23.26 μs .. 41.68 μs)
variance introduced by outliers: 60% (severely inflated)

benchmarking recursive deeply-nested data structure/level 25
time                 2.058 ms   (1.303 ms .. 2.532 ms)
                     0.697 R²   (0.552 R² .. 0.806 R²)
mean                 1.082 ms   (749.0 μs .. 1.463 ms)
std dev              892.4 μs   (537.8 μs .. 1.049 ms)
variance introduced by outliers: 98% (severely inflated)
cdepillabout commented 6 years ago

@andrew-lei Yeah, if you could make a PR that would be really helpful!

Although I'm somewhat worried about the run-time for level 25 being 2 miliseconds, when level 24 was only 400 micro-seconds. This looks like the same exponential increase as seen above?

Also, if you send a PR, some of the doctests that check the layout for different Haskell expressions may break. In general, that's okay as long as they don't break too much. I think it is more important to solve this problem than to make sure that they layout stays exactly the same.

andrew-lei commented 6 years ago

I think that's probably caused by outliers. Note the high variance.

andrew-lei commented 6 years ago

Ran benchmarks again, this time:

benchmarking recursive deeply-nested data structure/level 25
time                 426.5 μs   (415.2 μs .. 439.3 μs)
                     0.991 R²   (0.982 R² .. 0.997 R²)
mean                 424.5 μs   (416.3 μs .. 436.3 μs)
std dev              35.60 μs   (27.14 μs .. 49.82 μs)
variance introduced by outliers: 70% (severely inflated)

I will shortly commit and send a PR.