Gabriella439 / pipes

Compositional pipelines
BSD 3-Clause "New" or "Revised" License
488 stars 72 forks source link

Delay inlining to allow rules to fire #163

Closed bgamari closed 8 years ago

bgamari commented 8 years ago

The rewrite rules defined in Pipes are currently quite fragile as the functions which they match on may be inlined before the rules are given an opportunity to fire. Here we delay inlining of various operations to ensure that the rules can fire unimpeded during simplifier phase 2.

These cases are pointed out by a warning which will be introduced in GHC 8.0.1.

Gabriella439 commented 8 years ago

Is this ready to merge or still in progress?

bgamari commented 8 years ago

Gabriel Gonzalez notifications@github.com writes:

Is this ready to merge or still in progress?

Still in progress; there is one apparent performance regression that I need to look into.

bgamari commented 8 years ago

For the record this will (hopefully) fix #124.

bgamari commented 8 years ago

The performance story here is rather mixed,

All-in-all, the arithmetic mean of the deltas comes out to a roughly 1% improvement. However, this patch-set wasn't necessarily intended to improve performance; rather it was to make it clear to GHC what inlinings we expect it to performance (as it now warns when rules are fragile).

Pipes/concat is indeed worrying; whether the rest of the gains are enough to offset it are up to you. It can no doubt be fixed, but my time for this is rather thin at the moment.

Another interesting axis to this is whether we inline _bind. Doing so allows GHC to specialise away the Monad dictionary when it is known; this manifests as some double-digit-percentage performance improvements on Folds/{all, any, find}, Pipes/{find, mapM}. Oddly enough, however, there are larger regressions in the *_A benchmarks in LiftBench. For this reason I currently don't inline _bind.

bgamari commented 8 years ago

So I had a bit more time to look at the Pipes/concat issue this morning. It appears that forcing inlining (in phase 0, to avoid short-cutting other rewrites) of the composition operators (namely (//>)) is sufficient to turn the 120% runtime increase in this benchmark into a 40% runtime reduction.

With this change, the final runtime deltas are,

    Folds/all                              -62.0%
    Folds/any                              -60.0%
    Folds/find                             -56.8%
    Folds/findIndex                        26.3%
    Folds/fold                             5.1%
    Folds/foldM                            5.7%
    Folds/head                             0.4%
    Folds/index                            5.6%
    Folds/last                             -5.7%
    Folds/length                           3.4%
    Folds/null                             4.2%
    Folds/toList                           5.2%
    Pipes/chain                            -58.9%
    Pipes/concat                           -43.1%
    Pipes/drop                             18.3%
    Pipes/dropWhile                        12.0%
    Pipes/filter                           -48.1%
    Pipes/findIndices                      -22.8%
    Pipes/map                              -23.6%
    Pipes/mapM                             -56.8%
    Pipes/scan                             -30.0%
    Pipes/scanM                            -43.4%
    Pipes/take                             -23.0%
    Pipes/takeWhile                        -29.2%
    ReaderT/runReaderP_A                   15.9%
    ReaderT/runReaderP_B                   8.1%
    StateT/evalStateP_A                    24.1%
    StateT/evalStateP_B                    15.2%
    StateT/execStateP_A                    21.1%
    StateT/execStateP_B                    19.7%
    StateT/runStateP_A                     13.9%
    StateT/runStateP_B                     11.4%
    Zips/zip                               -52.1%
    Zips/zipWith                           -54.3%
    enumFromTo.vs.each/each                -46.6%
    enumFromTo.vs.each/enumFromTo          -52.1%

The findIndex and drop regressions appear to be quite sensitive to context (e.g. I'm unable to reproduce them in minimized benchmarks) so I'm going to call this finished. The StateT and ReaderT regressions are troubling, but they appear to be largely dependent upon whether _bind is inlined, which regresses runtime in some of the other benchmarks for reasons I'm not entirely sure I understand.

bgamari commented 8 years ago

@Gabriel439 ping.

Gabriella439 commented 8 years ago

Sorry for the delay. Could you go back to inlining _bind? I don't care very much about the liftBench benchmarks (I should probably get rid of them) and I'd love to be able to specialize away the Monad dictionary

bgamari commented 8 years ago

I've pushed another patch restoring inlining of _bind. Indeed I can't actually seem to reproduce the regression that I was seeing earlier, so it appears that this actually has no cost even.

Here's a systematic comparison

Benchmark0-initial1-inline2-no-inline-bind3-more-inlining4-inline-bind
Folds/all1.34e-43.4%2.8%-54.5%-59.4%
Folds/any1.33e-45.4%6.0%-60.6%-60.5%
Folds/find1.35e-41.9%-5.8e-2%-59.1%-61.1%
Folds/findIndex1.22e-45.4%2.8%6.1%2.1%
Folds/fold5.14e-51.9%0.4%0.4%0.9%
Folds/foldM5.16e-5-2.3%-2.2%-3.5%-1.3%
Folds/head8.48e-94.7%2.6%0.3%-0.9%
Folds/index1.10e-46.4%4.4%1.1e-2%-1.4%
Folds/last8.50e-51.8%1.6%-2.8%-1.8%
Folds/length4.30e-51.6%1.9%-0.9%4.0%
Folds/null7.97e-9-1.3%6.6%3.3%1.8%
Folds/toList1.11e-4-6.0%-6.4%-9.1%-8.1%
Pipes/chain9.85e-4-12.3%-7.6%-58.5%-57.8%
Pipes/concat1.28e-4116.6%116.0%-42.5%-42.0%
Pipes/drop8.29e-51.6%0.8%1.5%10.6%
Pipes/dropWhile1.17e-40.9%1.7%-3.6%0.5%
Pipes/filter4.77e-4-44.8%-45.1%-53.9%-50.9%
Pipes/findIndices3.01e-4-0.4%-0.3%-27.2%-19.8%
Pipes/map2.55e-4-1.8%-1.4%-37.1%-31.2%
Pipes/mapM1.05e-3-19.2%-18.5%-59.8%-63.3%
Pipes/scan3.05e-4-3.5%-4.3%-35.6%-34.8%
Pipes/scanM8.57e-4-36.7%-31.3%-48.1%-45.8%
Pipes/take2.70e-4-5.8%-4.3%-35.9%-36.4%
Pipes/takeWhile2.81e-40.2%-1.7%-27.5%-22.3%
ReaderT/runReaderP_A2.24e-41.5%-0.7%-3.8%-2.0%
ReaderT/runReaderP_B3.94e-3-0.8%-0.8%7.7e-2%1.0%
StateT/evalStateP_A2.96e-4-9.3%-9.4%-10.8%-9.3%
StateT/evalStateP_B4.15e-3-0.2%-1.3%3.9%-0.7%
StateT/execStateP_A2.65e-42.1%-0.3%6.1%-0.1%
StateT/execStateP_B4.08e-32.2%1.8%3.6%2.6%
StateT/runStateP_A2.94e-4-8.3%-7.7%-4.5%-10.5%
StateT/runStateP_B3.91e-3-1.1%-0.6%6.9%2.5%
Zips/zip1.03e-3-43.4%-44.5%-57.5%-56.7%
Zips/zipWith1.00e-3-40.5%-43.2%-55.1%-51.0%
enumFromTo.vs.each/each1.54e-42.7%-1.9%-53.9%-47.2%
enumFromTo.vs.each/enumFromTo1.62e-410.1%-1.1%-54.7%-52.7%
Gabriella439 commented 8 years ago

Perfect! Thanks for taking the time to fix all the rewrite rules. I know how much effort that must have taken so I wanted to let you know that I'm deeply thankful for your work on this