An extra bottom-up recursive extractor

TrevorHansen commented 10 months ago

Hi,

Before I saw #8 I made a a recursive extractor, too.

This code is much ugglier than #8, but is faster on some problems (for example acyclic graphs). On acyclic graphs it's able to extract in one pass through, without building the dependencies.

Taking the cumulative times for all three extractors on my machine:

Loaded 454 jsons.
...
Cumulative time for bottom-up: 3386ms
Cumulative time for bottom-up-analysis: 2483ms
Cumulative time for bottom-up-recursive: 977ms

bottom-up is the one currently in, bottom-up-analysis is #8, bottom-up-recursive is this PR.

Is there benefit in having all three, or should the two new ones be combined?

mwillsey commented 9 months ago

Good question! @Bastacyclop what do you think? Offhand, I'm leaning toward fewer, more distinct algorithms, so unless we can characterize that these have different advantages, I'd prefer to have just one.

Bastacyclop commented 9 months ago

Running the benchmarks with my plot.py version gives:

Loaded 675 jsons.
---- output/babble -- bottom-up results:
tree        mean: 383.4740
dag         mean: 202.1272
micros      mean: 4099.8324
tree   quantiles:    38.00, 201.00, 368.00, 531.00, 1138.00
dag    quantiles:    31.00, 124.50, 199.00, 266.50, 476.00
micros quantiles: 145.00, 1710.50, 3735.00, 5617.50, 16994.00
---- output/babble -- bottom-up-analysis results:
tree        mean: 383.4740
dag         mean: 202.1618
micros      mean: 9924.3410
tree   quantiles:    38.00, 201.00, 368.00, 531.00, 1138.00
dag    quantiles:    31.00, 124.50, 199.00, 266.50, 476.00
micros quantiles: 293.00, 3834.50, 9014.00, 13213.50, 34602.00
---- output/babble -- bottom-up-recursive results:
tree        mean: 383.4740
dag         mean: 202.1272
micros      mean: 6283.2775
tree   quantiles:    38.00, 201.00, 368.00, 531.00, 1138.00
dag    quantiles:    31.00, 124.50, 199.00, 266.50, 476.00
micros quantiles: 173.00, 2303.50, 5119.00, 8275.50, 25213.00

---- output/egg -- bottom-up results:
tree        mean: 3.7143
dag         mean: 3.2857
micros      mean: 7137.1071
tree   quantiles:    1.00, 1.00, 3.00, 6.00, 13.00
dag    quantiles:    1.00, 1.00, 3.00, 5.00, 13.00
micros quantiles: 12.00, 20.50, 57.00, 463.25, 148309.00
---- output/egg -- bottom-up-analysis results:
tree        mean: 3.7143
dag         mean: 3.2857
micros      mean: 16172.1786
tree   quantiles:    1.00, 1.00, 3.00, 6.00, 13.00
dag    quantiles:    1.00, 1.00, 3.00, 5.00, 13.00
micros quantiles: 19.00, 50.25, 88.00, 861.75, 317778.00
---- output/egg -- bottom-up-recursive results:
tree        mean: 3.7143
dag         mean: 3.2857
micros      mean: 9666.6786
tree   quantiles:    1.00, 1.00, 3.00, 6.00, 13.00
dag    quantiles:    1.00, 1.00, 3.00, 5.00, 13.00
micros quantiles: 13.00, 29.00, 78.50, 693.75, 183555.00

---- output/flexc -- bottom-up results:
tree        mean: 322.0000
dag         mean: 85.0000
micros      mean: 50959.7857
tree   quantiles:    89.00, 92.00, 404.50, 546.00, 546.00
dag    quantiles:    35.00, 37.00, 91.50, 137.00, 137.00
micros quantiles: 30125.00, 33549.75, 54544.50, 58975.50, 102810.00
---- output/flexc -- bottom-up-analysis results:
tree        mean: 322.0000
dag         mean: 84.4286
micros      mean: 43439.8571
tree   quantiles:    89.00, 92.00, 404.50, 546.00, 546.00
dag    quantiles:    35.00, 36.00, 91.50, 136.00, 137.00
micros quantiles: 28394.00, 31866.75, 44750.00, 49550.25, 79446.00
---- output/flexc -- bottom-up-recursive results:
tree        mean: 322.0000
dag         mean: 85.0000
micros      mean: 36553.7857
tree   quantiles:    89.00, 92.00, 404.50, 546.00, 546.00
dag    quantiles:    35.00, 37.00, 91.50, 137.00, 137.00
micros quantiles: 26775.00, 30454.50, 35102.00, 38385.25, 73826.00

---- output/tensat -- bottom-up results:
tree        mean: 1858437671761.2795
dag         mean: 5.9862
micros      mean: 737891.6000
tree   quantiles:    2.69, 4.32, 966.23, 2320915813054.39, 9300713475693.02
dag    quantiles:    0.82, 0.94, 4.41, 8.35, 18.81
micros quantiles: 869.00, 22696.25, 230507.50, 1727811.50, 2385394.00
---- output/tensat -- bottom-up-analysis results:
tree        mean: 1858437671761.2795
dag         mean: 5.9793
micros      mean: 584151.0000
tree   quantiles:    2.69, 4.32, 966.23, 2320915813054.39, 9300713475693.02
dag    quantiles:    0.82, 0.94, 4.41, 8.34, 18.80
micros quantiles: 495.00, 18215.00, 143353.50, 1247708.75, 2311130.00
---- output/tensat -- bottom-up-recursive results:
tree        mean: 1858437671761.2795
dag         mean: 5.9916
micros      mean: 69755.5000
tree   quantiles:    2.69, 4.32, 966.23, 2320915813054.39, 9300713475693.02
dag    quantiles:    0.82, 0.94, 4.41, 8.36, 18.84
micros quantiles: 168.00, 10949.75, 54289.00, 69088.25, 349218.00

tree costs are basically the same, as expected.
bottom up recursive is faster than bottom up analysis for all datasets, in particular for tensat: is this only from skipping building dependencies or is there more going on?

I feel like we should be able to combine the three bottom up versions, with some more experiments and code cleanup. Maybe by (1) doing a first pass without building dependencies; and (2) if dependencies are used, doing a bottom up analysis based on unique queues. The question would be whether using unique queues for the second stage brings performance benefits or not (I think it should: https://github.com/egraphs-good/egg/issues/239). To properly evaluate that I would like to see datasests with more costly to compute, child-dependent cost functions.

Bastacyclop commented 9 months ago

PS: one thing to consider is that in egg the dependencies don't need to be computed as they are already stored in the e-graph.

mwillsey commented 8 months ago

I would like to consider computation of dependencies (parents) as somewhat negligible, as its only linear and as @Bastacyclop says it's already there in many contexts.

I'd like to preserve the "dumb" bottom up extractor as a base case. Ideally, we could consolidate the "smarter" bottom-up extractors into one (i.e., those that do not aim to do cost sharing, as a possible definition). Thoughts on that?

oflatt commented 7 months ago

Before I read this closely I also worked on a bottom-up extractor (#20) Sorry for duplicated work! I would also be happy with consolidating this, #20, and #8 if possible.

TrevorHansen commented 7 months ago

With the recent changes to the bottom-up extractor(#20), there's now only a small time advantage for the extractor that I proposed introducing (Note these times differ from before because extra problems have been added). Currently: Cumulative time for faster-bottom-up: 2060ms [The one in #20] Cumulative time for bottom-up-recursive: 1533ms [The one in this PR]. Cumulative time for bottom-up: 4471ms

Meaning that the extractor in this PR is only about 25% faster than the others, but is much uglier.

However, there are some tweaks to the "faster-bottom-up" extractor which brings down its runtime to almost the same as the extractor in this PR:

###################################################
faster-bottom-up vs faster-bottom-up-old

extractors: ['faster-bottom-up', 'faster-bottom-up-old']
cumulative time for faster-bottom-up: 1649ms
cumulative time for faster-bottom-up-old: 2060ms
cumulative tree cost for faster-bottom-up: 18584377265237
cumulative tree cost for faster-bottom-up-old: 18584377265237
cumulative dag cost for faster-bottom-up: 78037
cumulative dag cost for faster-bottom-up-old: 78037
Cumulative time for faster-bottom-up: 1649ms
Cumulative time for faster-bottom-up-old: 2060ms
faster-bottom-up / faster-bottom-up-old
geo mean
tree: 1.0000
dag: 1.0000
micros: 0.8184
quantiles
tree:   1.0000, 1.0000, 1.0000, 1.0000, 1.0000
dag:    1.0000, 1.0000, 1.0000, 1.0000, 1.0000
micros: 0.3611, 0.7825, 0.8227, 0.8611, 1.8333

So I've changed this PR to now just introduce some small speedups to the faster-bottom-up extractor, as well as fixing up attribution to @Bastacyclop.

Bastacyclop commented 7 months ago

:+1:

egraphs-good / extraction-gym

An extra bottom-up recursive extractor #9