egraphs-good / extraction-gym

benchmarking e-graph extraction

Greedy bottom up alternative based on egraph analysis algorithm #8

Closed Bastacyclop closed 7 months ago

Bastacyclop commented 10 months ago

Hi All,

This PR provides an alternative greedy bottom-up extraction implementation -- BottomUpAnalysisExtractor -- based on the e-graph analysis algorithm that uses a pending queue. This is essentially the algorithm I use in my https://github.com/Bastacyclop/egg-sketches library.
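For reference, here is a minimal sketch of the pending-queue idea. The `Node`/`EGraph` types and the `extract` function below are simplified stand-ins for illustration, not the extraction-gym API or the PR's actual code:

```rust
use std::collections::{HashMap, VecDeque};

type ClassId = usize;

struct Node {
    op_cost: f64,           // cost of the operator itself (assumed non-negative)
    children: Vec<ClassId>, // child e-classes
}

struct EGraph {
    classes: Vec<Vec<Node>>, // nodes grouped by e-class
}

/// Greedy bottom-up extraction: compute the cheapest known cost per e-class
/// by propagating improvements through a pending queue, instead of
/// repeatedly re-scanning every node until a fixed point is reached.
fn extract(egraph: &EGraph) -> HashMap<ClassId, f64> {
    let mut best: HashMap<ClassId, f64> = HashMap::new();

    // parents[c] = (class, node index) pairs that have c as a child, so an
    // improvement to c only re-examines the nodes that depend on it.
    let mut parents: HashMap<ClassId, Vec<(ClassId, usize)>> = HashMap::new();
    let mut pending: VecDeque<(ClassId, usize)> = VecDeque::new();

    for (cid, nodes) in egraph.classes.iter().enumerate() {
        for (nid, node) in nodes.iter().enumerate() {
            if node.children.is_empty() {
                pending.push_back((cid, nid)); // leaves are ready immediately
            }
            for &child in &node.children {
                parents.entry(child).or_default().push((cid, nid));
            }
        }
    }

    while let Some((cid, nid)) = pending.pop_front() {
        let node = &egraph.classes[cid][nid];
        // A node's cost is only defined once all of its children have a cost.
        let child_cost: Option<f64> =
            node.children.iter().map(|c| best.get(c).copied()).sum();
        let Some(child_cost) = child_cost else { continue };
        let cost = node.op_cost + child_cost;

        // Only propagate when this node strictly improves its e-class;
        // otherwise the queue would never drain.
        if best.get(&cid).map_or(true, |&old| cost < old) {
            best.insert(cid, cost);
            if let Some(ps) = parents.get(&cid) {
                pending.extend(ps.iter().copied());
            }
        }
    }

    best
}
```

A real extractor would also record which node was chosen per e-class; this sketch only tracks the cost to keep the propagation structure visible.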

Based on https://github.com/egraphs-good/extraction-gym/pull/7

This is the (filtered) output of make:

Loaded 675 jsons.
---- output/babble -- bottom-up results:
dag         mean: 202.1272
micros      mean: 2821.5780
dag    quantiles:    31.00, 124.50, 199.00, 266.50, 476.00
micros quantiles: 91.00, 1218.50, 2385.00, 3782.50, 14182.00
---- output/babble -- bottom-up-analysis results:
dag         mean: 202.1561
micros      mean: 6412.6012
dag    quantiles:    31.00, 124.50, 199.00, 266.50, 476.00
micros quantiles: 177.00, 2555.50, 5514.00, 8845.50, 28623.00
---- output/egg -- bottom-up results:
dag         mean: 3.2857
micros      mean: 6676.3929
dag    quantiles:    1.00, 1.00, 3.00, 5.00, 13.00
micros quantiles: 12.00, 26.25, 54.00, 507.75, 146008.00
---- output/egg -- bottom-up-analysis results:
dag         mean: 3.2857
micros      mean: 12616.2500
dag    quantiles:    1.00, 1.00, 3.00, 5.00, 13.00
micros quantiles: 17.00, 33.00, 77.00, 681.25, 263692.00
---- output/flexc -- bottom-up results:
dag         mean: 85.0000
micros      mean: 63664.1429
dag    quantiles:    35.00, 37.00, 91.50, 137.00, 137.00
micros quantiles: 27256.00, 28729.00, 59139.00, 99794.25, 109424.00
---- output/flexc -- bottom-up-analysis results:
dag         mean: 84.6429
micros      mean: 35691.5000
dag    quantiles:    35.00, 36.00, 91.50, 137.00, 137.00
micros quantiles: 20051.00, 21609.50, 40668.50, 44200.25, 68863.00
---- output/tensat -- bottom-up results:
dag         mean: 5.9862
micros      mean: 797192.6000
dag    quantiles:    0.82, 0.94, 4.41, 8.35, 18.81
micros quantiles: 848.00, 22166.75, 189955.50, 1953350.75, 2280621.00
---- output/tensat -- bottom-up-analysis results:
dag         mean: 5.9760
micros      mean: 644572.2000
dag    quantiles:    0.82, 0.94, 4.41, 8.34, 18.76
micros quantiles: 551.00, 13178.25, 124706.00, 1232582.25, 2452544.00

Overall, the more complex bottom-up-analysis is slower on the easier datasets ('babble' and 'egg') but faster on the harder datasets ('flexc' and 'tensat'). I expect bottom-up-analysis to perform even better with cost functions that are expensive to compute and depend on children's costs, although extraction-gym does not support such datasets yet.

The DAG costs also vary slightly because different nodes get chosen -- I assume the tree costs are the same but could double-check.
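As a hypothetical illustration of how that can happen (unit node costs, names invented for the example): tree cost counts every use of a sub-term, while DAG cost counts each distinct selected node once, so two choices can tie on tree cost yet differ on DAG cost.

```rust
use std::collections::HashSet;

// Hypothetical example with unit node costs: f(x, x) and g(x, y) have the
// same tree cost, but f(x, x) shares the leaf x and so has a smaller DAG cost.
fn main() {
    let terms = [("f", &["x", "x"][..]), ("g", &["x", "y"][..])];
    for (root, leaves) in terms {
        // Tree cost: root plus one unit per leaf occurrence.
        let tree = 1 + leaves.len();
        // DAG cost: root plus one unit per *distinct* leaf.
        let dag = 1 + leaves.iter().collect::<HashSet<_>>().len();
        println!("{root}: tree cost = {tree}, dag cost = {dag}");
    }
}
```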

oflatt commented 7 months ago

How does this compare to #20? Are they the same idea?

Bastacyclop commented 7 months ago

Hi @oflatt, they are the same idea and the code is almost the same as well.