egraphs-good / extraction-gym

benchmarking e-graph extraction
MIT License
27 stars 16 forks

faster-ILP extractor #16

Closed TrevorHansen closed 4 months ago

TrevorHansen commented 11 months ago

This improves the ILP extractor. It introduces more simplification of the e-graph before solving, and changes the ILP extractor to only block cycles that are found in the solver's result.

This code produces better answers than the version it replaces, though both versions cheat. The previous version excluded about 1/4 of the nodes, mostly for no good reason. This version has a tight timeout (currently 6 seconds) on ILP solving and returns the bottom-up result when that limit is exceeded. On the test set of 220 problems, about 35 time out.

It's probably better to block the cycles up front like @AD1024 does in the maxsat extractor.
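To make the lazy approach concrete, here is a minimal sketch of the loop it implies: solve the ILP, check the selected nodes for a cycle, and if one is found, add a constraint blocking that set of nodes and re-solve. The solver itself is stubbed out; only the cycle check on a candidate selection is shown, and all names are hypothetical rather than taken from the PR's code.

```python
def find_cycle(children, selected):
    """Return a list of nodes forming a cycle among `selected`, or None.

    `children` maps each node to the nodes it depends on. In the lazy
    scheme, a found cycle becomes a new ILP constraint forbidding that
    set of nodes from being chosen together, and the solver is re-run.
    """
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {n: WHITE for n in selected}
    stack = []

    def dfs(n):
        color[n] = GRAY
        stack.append(n)
        for c in children.get(n, ()):
            if c not in selected:
                continue
            if color[c] == GRAY:  # back edge: found a cycle
                return stack[stack.index(c):]
            if color[c] == WHITE:
                cyc = dfs(c)
                if cyc:
                    return cyc
        stack.pop()
        color[n] = BLACK
        return None

    for n in selected:
        if color[n] == WHITE:
            cyc = dfs(n)
            if cyc:
                return cyc
    return None

# Toy selection: a and b depend on each other (a cycle); c is acyclic.
children = {"a": ["b"], "b": ["a"], "c": []}
cycle = find_cycle(children, {"a", "b", "c"})
```

The upfront alternative mentioned above would instead add all cycle-blocking constraints before the first solve, at the cost of a much larger initial problem.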

Most of the new code added is to remove nodes that won't be selected from the e-graph. Later on I can pull that code out so that it can be run before other extractors.
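One simplification of this kind can be sketched as follows: drop e-nodes that can never appear in any valid extraction because one of their child classes has no extractable e-node, iterating to a fixpoint. (The PR's actual simplification passes may differ; this is only an illustration of the idea, with a made-up data representation.)

```python
def prune_unextractable(classes):
    """classes: {class_id: [child-class-id list, one per e-node]}.

    Repeatedly remove e-nodes whose child classes cannot themselves be
    extracted, until nothing changes.
    """
    changed = True
    while changed:
        changed = False
        extractable = {c for c, nodes in classes.items() if nodes}
        for c, nodes in classes.items():
            kept = [kids for kids in nodes
                    if all(k in extractable for k in kids)]
            if len(kept) != len(nodes):
                classes[c] = kept
                changed = True
    return classes

# Class 2 is empty, so class 1's only e-node (which needs class 2) is
# pruned, which in turn prunes class 0's e-node that needs class 1.
classes = {0: [[1], []], 1: [[2]], 2: []}
prune_unextractable(classes)
```

Running such a pass before any extractor shrinks the problem the solver sees, which is why it makes sense as a shared utility.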

The results, with a 6 second cumulative solving timeout, are:

cumulative time for bottom-up: 3441ms
cumulative time for ilp-cbc: 316560ms
bottom-up / ilp-cbc
geo mean
tree: 0.9706
dag: 1.0023
micros: 0.0036
quantiles
tree:   0.4286, 1.0000, 1.0000, 1.0000, 1.0000
dag:    1.0000, 1.0000, 1.0000, 1.0000, 1.2000
micros: 0.0001, 0.0021, 0.0038, 0.0064, 0.1222

Notably, when I instead use a 7200 second timeout, we get 10 timeouts and:

cumulative time for bottom-up: 3479ms
cumulative time for ilp-cbc: 93800888ms
bottom-up / ilp-cbc
geo mean
tree: 0.9620
dag: 1.0070
micros: 0.0016
quantiles
tree:   0.3849, 1.0000, 1.0000, 1.0000, 1.0000
dag:    1.0000, 1.0000, 1.0000, 1.0000, 1.6787
micros: 0.0000, 0.0012, 0.0029, 0.0048, 0.1491

So as @mwillsey has said before, our test problems have small dag vs. tree differences.

The previous results for this extractor were quite bad:

cumulative time for bottom-up: 3194ms
cumulative time for ilp_cbc: 283199ms
bottom-up / ilp_cbc
geo mean
tree: 0.5734
dag: 0.6673
micros: 0.0208
quantiles
tree:   0.0604, 0.4878, 0.5394, 0.6740, 1.0000
dag:    0.0976, 0.5751, 0.6366, 0.7717, 1.2106
micros: 0.0001, 0.0142, 0.0238, 0.0348, 0.1298
mwillsey commented 11 months ago

So our test set only has a few dag vs tree differences, but there are some! Unless I'm reading this wrong, it looks like this patch never beats bottom-up on tree cost? That would be bad, since the whole point is to do that, right?

TrevorHansen commented 11 months ago

> So our test set only has a few dag vs tree differences, but there are some! Unless I'm reading this wrong, it looks like this patch never beats bottom-up on tree cost? That would be bad, since the whole point is to do that, right?

Good questions. I'll describe how I understand things at the moment; I could easily be mistaken. My understanding is that the DAG cost considers sharing and the tree cost doesn't. So the DAG cost includes each selected node's cost once, while the tree cost includes each node's cost once per time it's used as a child of another node. So given the graph (A+A), the DAG cost includes the cost of the A node once, while the tree cost includes it twice.
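The (A+A) example works out like this (node names and costs here are made up purely for illustration):

```python
# A tiny DAG for (A + A): the "+" node uses the single "A" node twice.
costs = {"+": 1, "A": 2}
children = {"+": ["A", "A"], "A": []}

def tree_cost(node):
    # Counts a shared node once per use (as if the DAG were unfolded
    # into a tree).
    return costs[node] + sum(tree_cost(c) for c in children[node])

def dag_cost(root):
    # Counts each reachable node once, regardless of how often it is
    # shared.
    seen, todo = set(), [root]
    while todo:
        n = todo.pop()
        if n not in seen:
            seen.add(n)
            todo.extend(children[n])
    return sum(costs[n] for n in seen)

tree_cost("+")  # 1 + 2 + 2 = 5
dag_cost("+")   # 1 + 2 = 3
```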

It's really fast to calculate the optimal tree cost, because you can look locally and just select the lowest-cost node in each class. Bottom-up does this. Getting the optimal DAG cost is much harder; it's NP-hard.
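A minimal sketch of a bottom-up extractor, on a toy e-graph representation (names and encoding are assumptions for the example, not extraction-gym's actual API): keep the cheapest known cost per e-class and iterate until a fixpoint.

```python
INF = float("inf")

def bottom_up(classes):
    """classes: {class_id: [(node_cost, [child_class_ids]), ...]}.

    Returns the optimal tree cost of each e-class by repeatedly
    relaxing each class against its cheapest e-node.
    """
    best = {c: INF for c in classes}
    changed = True
    while changed:
        changed = False
        for c, nodes in classes.items():
            for cost, kids in nodes:
                total = cost + sum(best[k] for k in kids)
                if total < best[c]:
                    best[c] = total
                    changed = True
    return best

# Class 0 has two choices: a leaf of cost 5, or a cost-1 node whose
# child is class 1 (a leaf of cost 2). Best for class 0: min(5, 1+2).
classes = {0: [(5, []), (1, [1])], 1: [(2, [])]}
```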

If this is right, a few things follow. The ILP approach should never do better than bottom-up on tree cost - because bottom-up is optimal, and the optimal DAG cost should always be less than or equal to the optimal tree cost. Both hold in the new results.

Edit: On reflection, I think I've gotten this wrong. How hard the extraction is depends on the structure of the (multi)graph.

I'll create some examples that highlight the differences.

mwillsey commented 10 months ago

:facepalm: I was reading the division number in the code blocks backwards. Your code does indeed seem like an improvement then.

So now the question becomes: what is the utility of the various configurations? Is pre-loading the search with the bottom-up results not always a good idea?

TrevorHansen commented 9 months ago

> So now the question becomes: what is the utility of the various configurations? Is pre-loading the search with the bottom-up results not always a good idea?

When I tried pre-loading the search, it took longer. It's counterintuitive.

Just now when I tried some different configuration options, it segfaulted. I think it's sometimes failing on the newly added problems, so I need to test this PR more carefully.

mwillsey commented 8 months ago

@Bastacyclop mentioned that #7 should be subsumed by this. @TrevorHansen, do you have an idea of where this sits in the current landscape of extractors? I'd like to get some of these PRs closed/merged in the coming week.

TrevorHansen commented 8 months ago

> @Bastacyclop mentioned that #7 should be subsumed by this. @TrevorHansen, do you have an idea of where this sits in the current landscape of extractors? I'd like to get some of these PRs closed/merged in the coming week.

I've got a little bit of cleanup to do, then this will be ready for you to consider merging. I'll generate some new performance data when I do that, so that its performance can be compared to the other extractors.

TrevorHansen commented 8 months ago

> @Bastacyclop mentioned that #7 should be subsumed by this. @TrevorHansen, do you have an idea of where this sits in the current landscape of extractors? I'd like to get some of these PRs closed/merged in the coming week.

I've had the time to clean up this extractor, and it's now ready to review.

Conceptually there are two things that it's helpful to keep in mind to understand what it does:

Here is the comparison to the current ILP extractor, both run with a 10 second solver time-out:

faster-ilp-cbc-timeout vs ilp-cbc-timeout

extractors: ['faster-ilp-cbc-timeout', 'ilp-cbc-timeout']
data/babble/list_list_hard_test_ellisk_2019-02-15T11.26.41--bench010_it15.json  differs in dag cost:  425 426
data/babble/physics_scientific_unsolved_4h_ellisk_2019-07-20T18.05.46--bench003_it3.json  differs in dag cost:  171 172
data/babble/list_list_hard_test_ellisk_2019-02-15T11.26.41--bench009_it12.json  differs in dag cost:  407 408
data/babble/physics_scientific_unsolved_4h_ellisk_2019-07-20T18.13.12--bench001_it1.json  differs in dag cost:  144 145
data/babble/physics_scientific_unsolved_4h_ellisk_2019-07-20T18.20.13--bench001_it1.json  differs in dag cost:  155 156
data/babble/list_list_hard_test_ellisk_2019-02-15T11.35.48--bench005_it5.json  differs in dag cost:  243 244
data/babble/towers_tower_batch_50_3600_ellisk_2019-03-26T10.51.16--bench002_it3.json  differs in dag cost:  231 232
data/tensat/bert.json  differs in dag cost:  0.7960839965562627 0.82470399630256
data/tensat/bert_acyclic.json  differs in dag cost:  0.8210050156958459 0.8270000170450658
data/eggcc-bril/block-diamond.bril.json  differs in dag cost:  70 72
data/eggcc-bril/simple_recursive.bril.json  differs in dag cost:  1051 1073
data/rover/box_filter_3iteration_egraph.json  differs in dag cost:  1701 1819
data/rover/mcm_3_7_21_original_8iteration_egraph.json  differs in dag cost:  1050 1165
data/rover/mcm_3_7_21_original_9iteration_egraph.json  differs in dag cost:  1050 1165
cumulative tree cost for faster-ilp-cbc-timeout: 32017750434754
cumulative tree cost for ilp-cbc-timeout: 32017750427110
cumulative dag cost for faster-ilp-cbc-timeout: 78666
cumulative dag cost for ilp-cbc-timeout: 79045
Cumulative time for faster-ilp-cbc-timeout: 445713ms
Cumulative time for ilp-cbc-timeout: 953777ms
faster-ilp-cbc-timeout / ilp-cbc-timeout
geo mean
tree: 1.0094
dag: 0.9987
micros: 0.2747
quantiles
tree:   0.8415, 1.0000, 1.0000, 1.0000, 2.3333
dag:    0.9013, 1.0000, 1.0000, 1.0000, 1.0000
micros: 0.0004, 0.1635, 0.3450, 0.6229, 1.5938

Here are some graphs showing the comparison:

[figure: dag_cost_improvement]

I'm happy to change as required.

TrevorHansen commented 5 months ago

I'd like to get this merged in and then get to work on the next parts. I appreciate this is a big patch. Is there anything I can do to make it easier to review? Cheers.

mwillsey commented 5 months ago

Yes, let's merge this. Can you enhance the documentation in the file? The comment above is a good start, but I have a couple more questions.

  1. The comment at the top of the file mentions incrementality, but then another comment mentions that you don't actually incrementally call the solver?
  2. How do you "break" each cycle found by the solver?
  3. Why did you decide to do cycle breaking rather than the topological sort approach... performance?
  4. Can you add a little documentation on the "removal" part of simplification? I think I get the collapse-down part.
  5. The e-graph simplification could be a separate utility, it seems that it could benefit all extraction methods. This isn't a change request, just a comment.
TrevorHansen commented 5 months ago

Thanks for the comments, @mwillsey. I've described what's happening in the code a bit better. Let me know where it could be clearer and I'll improve it.

Yes, I agree that the e-graph simplification should be a separate utility.

mwillsey commented 4 months ago

Sorry for the delay on this!