Rewriter generates too many potential matches

brianhuffman commented 2 years ago

While working on some saw proofs that prove a lot of small lemmas using rewriting, I noticed that many of the calls to the simplifier were taking more time than I thought they should. Profiling showed that the apply function (defined locally inside rewriteSharedTerm) was being called hundreds of times more often than the surrounding rewriteTop function, suggesting that TermNet.unify_term is returning a lot more potential matches than it should, and all these bogus potential matches need to be filtered out by calling the much more expensive function scMatch.

By adding some debug output to rewriteSharedTerm, I discovered that when rewriting with cryptol_ss, terms like sort 0, Prelude.Nat, or Prelude.Vec 32 Prelude.Bool yield a list of 161 potential rewrite rules. This is the smallest set of potential rewrites that arises from any term when simplified with cryptol_ss. The potential rules all seem to be ones that have saw-core record projections appearing on the left-hand side.

For other terms, including function types, product types, or natural numbers, the rewriter finds 430 potential rewrite rules. (I think this is actually the complete set of all the rules in cryptol_ss.)

Basically this means that the term net data structure is doing nothing for us. I believe the issue is in the toPat function that turns a saw-core Term into a term-net Pat: It looks like it's treating way too many term constructors as wildcards that could potentially match anything.

brianhuffman commented 2 years ago

The root of the problem is that this definition of termToPat has a catch-all case that maps lots of terms to the wildcard pattern Var. As new constructors have been added to the Term datatype, this function definition was not kept up to date.

https://github.com/GaloisInc/saw-script/blob/81287fb74cbdac3bb3f75b984f1125deb3055e64/saw-core/src/Verifier/SAW/Term/Functor.hs#L409-L420

I will rewrite this function so that it does not use the underscore catch-all case, instead using a complete pattern match. This way we will get a compiler warning if we forget to update termToPat after changing the Term datatype again.

brianhuffman commented 2 years ago

The other reason why we're returning way too many rewrite rules is that the rewriter queries the term net in "unify" mode instead of "match" mode: https://github.com/GaloisInc/saw-script/blob/81287fb74cbdac3bb3f75b984f1125deb3055e64/saw-core/src/Verifier/SAW/Rewriter.hs#L649

This change was introduced in 916040bbec1d46e3aad2901e179d76db85d63264 in order to allow rule matching modulo Nat equations. To get reasonable performance with large simpsets, we'll have to figure out how to do this another way.

GaloisInc / saw-script

Rewriter generates too many potential matches #1577