antalsz / hs-to-coq

Convert Haskell source code to Coq source code
https://hs-to-coq.readthedocs.io
MIT License
279 stars 27 forks source link

Optimise pattern incompleteness check #146

Closed trommler closed 4 years ago

trommler commented 4 years ago

Reverse the list of patterns when checking for completeness. In Haskell more general patterns follow specialized patterns and in particular the catch-all clause is always last.

Fixes #112

lastland commented 4 years ago

This looks good to me.

@nomeata Since this is your code, what do you think?

nomeata commented 4 years ago

Smells a bit like premature optimization, but I assume Antal had a good reason, so that’s fine.

Certainly non-obvious to readers why there is a reverse here, so a comment might be good (“more efficient if consider a possible catch-all clause first”)

trommler commented 4 years ago

Smells a bit like premature optimization, but I assume Antal had a good reason, so that’s fine.

Perhaps, I can explain my reasoning a bit better:

A common pattern in pattern (no pun intended!) matches in Haskell is that we have a set of specialized (sometimes nested) patterns first and then a more general pattern that often overlaps with the specialized patterns. Consider the following example (simplified from compiler/cmm/CmmExpr):

data C = C [Int] | D

fun :: C -> _
fun (C []) = _
fun (C [42]) = _
fun (C [n]) = _
fun (C [n1, n2]) = _
fun D = _
fun _  = _

The third pattern contains the second pattern and the last pattern catches all the missing lists of length three and above.

But also when we replace the last line with

fun (C xs) = _

then after checking that pattern only D will be left and that will be found as the next pattern and we are done.

That means the more general patterns will appear towards the end of the list and the less general patterns p1 replace one required pattern by a potentially large list of patterns that are not covered by p1 only to be later removed by a more general pattern p2. If by reversing the list we looked at the more general pattern p2 first and then at p1 we will not find a matching pattern among the missing patterns because p1 was already covered by p2. If p2 is a catch-all pattern we are done immediately.

The only case where this patch will be cause a linear slowdown is when all patterns are exhaustively enumerated. I believe that this is acceptable given that combinatorial blowup is avoided by this patch. I will try @jwiegley's example and report back.

Certainly non-obvious to readers why there is a reverse here, so a comment might be good (“more efficient if consider a possible catch-all clause first”) Agreed, I will add an explanation similar to the above as a comment.

The optimization allowed me to translate function getRegister in the PowerPC code generator in two seconds whereas before it consumed all 16 GB of memory on my machine and did not finish at all.

tadeuszlitak commented 4 years ago

I can confirm @trommler's experience mentioned in the last line of his reply: I also encountered this massive blow-up on my Mac. Prior to Peter's fix, hs-to-coq consumed close to 20 GB of resources on my machine at some point and I was about to give up on it.

nomeata commented 4 years ago

I wasn’t doubting the “optimization” part – and given that evidence, I take the “premature” back :-)

(The coverage checker is rather naive, so one can probably replace it with a proper algorithm to avoid that blow up. But as a quick fix for a common case it makes sense.)

lastland commented 4 years ago

OK. I have merged this PR and I will try to explain what the optimization does in a comment. Thanks @trommler! And thanks @tadeuszlitak for the comment.