alt-romes / hegg

Fast equality saturation in Haskell
https://hackage.haskell.org/package/hegg
BSD 3-Clause "New" or "Revised" License
75 stars 8 forks source link

Possible Id collision in compileToQuery #20

Open Tarmean opened 1 year ago

Tarmean commented 1 year ago

compileToQuery generates new match ids starting from 0. But when users specify a VariablePattern n, n might collide with these generated IDs. Note that collisions are very unlikely as long as users specify patterns with the IsString instance which uses fromString = VariablePattern . hashString.

The simplest fix would be something like let root :~ atoms = evalState (aux pa) (maximum (0:vars p)+1).

Unfortunately a fix requires some design decisions. There are two 'obvious' fixes: Compact user-given variables, or start name-generating form the largest user-given variable+1. The VariablePattern names are later used to retrieve results, so if the user-given names were compacted the Query would have to carry a Map UserVar InternalVar. And the hashString variables are evenly distributed, so starting at max+1 might be very large (but probably not so large that it would lead to an overflow).

Reproduction:

somePattern :: Pattern []
somePattern = NonVariablePattern [VariablePattern 0,VariablePattern 1] 

>compileToQuery  somePattern
(Query [0,1] [Atom (CVar 0) [CVar 0,CVar 1]], 0)

Note that the compiled pattern matches ?A = (?A,?B), i.e. a cyclic graph.

alt-romes commented 1 year ago

Thanks for spotting this!

Perhaps we could avoid the problem altogether by not exporting NonVariablePattern (though that might make the interface/documentation of patterns a bit weird), and by having the string instance be modulo some number (say 2000), and then generated IDs start from that number (2000).

In any case, if you'd like to attempt that, or one of the fixes you mentioned (choose according to your best judgment), I'd be happy to review+merge an MR.