Closed: shader closed this issue 8 years ago
It would be nice to show us some minimal example producing this problem. Otherwise it's impossible to debug.
+1
I'm not sure I can produce a minimal example, but the issue randomly occurs while running the tests for https://github.com/shader/precedence
I realize it's a bit much to ask you to clone and test, but `make parser` and `tox` should be all that's required, if you have tox installed. Apparently just running `make test` doesn't actually execute any tests, so I need to fix that. The test that best illustrates the issue is `test_protocol_dead_end` in `bspl_test.py`, but it only happens about a third of the time or less.
I could paste in the formula that it's attempting to convert and some examples of the CNF it generates if you want, but it's ~4200 expressions.
Ok, so I moved the code over to boolexpr (pushed on the `boolexpr` branch), and it doesn't seem to have improved anything.
The formula transformations seem to be slower than pyeda's, though I didn't test all of the pyeda versions. `simplify()` takes around 0.1 seconds and `tseytin()` around 0.5 seconds; `tseytin()` does properly handle the cases that were simplified to constants. `to_cnf()` takes around 10 seconds, which is much slower than pyeda was.
I don't really know how to compare the size of the generated output, since `len(xs)` and `size()` work differently, but it seems much larger at first glance. `size()` returns 24974 before simplification, 24856 after `simplify()`, 119840 after `tseytin()`, 119144 after simplify+tseytin, ~16947 after `to_cnf()`, and ~17187 after simplify+to_cnf. `to_cnf()` still has some variability in the size of its output, +/- 100 or so.
Though the transformations themselves are slower, it seems that boolexpr's `sat()` is actually much faster. The total time of generating the formula and running `sat()` was about the same with or without `tseytin()`, making the transformation seem unnecessary, at least in this case.
Sadly, it seems that the instability is actually in the SAT solving process itself, because I saw the same issue as before even without any simplification or CNF conversion. Most of the time the formula is unsatisfiable, but sometimes the solver finds a solution that it shouldn't. I just noticed sooner this time because I was able to run a lot more tests at the higher speed. `satisfy_one()` on pyeda was taking ~30s, I think, so I didn't run very many tests, and just assumed it was `to_cnf()`'s fault because of the nondeterministic conversion.
I also saw an interesting exception go by once:
```
Exception ignored in: <bound method Context.__del__ of <boolexpr.wrap.Context object at 0x7ffff2dc1400>>
Traceback (most recent call last):
  File "/usr/lib64/python3.4/site-packages/boolexpr/wrap.py", line 295, in __del__
AttributeError: 'NoneType' object has no attribute 'boolexpr_Context_del'
Exception ignored in: Exception ignored in: Exception ignored in: Exception ignored in: <bound method Context.__del__ of <boolexpr.wrap.Context object at 0x7ffff1a68d30>>
Traceback (most recent call last):
  File "/usr/lib64/python3.4/site-packages/boolexpr/wrap.py", line 295, in __del__
AttributeError: 'NoneType' object has no attribute 'boolexpr_Context_del'
```
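For what it's worth, that "Exception ignored in `__del__`" trace looks like the classic interpreter-shutdown failure mode: by the time `__del__` runs, module-level globals (including the handle to the C library) may already have been cleared to `None`. A common guard pattern, sketched here with a hypothetical `_lib` handle and `release()` call rather than boolexpr's actual internals:

```python
class Context:
    """Sketch of a ctypes-style wrapper; `lib` stands in for the real C handle."""

    def __init__(self, lib):
        self._lib = lib  # in the real wrapper, the loaded shared library

    def __del__(self):
        # At interpreter shutdown, module globals may already have been set
        # to None, which is exactly what produces
        # "AttributeError: 'NoneType' object has no attribute ...".
        lib = getattr(self, "_lib", None)
        if lib is not None:
            lib.release()  # hypothetical cleanup call
```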
So what are my options for next steps? If it's the SAT solver, I'm not really sure what can be done to fix it.
Okay, after having a few problems with my unicode settings, I am able to reproduce the errors you are seeing with `precedence`. Looking into this.
Changed the `consistent` function to this:

```python
def consistent(*statements):
    events = extract_events(*statements)
    formula = and_(timeline(*events),
                   occurrence(*events),
                   transitivity(*events),
                   *statements)
    # print(formula.size())
    return formula
```
and using this test case:

```python
from precedence.bspl import *
from precedence import precedence

with open('precedence/test/samples/bspl/auction') as file:
    x = load(file.read())[0]

f = consistent(x.dead_end)
sat, soln = f.sat()
print(sat)
```
Sometimes the answer comes up "True", and sometimes "False".
If you change the last part to this:

```python
for i in range(100):
    sat, soln = f.sat()
    print(sat)
```

within the same process the answer is always the same. I.e., it will print "False False False ...", or "True True True ...". This possibly points to the behavior of hash functions.
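A quick way to see the hash-randomization effect (this is plain Python behavior, not boolexpr-specific): within one process, iterating the same `set` of strings always yields the same order, but that order depends on the hash seed, so it can change between runs. The event names below are made up for illustration:

```python
import itertools

# Within a single process, set iteration order is stable: it is a function
# of the strings' hashes, and hashes do not change mid-process.
events = {"bid", "accept", "reject", "pay", "deliver"}
orders = [tuple(events) for _ in range(100)]
assert all(o == orders[0] for o in orders)

# Across processes, string hashing is randomized by default (PYTHONHASHSEED),
# so the same set can iterate in a different order, and anything built by
# iterating it can differ from run to run:
pairs = list(itertools.combinations(events, 2))           # order-dependent!
stable = list(itertools.combinations(sorted(events), 2))  # deterministic
```

So any formula assembled by iterating a raw set can come out differently on each run, even though it is "the same" set every time.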
Is this test supposed to be SAT or UNSAT? I'm guessing UNSAT from the source code. Interestingly, when you get a SAT result and do `f.restrict(soln)`, it simplifies to `1`. The reason that's interesting is that it verifies the SAT result; the SAT engine doesn't share any code with `restrict`.
Well, I think I found the source of the problem. Turns out I wasn't always sorting the events alphabetically as intended when generating the simultaneous pairs: `extract_events` returned an unsorted set that was directly iterated over by `occurrence` to generate simultaneous pairs. Anyway, I'm sure you aren't that interested in my code.
Point is, the problem was not actually in pyeda or boolexpr after all. Thanks for your help.
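For anyone hitting the same class of bug: the fix is simply to force a deterministic order before generating pairs. Roughly, assuming `extract_events` returns a set of event names (the helper name below is illustrative, not the actual code in `precedence`):

```python
import itertools

def simultaneous_pairs(events):
    """Generate event pairs in a deterministic order.

    `events` may be an unordered set; sorting first guarantees that the
    formula built from these pairs is identical on every run, regardless
    of hash randomization.
    """
    return list(itertools.combinations(sorted(events), 2))

pairs = simultaneous_pairs({"c", "a", "b"})
# → [('a', 'b'), ('a', 'c'), ('b', 'c')], on every run
```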
I strongly suspect that the `consistent` function is not always returning equivalent expressions.

I ran `test_protocol_dead_end` several times to get both SAT and UNSAT results from `e = consistent(Auction.dead_end)`. I wrote them out to disk using the `to_ast` method, then in a script reconstructed both with the `from_ast` method. Unsurprisingly, the two functions were not equivalent: `sat, soln = xor(f, g).sat()` returns `True` (if the XOR is SAT, then the functions are not equal). When restricting the two functions with the `soln` point, one returns `0` and the other returns `1`.
The expressions are exactly the same size, and definitely look alike, but they're too big to understand by manual inspection.
My experiment files are a bit hacky right now. I can try to put something together later that automates the proof process.
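The XOR trick above generalizes: two Boolean functions f and g are equivalent iff f XOR g is unsatisfiable, and any satisfying assignment of the XOR is a point where exactly one of them evaluates to 1. As a toy illustration in plain Python (brute force over assignments, not boolexpr's solver):

```python
import itertools

def equivalent(f, g, nvars):
    """Check Boolean-function equivalence by exhausting all assignments.

    Equivalent to asking whether f XOR g is UNSAT: any assignment where
    f(a) != g(a) is a SAT witness for the XOR, and restricting f and g
    at that point yields 0 on one and 1 on the other.
    """
    for assignment in itertools.product([False, True], repeat=nvars):
        if f(*assignment) != g(*assignment):
            return False, assignment  # counterexample (the "soln" point)
    return True, None

f = lambda a, b: not (a and b)
g = lambda a, b: (not a) or (not b)   # De Morgan: equivalent to f
h = lambda a, b: (not a) and (not b)  # not equivalent to f

eq, _ = equivalent(f, g, 2)           # True
eq2, witness = equivalent(f, h, 2)    # False, with a witness assignment
```

Brute force only works for small variable counts, of course; for the ~4200-expression formulas here, the XOR+SAT formulation is the practical route.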
Hah, you wrote your response while I was typing my last comment :).
Regardless of this outcome, I need to thank you for identifying the other errors. I have seen the "interesting exception" you described, and I haven't started debugging it yet. It's definitely an interesting problem.
My current program is generating unreliable CNF with the `to_cnf()` method.

It generates a varying number of expressions, between 4185 and 4192, for the same input formula. Normally I wouldn't mind the variability, but the largest formulas are also not equivalent. Specifically, the formulas with fewer than 4192 expressions are all correctly unsatisfiable, but the largest is satisfiable in a way that should be inconsistent with the expressions provided.

What can I do to improve the reliability of `to_cnf()`?