abenkhadra / gosat

SMT solver for the theory of floating-point arithmetic
MIT License
25 stars 8 forks source link

potentially unsound answers #8

Open florianschanda opened 6 years ago

florianschanda commented 6 years ago

Hi,

We've found the following answers of gosat to be probably incorrect (sat that should be unsat and vice versa).

The filenames below come from my bag of benchmarks. https://github.com/florianschanda/smtlib_schanda I've tried my best to make sure the various status are correct by mistakes are of course possible.

nyxbrain/crafted-edge-cases/RNE:IEEE_equality_is_almost_zero_difference_5_unsat.smt2
nyxbrain/crafted-edge-cases/RNE:the_answer_is_NaN_5_unsat.smt2
nyxbrain/crafted-edge-cases/RTN:IEEE_equality_is_almost_zero_difference_5_unsat.smt2
nyxbrain/crafted-edge-cases/RTN:the_answer_is_NaN_5_unsat.smt2
nyxbrain/crafted-edge-cases/RTP:IEEE_equality_is_almost_zero_difference_5_unsat.smt2
nyxbrain/crafted-edge-cases/RTP:the_answer_is_NaN_5_unsat.smt2
nyxbrain/crafted-edge-cases/RTZ:IEEE_equality_is_almost_zero_difference_5_unsat.smt2
nyxbrain/crafted-edge-cases/RTZ:the_answer_is_NaN_5_unsat.smt2
random/fp.cast/cast_rna_20033.smt2
random/fp.cast/cast_rna_20193.smt2
random/fp.cast/cast_rna_20201.smt2
random/fp.cast/cast_rna_20202.smt2
random/fp.cast/cast_rna_20204.smt2
random/fp.cast/cast_rna_20208.smt2
random/fp.cast/cast_rne_19904.smt2
random/fp.cast/cast_rne_19908.smt2
random/fp.cast/cast_rne_19916.smt2
random/fp.cast/cast_rne_19929.smt2
random/fp.cast/cast_rne_19930.smt2
random/fp.cast/cast_rtn_20753.smt2
random/fp.cast/cast_rtn_20760.smt2
random/fp.cast/cast_rtn_20761.smt2
random/fp.cast/cast_rtn_20763.smt2
random/fp.cast/cast_rtn_20775.smt2
random/fp.cast/cast_rtz_20828.smt2
random/fp.cast/cast_rtz_20879.smt2
random/fp.cast/cast_rtz_20966.smt2
random/fp.cast/cast_rtz_21033.smt2
random/fp.cast/cast_rtz_21034.smt2
random/fp.cast/cast_rtz_21036.smt2
random/fp.cast/cast_rtz_21037.smt2
random/fp.cast/cast_rtz_21040.smt2
random/fp.cast/cast_rtz_21054.smt2
wintersteiger/mul/mul-has-no-other-solution-10044.smt2
wintersteiger/mul/mul-has-no-other-solution-13484.smt2
wintersteiger/mul/mul-has-no-other-solution-15092.smt2
wintersteiger/mul/mul-has-no-other-solution-15226.smt2
wintersteiger/mul/mul-has-no-other-solution-15435.smt2
wintersteiger/mul/mul-has-no-other-solution-15552.smt2
wintersteiger/mul/mul-has-no-other-solution-19889.smt2
wintersteiger/mul/mul-has-no-other-solution-3202.smt2
wintersteiger/mul/mul-has-no-other-solution-3804.smt2
wintersteiger/mul/mul-has-no-other-solution-7814.smt2
wintersteiger/mul/mul-has-no-other-solution-8691.smt2
florianschanda commented 6 years ago

Also, for some of these the platform of the computer I am on seems to matter.

wintersteiger/mul/mul-has-no-other-solution-15884.smt2 for example returns SAT on my AMD computer but UNKNOWN on my Intel computer.

I am using the same binary (which I compiled on my intel). Both computers are running the same Debian. Below the relevant extracts from /proc/cpuinfo model name : Intel(R) Core(TM) i7-5960X CPU @ 3.00GHz model name : AMD Opteron(tm) Processor 6328

florianschanda commented 6 years ago

Some of the issues could be resolved by not using -march=native that the CMakeLists hard-codes. However the bulk of the issued described above remain.

abenkhadra commented 6 years ago

I can confirm that all smt examples you provided can lead to unsound results. The following bug fixes have been applied:

Also, the examples exposed two current limitations which might be addressed in the future:

Additionally, the changes have passed the regression tests on the griggio benchmarks. Hence, the results of our paper are still reproducible. However, the changes are significant and may cause bugs on other benchmarks :)

Finally, I do not think the -march=native plays a role here. The answers for a few benchmarks were unstable, with repeated runs, since they are very close to zero. Something in the order of 10^-17. Playing with the XTOL parameter might help. However, I have not looked much into tuning nlopt parameters.

Thanks for experimenting with goSAT!

florianschanda commented 6 years ago

Thank you for the fast fixes :) I am currently re-running our benchmark suite and will let you know if anything new pops up.

I think it would be good that for anything you see that you don't support (like different rounding modes in the same expression) to instantly emit "unsupported" (see page 47 of the smtlib standard) or give the "unknown" response to a check-sat query.

Treating Float16 or Float128 as Float64 will not be sound, I would again encourage you to emit "unknown" at that point.

I've also now compiled without -march and with -march=opteron-sse3 and both work. -march=native really does not if you build on a recent Intel box and then use the binary on an AMD machine. Many executions terminate with "illegal instruction" and others just give incorrect results.

florianschanda commented 6 years ago

Results are different now, we've found 4970 unsound answers, but they are not the same ones as before. Below a reduced list, but it should cover everything.

crafted/QF_FP/why3_review_01.smt2
nyxbrain/crafted-edge-cases/RNE:gt_to_lt_unsat.smt2
nyxbrain/crafted-edge-cases/RTN:gt_to_lt_unsat.smt2
nyxbrain/crafted-edge-cases/RTP:gt_to_lt_unsat.smt2
nyxbrain/crafted-edge-cases/RTZ:gt_to_lt_unsat.smt2
random/fp.cast/cast_rne_19862.smt2
random/fp.div/div_rne_00703.smt2
random/fp.div/div_rne_00758.smt2
random/fp.div/div_rne_00852.smt2
random/fp.div/div_rne_00920.smt2
random/fp.div/div_rne_00925.smt2
random/fp.div/div_rne_00939.smt2
random/fp.div/div_rne_01008.smt2
random/fp.div/div_rne_01093.smt2
random/fp.div/div_rne_01107.smt2
random/fp.div/div_rne_01125.smt2
random/fp.div/div_rne_01156.smt2
random/fp.div/div_rne_01181.smt2
random/fp.div/div_rne_01209.smt2
random/fp.mul/mul_rne_08016.smt2
random/fp.mul/mul_rne_08042.smt2
random/fp.mul/mul_rne_08061.smt2
random/fp.mul/mul_rne_08081.smt2
random/fp.mul/mul_rne_08114.smt2
random/fp.mul/mul_rne_08137.smt2
random/fp.mul/mul_rne_08292.smt2
random/fp.mul/mul_rne_08324.smt2
random/fp.mul/mul_rne_08326.smt2
random/fp.mul/mul_rne_08341.smt2
random/fp.mul/mul_rne_08345.smt2
random/fp.mul/mul_rne_08395.smt2
random/fp.sub/sub_rne_11182.smt2
random/fp.sub/sub_rne_11531.smt2
wintersteiger/add/add-has-no-other-solution-10020.smt2
wintersteiger/add/add-has-no-other-solution-1003.smt2
wintersteiger/div/div-has-no-other-solution-10317.smt2
wintersteiger/div/div-has-no-other-solution-11528.smt2
wintersteiger/eq/eq-has-no-other-solution-10005.smt2
wintersteiger/eq/eq-has-no-other-solution-10015.smt2
wintersteiger/gt/gt-has-no-other-solution-10.smt2
wintersteiger/gt/gt-has-no-other-solution-10013.smt2
wintersteiger/lt/lt-has-no-other-solution-10001.smt2
wintersteiger/lt/lt-has-no-other-solution-10003.smt2
wintersteiger/mul/mul-has-no-other-solution-10112.smt2
wintersteiger/mul/mul-has-no-other-solution-10506.smt2
wintersteiger/sub/sub-has-no-other-solution-1.smt2
wintersteiger/sub/sub-has-no-other-solution-10068.smt2
abenkhadra commented 6 years ago

I added a few usability features which includes reporting "unsupported" upon detecting FP16 or FP128 variables. More importantly, it turns out that the parser of z3 provides Z3_OP_EQ (logical equality) in places where we expect Z3_OP_FPA_EQ (FP equality). This causes a bug since the treatment of the two equality types was split, for good, in the recent updates.

For example, consider the benchmark wintersteiger/eq/eq-has-no-other-solution-10015.smt2 which is partially reproduced below:

(set-logic QF_FP)
(set-info :status unsat)
(define-sort FPN () (_ FloatingPoint 11 53))
(declare-fun x () FPN)
(declare-fun y () FPN)
(assert (= x (fp #b0 #b01000101001 #b0000011101011111100011111011000000011101101101000010)))
(assert (= y (fp #b0 #b11000011110 #b0110111100000001000011010011110101010001011010111100)))
(assert  (not (= (fp.eq x y) false)))
(check-sat)

the parser gives Z3_OP_FPA_EQ for the comparison between x and y which is expected. However, it gives Z3_OP_EQ for the comparison between x and its corresponding constant despite the fact that the comparison is between FP (not booleans).

I've implemented a workaround where for operation Z3_OP_EQ, we look ahead into the type of the arguments. Based on that, we decide what code should be generated.

florianschanda commented 6 years ago

The Z3 parser is correct, the comparison is SMTLIB =, not FP EQ.

I believe your workaround is only going to add unsoundness when benchmarks depend on subtle difference between the two. For these specific constants they are equivalent. But they are not equivalent if either of the constant is 0 or NaN.

In C terms you might want to translate (fp.eq x y) as x == y, and (= x y) as *(int*)(&x) == *(int*)(&y).

florianschanda commented 6 years ago

I should further note that the SMTLIB floating point standard makes this distinction, and several benchmarks depend on this.

abenkhadra commented 6 years ago

Sorry for the misunderstanding. I believe that z3 is doing the correct thing, however, as far as the code generator is concerned, it was unexpected.

I'm interested in supporting a reasonable subset of QF_FP in a sound or, for practical reasons, may be "soundy" way. Along the way, we can learn about the limits of the stochastic search approach. So far, goSAT passes the soundness test on the griggio benchmarks in addition to the examples provided.