creusot-rs / creusot

Creusot helps you prove your code is correct in an automated fashion.
GNU Lesser General Public License v2.1
1.15k stars 50 forks source link

`should_fail` tests are not checked for failing #1090

Closed jhjourdan closed 2 months ago

jhjourdan commented 2 months ago

For example, replaying 01_resolve_unsoundness returns:

[Warning] session is obsolete
[Warning] found detached goals or theories or transformations
 0/10 (replay failed, with some merge miss)
   goal 'vc_make_vec_of_size.2', prover 'Z3 4.12.1': not installed
   goal 'vc_make_vec_of_size.2', prover 'CVC5 1.0.4': not installed
   goal 'vc_make_vec_of_size.2', prover 'Alt-Ergo 2.4.2': not installed
   goal 'vc_make_vec_of_size.3', prover 'Z3 4.12.1': not installed
   goal 'vc_make_vec_of_size.3', prover 'CVC5 1.0.4': not installed
   goal 'vc_make_vec_of_size.3', prover 'Alt-Ergo 2.4.2': not installed
   goal 'make_vec_of_size'vc.3.1', prover 'Z3 4.12.1': not installed
   goal 'make_vec_of_size'vc.3.1', prover 'CVC5 1.0.4': not installed
   goal 'make_vec_of_size'vc.3.1', prover 'Alt-Ergo 2.4.2': not installed
   goal 'vc_make_vec_of_size.3', prover 'CVC4 1.8': Valid (0.06s, 13169 steps) instead of Timeout (1.00s, 259158 steps) (timelimit=1.00, memlimit=1000, steplimit=0)
   goal 'vc_make_vec_of_size.2', prover 'CVC4 1.8': Valid (0.06s, 12599 steps) instead of Timeout (5.00s, 719150 steps) (timelimit=5.00, memlimit=1000, steplimit=0)
Replay failed.

So the test suite should not pass, but it does in CI.

xldenis commented 2 months ago

That's not great... I think our current why3tests checks for Replay failed. but we need a more precise criterion.

jhjourdan commented 2 months ago

Cannot we use the return code of why3 replay?

xldenis commented 2 months ago

I'm not sure, I think it returns 0 even on failure? I remember an issue in why3 about this a long time ago.