906 made the default behavior of indeterminate-load-behavior be stable-symbolic rather than unstable-symbolic. While this was largely an improvement on SV-COMP benchmark programs, there were a small number of regressions as well. To quote https://github.com/GaloisInc/crucible/pull/906#issuecomment-975636496:
The results from running against the unreach-call SV-COMP benchmark set are in. Here are what the results were before this PR:
One program changed from a correct verdict to simply erroring out before it ever reports a verdict:
[ ] loops/matrix-2.yml
The error in particular is fd:4: hFlush: resource vanished (Broken pipe). Bizarre.
There are now two programs that report incorrect true verdicts:
ldv-regression/test28-1.yml
list-ext3-properties/sll_of_sll_nondet_append-2.yml (EDIT: See #940.)
ldv-regression/test28-1.yml is a red herring, as that is the result of LLVM 12 more aggressively optimizing away a program that uses uninitialized memory. Moreover, that issue has been fixed upstream, so I don't think we'd see that on an actual SV-COMP competition run. I don't know what is causing the other program to report an incorrect true verdict, however.
This issue serves as a reminder to investigate these. In particular, list-ext3-properties/sll_of_sll_nondet_append-2.yml is troublesome.
906 made the default behavior of
indeterminate-load-behavior
bestable-symbolic
rather thanunstable-symbolic
. While this was largely an improvement on SV-COMP benchmark programs, there were a small number of regressions as well. To quote https://github.com/GaloisInc/crucible/pull/906#issuecomment-975636496:The results from running against the
unreach-call
SV-COMP benchmark set are in. Here are what the results were before this PR:And after this PR:
Our score improved overall, which is nice. There are some concerning things, however:
Six programs changed from a correct verdict to an
UNKNOWN
verdict:reducercommutativity/rangesum40.yml
loops/eureka_01-1.yml
nla-digbench-scaling/dijkstra-u_valuebound20.yml
nla-digbench-scaling/lcm1_valuebound10.yml
seq-mthreaded/pals_floodmax.4.1.ufo.UNBOUNDED.pals.yml
aws-c-common/aws_add_size_checked_harness.yml
Three programs changed from a correct verdict to an incorrect verdict:
ldv-linux-3.4-simple/32_1_cilled_ok_nondet_linux-3.4-32_1-drivers--platform--x86--intel_menlow.ko-ldv_main0_sequence_infinite_withcheck_stateful.cil.out.yml
ldv-linux-3.4-simple/43_1a_cilled_ok_nondet_linux-43_1a-drivers--net--arcnet--arc-rimi.ko-ldv_main0_sequence_infinite_withcheck_stateful.cil.out.yml
ldv-linux-3.4-simple/43_1a_cilled_ok_nondet_linux-43_1a-drivers--net--arcnet--com90io.ko-ldv_main0_sequence_infinite_withcheck_stateful.cil.out.yml
One program changed from a correct verdict to simply erroring out before it ever reports a verdict:
loops/matrix-2.yml
The error in particular is
fd:4: hFlush: resource vanished (Broken pipe)
. Bizarre.There are now two programs that report incorrect true verdicts:
ldv-regression/test28-1.yml
list-ext3-properties/sll_of_sll_nondet_append-2.yml
(EDIT: See #940.)ldv-regression/test28-1.yml
is a red herring, as that is the result of LLVM 12 more aggressively optimizing away a program that uses uninitialized memory. Moreover, that issue has been fixed upstream, so I don't think we'd see that on an actual SV-COMP competition run. I don't know what is causing the other program to report an incorrect true verdict, however.This issue serves as a reminder to investigate these. In particular,
list-ext3-properties/sll_of_sll_nondet_append-2.yml
is troublesome.