906#issuecomment-975636496:

The results from running against the unreach-call SV-COMP benchmark set are in. Here are what the results were before this PR:

Statistics:             7753 Files                                                                                                                                                                                               
  correct:              2030                                                                                                                                                                                                     
    correct true:       933                                                                                                                                                                                                      
    correct false:      1097                                                                                                                                                                                                     
  incorrect:            1596                                                                                                                                                                                                     
    incorrect true:     0                                                                                                                                                                                                        
    incorrect false:    1596                                                                                                                                                                                                     
  unknown:              4127                                                                                                                                                                                                     
Score:                  -22573 (max: 13248)

And after this PR:

Statistics:           7753 Files
  correct:            2094
    correct true:      990
    correct false:    1104
  incorrect:          1554
    incorrect true:      2
    incorrect false:  1552
  unknown:            4105
  Score:            -21812 (max: 13248)

Our score improved overall, which is nice. There are some concerning things, however:

Six programs changed from a correct verdict to an UNKNOWN verdict:
- [ ] reducercommutativity/rangesum40.yml
- [ ] loops/eureka_01-1.yml
- [ ] nla-digbench-scaling/dijkstra-u_valuebound20.yml
- [ ] nla-digbench-scaling/lcm1_valuebound10.yml
- [ ] seq-mthreaded/pals_floodmax.4.1.ufo.UNBOUNDED.pals.yml
- [ ] aws-c-common/aws_add_size_checked_harness.yml
Three programs changed from a correct verdict to an incorrect verdict:
- [ ] ldv-linux-3.4-simple/32_1_cilled_ok_nondet_linux-3.4-32_1-drivers--platform--x86--intel_menlow.ko-ldv_main0_sequence_infinite_withcheck_stateful.cil.out.yml
- [ ] ldv-linux-3.4-simple/43_1a_cilled_ok_nondet_linux-43_1a-drivers--net--arcnet--arc-rimi.ko-ldv_main0_sequence_infinite_withcheck_stateful.cil.out.yml
- [ ] ldv-linux-3.4-simple/43_1a_cilled_ok_nondet_linux-43_1a-drivers--net--arcnet--com90io.ko-ldv_main0_sequence_infinite_withcheck_stateful.cil.out.yml
One program changed from a correct verdict to simply erroring out before it ever reports a verdict:
- [ ] loops/matrix-2.yml
The error in particular is fd:4: hFlush: resource vanished (Broken pipe). Bizarre.
There are now two programs that report incorrect true verdicts:
- ~~ldv-regression/test28-1.yml~~
- list-ext3-properties/sll_of_sll_nondet_append-2.yml (EDIT: See #940.)
ldv-regression/test28-1.yml is a red herring, as that is the result of LLVM 12 more aggressively optimizing away a program that uses uninitialized memory. Moreover, that issue has been fixed upstream, so I don't think we'd see that on an actual SV-COMP competition run. I don't know what is causing the other program to report an incorrect true verdict, however.

This issue serves as a reminder to investigate these. In particular, list-ext3-properties/sll_of_sll_nondet_append-2.yml is troublesome.

GaloisInc / crucible

SV-COMP: Investigate regressions with `indeterminate-load-behavior: stable-symbolic` #926