GaloisInc / crucible

Crucible is a library for symbolic simulation of imperative programs

SV-COMP: Investigate regressions with `indeterminate-load-behavior: stable-symbolic` #926

Open RyanGlScott opened 2 years ago

RyanGlScott commented 2 years ago

#906 made `stable-symbolic`, rather than `unstable-symbolic`, the default value of `indeterminate-load-behavior`. While this was largely an improvement on SV-COMP benchmark programs, it also caused a small number of regressions. To quote https://github.com/GaloisInc/crucible/pull/906#issuecomment-975636496:


The results from running against the unreach-call SV-COMP benchmark set are in. Here are the results before this PR:

Statistics:             7753 Files
  correct:              2030
    correct true:       933
    correct false:      1097
  incorrect:            1596
    incorrect true:     0
    incorrect false:    1596
  unknown:              4127
Score:                  -22573 (max: 13248)

And after this PR:

Statistics:           7753 Files
  correct:            2094
    correct true:      990
    correct false:    1104
  incorrect:          1554
    incorrect true:      2
    incorrect false:  1552
  unknown:            4105
  Score:            -21812 (max: 13248)
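
Both reported scores are consistent with the usual SV-COMP weighting: +2 for a correct true, +1 for a correct false, -32 for an incorrect true, and -16 for an incorrect false. Here is a minimal Python sketch that recomputes them under that assumption. The `Results` class and its field names are hypothetical, introduced only for illustration, and are not part of Crucible or any benchmarking tool.

```python
from dataclasses import dataclass

# Assumed SV-COMP weights; they reproduce both scores reported above.
WEIGHTS = {
    "correct_true": 2,
    "correct_false": 1,
    "incorrect_true": -32,
    "incorrect_false": -16,
}


@dataclass(frozen=True)
class Results:
    """Verdict counts for one benchmark run (hypothetical helper type)."""
    correct_true: int
    correct_false: int
    incorrect_true: int
    incorrect_false: int
    unknown: int

    def total(self) -> int:
        # Every file ends up in exactly one of the five buckets.
        return (self.correct_true + self.correct_false
                + self.incorrect_true + self.incorrect_false
                + self.unknown)

    def score(self) -> int:
        # Unknown results contribute nothing to the score.
        return sum(w * getattr(self, name) for name, w in WEIGHTS.items())


# Counts copied from the two tables above.
before = Results(correct_true=933, correct_false=1097,
                 incorrect_true=0, incorrect_false=1596, unknown=4127)
after = Results(correct_true=990, correct_false=1104,
                incorrect_true=2, incorrect_false=1552, unknown=4105)

print(before.score(), after.score(), after.score() - before.score())
```

Under these weights, the two new incorrect true results alone cost 64 points, so the net gain of 761 would have been 825 without them.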

Our score improved overall, which is nice. There are some concerning things, however:


This issue serves as a reminder to investigate these. In particular, list-ext3-properties/sll_of_sll_nondet_append-2.yml is troublesome.

RyanGlScott commented 2 years ago

I've opened a separate issue for list-ext3-properties/sll_of_sll_nondet_append-2.yml in #940.