Closed sim642 closed 2 years ago
Overall, there are 40 unsoundnesses on last year's benchmarks (even with a 1 minute timeout). I think it makes sense to also track those here.
I added three more instances that were uncovered with a 15min timeout.
I reduced the issue with loops/s3.i
to the following nice program:
void reach_error() { }
int main(void)
{
while (1) {
if (1) {
goto switch_1_12292;
} else {
goto switch_1_default;
if (0) {
switch_1_12292:
reach_error();
goto switch_1_break;
switch_1_default:
goto end;
}
switch_1_break: ;
}
}
while_0_break: ;
end:
}
My guess would be something going wrong with #325 maybe, @sim642?
If I uncomment the following lines, it works again: https://github.com/goblint/analyzer/blob/4999bbe266fbef3e6a19ff780d56819fc904d513/src/framework/constraints.ml#L716-L718
My guess would be something going wrong with #325 maybe, @sim642?
Interesting, I'll have a look tomorrow. I suppose your run already included https://github.com/goblint/analyzer/commit/e00a9949a7c1fab8de750c4a6d0eb6d317cef3c0?
I suppose your run already included
e00a994
?
Yup, I ran on 497d7b32f.
Running with --enable dbg.cfg.loop-clusters
shows it to have the following stupid CFG (SCC in the rectangle):
The strict loop entry logic forces node 9 to be bottom because it's considered a loop entry (incoming edge from another SCC from node 7) and that node is bottom.
Not yet sure if this bug only occurs due to the stupid CFG or there is any proper cases where the strict loop mechanism also is wrong.
Not yet sure if this bug only occurs due to the stupid CFG or there is any proper cases where the strict loop mechanism also is wrong.
Ok, I constructed cases where not so stupid CFG also results in it being wrong. Loops with multiple entry points (using goto) can also cause this. I have a potential fix, so I'll open a PR soon, but I'm not sure if it's sufficient for correctness.
I have a potential fix, so I'll open a PR soon, but I'm not sure if it's sufficient for correctness.
I started wondering at some point, whether SCCs are actually a useful notion in this context, and if the more natural way would not be to express this in terms of pre-dominators or something along those lines. But did not really come to any conclusions there.
Fixes most of them! The busybox ones and the no-overflow ones were not included in that run, can you maybe verify manually for those @ivanazuzic ?
The first busybox ones were already present using the last year's archive (with this year's sv-benchmarks), so they weren't regressions from #325 anyway.
Fixes most of them! The busybox ones and the no-overflow ones were not included in that run, can you maybe verify manually for those @ivanazuzic ?
Yes, I'll look.
Fixes most of them! The busybox ones and the no-overflow ones were not included in that run, can you maybe verify manually for those @ivanazuzic ?
Yes, I'll look.
No-overflow should be true, but we find an overflow:
No-overflow should be false, but we never find an overflow:
No-overflow should be true, but we find an overflow:
This is to be expected. Was it actually no-overflow where we were unsound here, @sim642? Or reach safety?
No-overflow should be false, but we never find an overflow:
These are worrying.
This is to be expected. Was it actually no-overflow where we were unsound here, @sim642? Or reach safety?
busybox ones are unreach-call still.
For busybox-1.22.0/cut-3
Ivana identified the issue that we do not consider the call to qsort
https://github.com/sosy-lab/sv-benchmarks/blob/2e1723fde6aa65a250dcb677efa45edaa4b6b631/c/busybox-1.22.0/cut-3.i#L2654
to also cause a call to cmpfunc
which it would need to be according to the C standard.
Apart from that which should clearly be fixed, I am a bit skeptical about the expected verdict. I don't understand under which circumstances this library function qsort
would call cmpfunc
with null pointer arguments.
For
busybox-1.22.0/cut-3
Ivana identified the issue that we do not consider the call to qsort https://github.com/sosy-lab/sv-benchmarks/blob/2e1723fde6aa65a250dcb677efa45edaa4b6b631/c/busybox-1.22.0/cut-3.i#L2654to also cause a call to
cmpfunc
which it would need to be according to the C standard.
This is interesting because we don't have any special LibraryFunctions
behavior defined for qsort
that would do unsound things. So we should be invalidating the function and spawning it. We have no notion of just calling an unknown function during invalidation
I think I know what the problem is though: in the svcomp configs we disable unknown function spawning for some reason. I guess we might still want that, but then we need a separate notion of unknown function calling (and ctx.call
I suppose). The difference being that we don't go to multithreaded mode unnecessarily.
EDIT: Or maybe it's just sufficient to here (in the case of unknown functions) also do one_function
all the arguments? https://github.com/goblint/analyzer/blob/24d24f9458dd86b9c4e9cc1f42f23ed783ba64be/src/framework/constraints.ml#L642-L650
Apart from that which should clearly be fixed, I am a bit skeptical about the expected verdict. I don't understand under which circumstances this library function
qsort
would callcmpfunc
with null pointer arguments.
But the standard also doesn't guarantee that it will never be called with NULL
s, right?
EDIT: qsort
at all might be questionable, depending on how you interpret the rules: https://github.com/sosy-lab/sv-benchmarks/issues/1225#issuecomment-728798589.
I just got a new prerun after adding Goblint to Overall, which otherwise changes nothing except now we're unsound on four of our own submitted benchmarks (list in issue description)! This is embarassing...
If I remember correctly previously our new submitted benchmarks weren't merged yet though. At one point I ran Goblint on them and then it was fine, not sure how we managed to break something now. There's also the possibility that those benchmarks are somehow wrong and we should retract them.
We looked at them with Vesal and found multiple problems, so I've updated them in our regression suite and opened a MR to get them removed from sv-benchmarks.
Unsoundness is no more! https://gitlab.com/sosy-lab/benchmarking/sv-benchmarks/-/merge_requests/1332
For completeness I'll mention it here: #438 introduced some unsoundness temporarily, but that was fixed by #446. SV-COMP prerun confirms that there are no unsound cases any more again.
SV-COMP 2022 prerun results for the Goblint archive from SV-COMP 2021, but with updated sv-benchmarks, reveals unsoundness in three benchmarks:
These tasks were not used for ReachSafety last year because they were accidentally missing from a
.set
file: https://gitlab.com/sosy-lab/benchmarking/sv-benchmarks/-/merge_requests/1306. This is why the unsoundness didn't show in SV-COMP 2021 final results.I just checked that the unsoundness is still present on https://github.com/goblint/analyzer/commit/d4d9ace1fafbf01beac29e909460ef9ecad04115.
Additionally, the following issues were uncovered with the old set of benchmarks, and the updated Goblint:
With a longer timeout, we also get (after ~4 min):
From no-overflow property unsoundness from SV-COMP prerun (https://github.com/goblint/analyzer/commit/cbd79873):
Unsoundness at commit b87225c8: