goblint / analyzer

Static analysis framework for C
https://goblint.in.tum.de
MIT License
172 stars 72 forks source link

Unsoundnesses in SV-COMP 2022 #382

Closed sim642 closed 2 years ago

sim642 commented 2 years ago

SV-COMP 2022 prerun results for the Goblint archive from SV-COMP 2021, but with updated sv-benchmarks, reveals unsoundness in three benchmarks:

These tasks were not used for ReachSafety last year because they were accidentally missing from a .set file: https://gitlab.com/sosy-lab/benchmarking/sv-benchmarks/-/merge_requests/1306. This is why the unsoundness didn't show in SV-COMP 2021 final results.

I just checked that the unsoundness is still present on https://github.com/goblint/analyzer/commit/d4d9ace1fafbf01beac29e909460ef9ecad04115.


Additionally, the following issues were uncovered with the old set of benchmarks, and the updated Goblint:

With a longer timeout, we also get (after ~4 min):


From no-overflow property unsoundness from SV-COMP prerun (https://github.com/goblint/analyzer/commit/cbd79873):


Unsoundness at commit b87225c8:

michael-schwarz commented 2 years ago

Overall, there are 40 unsoundnesses on last year's benchmarks (even with a 1 minute timeout). I think it makes sense to also track those here.

master-e168a4e2884c9c-svcomp-json-default.zip

michael-schwarz commented 2 years ago

I added three more instances that were uncovered with a 15min timeout.

michael-schwarz commented 2 years ago

I reduced the issue with loops/s3.i to the following nice program:

void reach_error() {  }

int main(void)
{ 
  while (1) {
    if (1) {
      goto switch_1_12292;
    } else {
        goto switch_1_default;
        if (0) {
          switch_1_12292:
          reach_error();

          goto switch_1_break;

          switch_1_default:

          goto end;
        } 
          switch_1_break: ;      
    }

  }
  while_0_break: ;

  end:
}

My guess would be something going wrong with #325 maybe, @sim642?

If I uncomment the following lines, it works again: https://github.com/goblint/analyzer/blob/4999bbe266fbef3e6a19ff780d56819fc904d513/src/framework/constraints.ml#L716-L718

sim642 commented 2 years ago

My guess would be something going wrong with #325 maybe, @sim642?

Interesting, I'll have a look tomorrow. I suppose your run already included https://github.com/goblint/analyzer/commit/e00a9949a7c1fab8de750c4a6d0eb6d317cef3c0?

michael-schwarz commented 2 years ago

I suppose your run already included e00a994?

Yup, I ran on 497d7b32f.

sim642 commented 2 years ago

Running with --enable dbg.cfg.loop-clusters shows it to have the following stupid CFG (SCC in the rectangle): image

The strict loop entry logic forces node 9 to be bottom because it's considered a loop entry (incoming edge from another SCC from node 7) and that node is bottom.

Not yet sure if this bug only occurs due to the stupid CFG or there is any proper cases where the strict loop mechanism also is wrong.

sim642 commented 2 years ago

Not yet sure if this bug only occurs due to the stupid CFG or there is any proper cases where the strict loop mechanism also is wrong.

Ok, I constructed cases where not so stupid CFG also results in it being wrong. Loops with multiple entry points (using goto) can also cause this. I have a potential fix, so I'll open a PR soon, but I'm not sure if it's sufficient for correctness.

michael-schwarz commented 2 years ago

I have a potential fix, so I'll open a PR soon, but I'm not sure if it's sufficient for correctness.

I started wondering at some point, whether SCCs are actually a useful notion in this context, and if the more natural way would not be to express this in terms of pre-dominators or something along those lines. But did not really come to any conclusions there.

michael-schwarz commented 2 years ago

Fixes most of them! The busybox ones and the no-overflow ones were not included in that run, can you maybe verify manually for those @ivanazuzic ?

sim642 commented 2 years ago

The first busybox ones were already present using the last year's archive (with this year's sv-benchmarks), so they weren't regressions from #325 anyway.

ivanazuzic commented 2 years ago

Fixes most of them! The busybox ones and the no-overflow ones were not included in that run, can you maybe verify manually for those @ivanazuzic ?

Yes, I'll look.

ivanazuzic commented 2 years ago

Fixes most of them! The busybox ones and the no-overflow ones were not included in that run, can you maybe verify manually for those @ivanazuzic ?

Yes, I'll look.

No-overflow should be true, but we find an overflow:

No-overflow should be false, but we never find an overflow:

michael-schwarz commented 2 years ago

No-overflow should be true, but we find an overflow:

This is to be expected. Was it actually no-overflow where we were unsound here, @sim642? Or reach safety?

No-overflow should be false, but we never find an overflow:

These are worrying.

sim642 commented 2 years ago

This is to be expected. Was it actually no-overflow where we were unsound here, @sim642? Or reach safety?

busybox ones are unreach-call still.

michael-schwarz commented 2 years ago

For busybox-1.22.0/cut-3 Ivana identified the issue that we do not consider the call to qsort https://github.com/sosy-lab/sv-benchmarks/blob/2e1723fde6aa65a250dcb677efa45edaa4b6b631/c/busybox-1.22.0/cut-3.i#L2654

to also cause a call to cmpfunc which it would need to be according to the C standard.

Apart from that which should clearly be fixed, I am a bit skeptical about the expected verdict. I don't understand under which circumstances this library function qsort would call cmpfunc with null pointer arguments.

sim642 commented 2 years ago

For busybox-1.22.0/cut-3 Ivana identified the issue that we do not consider the call to qsort https://github.com/sosy-lab/sv-benchmarks/blob/2e1723fde6aa65a250dcb677efa45edaa4b6b631/c/busybox-1.22.0/cut-3.i#L2654

to also cause a call to cmpfunc which it would need to be according to the C standard.

This is interesting because we don't have any special LibraryFunctions behavior defined for qsort that would do unsound things. So we should be invalidating the function and spawning it. We have no notion of just calling an unknown function during invalidation

I think I know what the problem is though: in the svcomp configs we disable unknown function spawning for some reason. I guess we might still want that, but then we need a separate notion of unknown function calling (and ctx.call I suppose). The difference being that we don't go to multithreaded mode unnecessarily.

EDIT: Or maybe it's just sufficient to here (in the case of unknown functions) also do one_function all the arguments? https://github.com/goblint/analyzer/blob/24d24f9458dd86b9c4e9cc1f42f23ed783ba64be/src/framework/constraints.ml#L642-L650

Apart from that which should clearly be fixed, I am a bit skeptical about the expected verdict. I don't understand under which circumstances this library function qsort would call cmpfunc with null pointer arguments.

But the standard also doesn't guarantee that it will never be called with NULLs, right?

EDIT: qsort at all might be questionable, depending on how you interpret the rules: https://github.com/sosy-lab/sv-benchmarks/issues/1225#issuecomment-728798589.

sim642 commented 2 years ago

I just got a new prerun after adding Goblint to Overall, which otherwise changes nothing except now we're unsound on four of our own submitted benchmarks (list in issue description)! This is embarassing...

If I remember correctly previously our new submitted benchmarks weren't merged yet though. At one point I ran Goblint on them and then it was fine, not sure how we managed to break something now. There's also the possibility that those benchmarks are somehow wrong and we should retract them.

sim642 commented 2 years ago

We looked at them with Vesal and found multiple problems, so I've updated them in our regression suite and opened a MR to get them removed from sv-benchmarks.

sim642 commented 2 years ago

Unsoundness is no more! https://gitlab.com/sosy-lab/benchmarking/sv-benchmarks/-/merge_requests/1332

426 hasn't been fixed, but that doesn't show on sv-benchmarks, so we're good enough for now. If something pops up, this issue can be reopened.

sim642 commented 2 years ago

For completeness I'll mention it here: #438 introduced some unsoundness temporarily, but that was fixed by #446. SV-COMP prerun confirms that there are no unsound cases any more again.