Witness invariants for unrolled loops are incorrect

sim642 commented 12 months ago

Our CIL AST based loop unrolling duplicates nodes for the same program point (in the literal sense). Thus we end up generating witness invariants for each node but the same location, e.g. i == 0 and i == 15, which are contradictory:

- entry_type: location_invariant
  metadata:
    format_version: "0.1"
    uuid: dcd2d1a7-ae43-46a4-9ac9-528fc3df8507
    creation_time: 2023-10-30T08:34:16Z
    producer:
      name: Goblint
      version: heads/pldi-bench-0-gec49852db
      command_line: '''./goblint'' ''--conf'' ''conf/svcomp.json'' ''--enable'' ''witness.yaml.enabled'' ''--sets'' ''ana.specification'' ''/mnt/goblint-svcomp/benchexec/sv-benchmarks/c/properties/unreach-call.prp'' ''--sets'' ''exp.architecture'' ''32bit'' ''/mnt/goblint-svcomp/benchexec/sv-benchmarks/c/loop-acceleration/array_3-1.i'''
    task:
      input_files:
        - /mnt/goblint-svcomp/benchexec/sv-benchmarks/c/loop-acceleration/array_3-1.i
      input_file_hashes:
        /mnt/goblint-svcomp/benchexec/sv-benchmarks/c/loop-acceleration/array_3-1.i: 9c5d8dd6c87f471ee77fd3b765c8ecabfaf01dd976e127275ea7c589f724f472
      data_model: ILP32
      language: C
      specification: CHECK( init(main()), LTL(G ! call(reach_error())) )
  location:
    file_name: /mnt/goblint-svcomp/benchexec/sv-benchmarks/c/loop-acceleration/array_3-1.i
    file_hash: 9c5d8dd6c87f471ee77fd3b765c8ecabfaf01dd976e127275ea7c589f724f472
    line: 29
    column: 8
    function: main
  location_invariant:
    string: i == 0
    type: assertion
    format: C
- entry_type: location_invariant
  metadata:
    format_version: "0.1"
    uuid: 84342cda-192f-4411-a241-5436848150c9
    creation_time: 2023-10-30T08:34:16Z
    producer:
      name: Goblint
      version: heads/pldi-bench-0-gec49852db
      command_line: '''./goblint'' ''--conf'' ''conf/svcomp.json'' ''--enable'' ''witness.yaml.enabled'' ''--sets'' ''ana.specification'' ''/mnt/goblint-svcomp/benchexec/sv-benchmarks/c/properties/unreach-call.prp'' ''--sets'' ''exp.architecture'' ''32bit'' ''/mnt/goblint-svcomp/benchexec/sv-benchmarks/c/loop-acceleration/array_3-1.i'''
    task:
      input_files:
        - /mnt/goblint-svcomp/benchexec/sv-benchmarks/c/loop-acceleration/array_3-1.i
      input_file_hashes:
        /mnt/goblint-svcomp/benchexec/sv-benchmarks/c/loop-acceleration/array_3-1.i: 9c5d8dd6c87f471ee77fd3b765c8ecabfaf01dd976e127275ea7c589f724f472
      data_model: ILP32
      language: C
      specification: CHECK( init(main()), LTL(G ! call(reach_error())) )
  location:
    file_name: /mnt/goblint-svcomp/benchexec/sv-benchmarks/c/loop-acceleration/array_3-1.i
    file_hash: 9c5d8dd6c87f471ee77fd3b765c8ecabfaf01dd976e127275ea7c589f724f472
    line: 29
    column: 8
    function: main
  location_invariant:
    string: i == 15
    type: assertion
    format: C

Again, path-sensitivity–based unrolling would automatically avoid this issue because witness invariants are disjunctions over all paths at a node.

michael-schwarz commented 12 months ago

I think this less of an issue with the unrolling, but with the witness generation and it's mapping back to program points.

sim642 commented 12 months ago

In a way, it's a matter of what "program point" means for us. There are

physical program points, represented by Cil.location;
logical program points, represented by Node.t.

Actually, it's not just witness generation that is broken by unrolling, but every "join everything per node" process in Goblint and we have many of those, including:

dead branch detection,
everything consuming ResultQuery (e.g. server mode),
transformations.

Having to duplicate some fix for each one is far from elegant and a maintenance nightmare.

"Join everything per Cil.location" is very unreliable because multiple physical program points correspond to one logical program point. That's what WitnessUtil.Locator already tries to capture (although not entirely correctly due to } placement). Allowing multiplicity the other way makes things a lot more complex (the relation is some bipartite graph).

The AST-based unrolling actually introduces some quadratic unrolling when combined with unique mallocs/thread-creates:

One thread-create node is duplicated up to unroll factor.
Per each node, thread creation also produces distinct thread IDs up to unroll factor.

Initially, we didn't have the latter, so node duplication was a way to achieve the same effect "for free". When the latter was added (to have unrolled unique thread IDs without unrolling the entire loop itself), this quadratic behavior arose. I don't think this was intentional, or?

So the domain-based unrolling already exists but only in very specific places. It could just work at a higher level and "join over all paths" would simply take care of it.

sim642 commented 11 months ago

As expected, the location ambiguity causes problems. For example, consider this program:

int main() {
  int i;
  for (i = 0; i < 10; i++);
  return 0;
}

When generating a loop invariant at the beginning of line 3, we get a top invariant (nothing known about i).

That is because the node before i = 0; has the exact same location as the loop head itself. So it's joined with top uninitialized value for i. For a location invariant, that would be the only correct thing. But loop invariants exist precisely to avoid such problems and speak only about the actual loop head.

When generating loop invariants, we cannot just ignore non-loop-head nodes at the same location because that would again be unsound thanks to the syntactic loop unrolling. The unrolled copies of the loop don't have a loop head in the CFG, only the final unrolled loop head is a loop head according to the CFG. Therefore, to account for all iterations, including the unrolled ones, invariant generation has to join everything at that CIL location.

In #1248 I have done so to fix the unsoundness, but this phenomenon is quite counterintuitive:

If no syntactic loop unrolling is done, we can output the invariant 0 <= i && i <= 10 for the loop head. This is what we've done so far.
If loop unrolling is enabled, then we output no invariant at all, because internally it is top || i == 0 || i == 1 || (2 <= i && i <= 10).

So syntactic loop unrolling makes our analysis more precise but our witnesses less precise.

michael-schwarz commented 11 months ago

Is there something to be done here, such as marking these nodes somehow during the unrolling?

goblint / analyzer

Witness invariants for unrolled loops are incorrect #1225