Closed sim642 closed 11 months ago
I think this less of an issue with the unrolling, but with the witness generation and it's mapping back to program points.
In a way, it's a matter of what "program point" means for us. There are
Cil.location
;Node.t
.Actually, it's not just witness generation that is broken by unrolling, but every "join everything per node" process in Goblint and we have many of those, including:
ResultQuery
(e.g. server mode),Having to duplicate some fix for each one is far from elegant and a maintenance nightmare.
"Join everything per Cil.location
" is very unreliable because multiple physical program points correspond to one logical program point. That's what WitnessUtil.Locator
already tries to capture (although not entirely correctly due to }
placement). Allowing multiplicity the other way makes things a lot more complex (the relation is some bipartite graph).
The AST-based unrolling actually introduces some quadratic unrolling when combined with unique mallocs/thread-creates:
Initially, we didn't have the latter, so node duplication was a way to achieve the same effect "for free". When the latter was added (to have unrolled unique thread IDs without unrolling the entire loop itself), this quadratic behavior arose. I don't think this was intentional, or?
So the domain-based unrolling already exists but only in very specific places. It could just work at a higher level and "join over all paths" would simply take care of it.
As expected, the location ambiguity causes problems. For example, consider this program:
int main() {
int i;
for (i = 0; i < 10; i++);
return 0;
}
When generating a loop invariant at the beginning of line 3, we get a top invariant (nothing known about i
).
That is because the node before i = 0;
has the exact same location as the loop head itself. So it's joined with top uninitialized value for i
. For a location invariant, that would be the only correct thing. But loop invariants exist precisely to avoid such problems and speak only about the actual loop head.
When generating loop invariants, we cannot just ignore non-loop-head nodes at the same location because that would again be unsound thanks to the syntactic loop unrolling. The unrolled copies of the loop don't have a loop head in the CFG, only the final unrolled loop head is a loop head according to the CFG. Therefore, to account for all iterations, including the unrolled ones, invariant generation has to join everything at that CIL location.
In #1248 I have done so to fix the unsoundness, but this phenomenon is quite counterintuitive:
0 <= i && i <= 10
for the loop head. This is what we've done so far.top || i == 0 || i == 1 || (2 <= i && i <= 10)
.So syntactic loop unrolling makes our analysis more precise but our witnesses less precise.
Is there something to be done here, such as marking these nodes somehow during the unrolling?
Our CIL AST based loop unrolling duplicates nodes for the same program point (in the literal sense). Thus we end up generating witness invariants for each node but the same location, e.g.
i == 0
andi == 15
, which are contradictory:Again, path-sensitivity–based unrolling would automatically avoid this issue because witness invariants are disjunctions over all paths at a node.