Open celinval opened 1 year ago
It looks like CBMC identifies the edge bb1 -> exit
as a back edge because their location in the code, when that really shouldn't matter.
It looks like CBMC identifies the edge
bb1 -> exit
as a back edge because their location in the code, when that really shouldn't matter.
It is true that this doesn't match Muchnick's definition of a back edge, because we are going "back" (in location order) to a node we haven't even seen before. Fixing symbolic execution alone would be fairly easy (we would just need to keep track of which nodes we have seen already, although there is memory cost to this), but then it would be much harder to label loops.
But ... what is the real problem here? The example works just fine when not specifying an unwinding bound, or using any bound greater or equal to 2.
@celinval I have my doubts about how robust our loop detection and handling is against unusually structured and adversarial code. I think symex handles it reasonably but other parts of the system are more fragile. If it's something you want to pursue I have a collection of horrible examples that I can share! I have hopes that we might have someone working on this soon.
But ... what is the real problem here? The example works just fine when not specifying an unwinding bound, or using any bound greater or equal to 2.
2
to the number they think should be used just in case. I wonder if some cases are related to how CBMC is identifying loops in the first place.@celinval makes a good point, it would be good if we could do better loop detection. Maybe even have a loop canonicalisation pass? I guess the question is who implements it.
@celinval makes a good point, it would be good if we could do better loop detection. Maybe even have a loop canonicalisation pass? I guess the question is who implements it.
Yes. I think having a pass to handle loops is a great idea. It would also likely mitigate @tautschnig concern of memory increase, since it should be a simple DFS, and we can just cache the final result.
In this case, can we label all cycle / loop entries?
But ... what is the real problem here?
goto-instrument --show-natural-loops
shows no loops on this example, and goto-instrument --show-loops
shows a loop (jump from bb1 to exit), and it seems like most users assimilate loops with natural loops, making it hard to know what to do in that situation.
@martin-cs there is a similar discussion going on about loop identification and canonicalisation from the perspective of loop contract instrumentation, with the additional needs that goto-instructions forming natural loops are densely packed in the instruction sequence without intermediate non-loop instructions, so that the loop instructions sequence forming the loop can easily be replaced by its base ; havoc ; step
model for loop contract verification.
Could we take inspiration from LLVM and at try to rewrite the goto-instruction sequence so that
It sounds good to me. As long as the loop identification is done based on the CFG, and we don't end up rewriting cases like this where there is no loop.
I do have a question though, is there an easy fix to solve the issue at hand while we wait for the more robust solution?
@martin-cshttps://github.com/martin-cs there is a similar discussion going on about loop identification and canonicalisation from the perspective of loop contract instrumentation, with the additional needs that goto-instructions forming natural loops are densely packed in the instruction sequence without intermediate non- loop instructions.
Yep. We ran into exactly the same problem in ASVAT (see the part paper
on "Things I Got Wrong About Loops", that was recording the
experience). goto-analyzer
hasn't got there yet because at the
moment the widening support is not good enough that these issues show
up. It will happen in the near future.
We all need:
Could we take inspiration from LLVM and at try to rewrite the goto- instrument sequence puts natural loops in LoopSimplifyForm< https://llvm.org/docs/LoopTerminology.html#id10>, and re-linearise the goto instruction sequence so that the only backedges from the goto instruction sequence perspective are actual backedges of normalised natural loops ?
Yes. ( This also answers Celina's question ). If you did a BFS and re-linearised so that topological back-edges and literal back-edges match up then it would solve this problem and be a general band-aid for this kind of issue.
Oh, forgot to add, other tools that need this / are probably broken by some of the horrible loop test cases : loop acceleration, cprover
conversion to Horn Clauses, 2ls
's abstraction (I know @peterschrammel had a lot of "fun" fixing some of these in that, especially break
s IIRC), ummm... probably some others too.
My understanding is that a loop canonicalization pass may be able to resolve this issue. I am commenting here in order to link the canonicalization ticket - https://github.com/diffblue/cbmc/issues/7211
CBMC version: 5.75.0 Operating system: Ubuntu 22.04 Exact command line resulting in the issue:
cbmc no_loops.c --unwind 1 --unwinding-assertions
What behaviour did you expect: The assertionmain.assertion.1
in the code should fail. What happened instead: The assertionmain.assertion.1
succeeds, while there's a unexpected unwinding assertion failure.Source code:
Output: