boogie-org / boogie

Boogie
http://research.microsoft.com/en-us/projects/boogie/
MIT License
504 stars 110 forks source link

Prune is nondeterministic with -vcsCores:2 #560

Closed RustanLeino closed 1 year ago

RustanLeino commented 2 years ago

Running Boogie repeatedly on the same input and with -vcsCores:2 -prune produces different output (for example, sometimes reporting 2 verification errors and other times reporting 4 verification errors). Furthermore, using -printPruned shows that there are differences even in the cases when the same number of verification errors is reported. For the test file I'm enclosing, it is also the case that running without pruning gives 0 verification errors (instead of 2 or 4).

Repro

I'm calling boogie with the SMT settings that Dafny uses, namely

/proverOpt:O:smt.arith.solver=2 /proverOpt:O:auto_config=false /proverOpt:O:type_check=true /proverOpt:O:smt.case_split=3 /proverOpt:O:smt.qi.eager_threshold=100 /proverOpt:O:smt.delay_units=true

and I'm using Z3 version 4.8.5 on MacOS. In the repros below, I'm also using -errorTrace:0 for cleaner output, but this flag makes no difference for the issue I'm reporting. The input file I used is d.bpl.

Pruning and using 2 cores

Runnng boogie d.bpl $(SMT_OPTIONS_ABOVE) -errorTrace:0 -vcsCores:2 -prune sometimes gives

d.bpl(5595,11): Error: This loop invariant might not be maintained by the loop.
d.bpl(5601,11): Error: This loop invariant might not be maintained by the loop.

Boogie program verifier finished with 12 verified, 2 errors

and sometimes gives

d.bpl(5595,11): Error: This loop invariant might not be maintained by the loop.
d.bpl(5601,11): Error: This loop invariant might not be maintained by the loop.
d.bpl(6715,1): Error: A postcondition might not hold on this return path.
d.bpl(6556,3): Related location: This is the postcondition that might not hold.
d.bpl(6715,1): Error: A postcondition might not hold on this return path.
d.bpl(6571,3): Related location: This is the postcondition that might not hold.

Boogie program verifier finished with 11 verified, 4 errors

If I additionally use -printPruned to print the pruned files, then I get differences even when the number of errors is generated to be the same. This suggests that the nondeterministic pruning happens more often than the verification output shows. For example, doing three consecutive runs of boogie with the flags above and -printPruned gives the following differences:

% diff tmp0.out tmp1.out
26282,26284d26281
<   var $_Frame#AT#0: <beta>[ref,Field beta]bool;
<   var $Heap#AT#0: Heap;
<   var k#0#AT#0: int;

% diff tmp1.out tmp2.out
84447a84448,84451
>   var $_Frame#AT#0: <beta>[ref,Field beta]bool;
>   var i#0#AT#0: int;
>   var j#0#AT#0: int;
>   var $Heap#AT#0: Heap;

In other words, tmp0.out contains 3 lines that tmp1.out does, and tmp2.out contains 4 lines that tmp1.out does not. The runs corresponding to tmp0.out and tmp1.out each reported the 2 verification errors shown above, whereas the run corresponding to tmp2.out reported the 4 verification errors shown above.

Pruning and using 1 core

With boogie d.bpl $(SMT_OPTIONS_ABOVE) -errorTrace:0 -prune, I consistently get

d.bpl(5595,11): Error: This loop invariant might not be maintained by the loop.
d.bpl(5601,11): Error: This loop invariant might not be maintained by the loop.

Boogie program verifier finished with 12 verified, 2 errors

If I also add -printPruned, I see no difference between different runs. Thus, it appears that -prune is deterministic with 1 core.

Using 2 cores and no pruning

Without pruning, I get no errors. In particular, running boogie d.bpl $(SMT_OPTIONS_ABOVE) -errorTrace:0 -vcsCores:2 consistently gives

Boogie program verifier finished with 13 verified, 0 errors

The fact that pruning makes a difference suggests there's also some other issue with pruning, unrelated to how many cores are being used.

keyboardDrummer commented 2 years ago

If I additionally use -printPruned to print the pruned files, then I get differences even when the number of errors is generated to be the same.

These differences seem to be only inside implementation files, inside which there's no pruning. I think this difference is only due to whether certain implementations already have or have not had dead variable elimination applied, and it should not effect the solver output.

keyboardDrummer commented 2 years ago

This doesn't seem to be caused by concurrency.

boogie d.bpl $(SMT_OPTIONS_ABOVE) -errorTrace:0 -prune -proc:"*Remove*"

returns the four errors:

d.bpl(6715,1): Error: A postcondition might not hold on this return path.
d.bpl(6556,3): Related location: This is the postcondition that might not hold.
d.bpl(6715,1): Error: A postcondition might not hold on this return path.
d.bpl(6571,3): Related location: This is the postcondition that might not hold.

while boogie d.bpl $(SMT_OPTIONS_ABOVE) -errorTrace:0 -prune

does not. Comparing the printPruning differences between the two of Impl$$_module.DoublyLinkedList.Remove doesn't show any differences besides ones in unrelated Implementations.

shazqadeer commented 2 years ago

There does not appear to be a conclusion or plan of action here. Perhaps we have to dig deeper.

@keyboardDrummer : Who is the person most familiar with the pruning code?

keyboardDrummer commented 2 years ago

There does not appear to be a conclusion or plan of action here. Perhaps we have to dig deeper.

Seems like there's a bug but nobody has spent the time needed to solve it, and I do think it needs some time.

@keyboardDrummer : Who is the person most familiar with the pruning code?

Me, but I have deprioritised solving this since it doesn't seem to be actively affecting anyone.

shazqadeer commented 2 years ago

For the test file I'm enclosing, it is also the case that running without pruning gives 0 verification errors (instead of 2 or 4).

This appears to be a serious error to me because it affects provability or completeness. I can imagine if pruning is not working as expected, this would create a really hard-to-debug issue.

@keyboardDrummer : Why do you say pruning is not actively affecting anyone? Is Dafny not using it actively? If not, then why was this feature added to Boogie?

keyboardDrummer commented 2 years ago

Is Dafny not using it actively?

Dafny uses it actively, but we're not currently running into an issue.

@shazqadeer I admire you tenacity for cleaning up issues in Boogie, but I can't really make time for this while it's not causing any pain. I think the discussion in this issue is useful if it comes up again in the future, but otherwise it might take quite some time before I'd pick it up. We could consider closing the issue and hope that if the discussion is useful it'll be found again.

shazqadeer commented 2 years ago

@RustanLeino :

The description of the issue suggests that two separate things are going on:

The first part about nondeterminism seems less important to me. I have often seen such behavior about different set of errors being reported.

I am more concerned about the second part because it appears that prune is supposed to guarantee that a successful verification should be preserved but it currently does not do that. Do you think that this could be a manifestation of the butterfly effect rather than some problem with pruning itself? Have you tried using the randomization feature without pruning on this file just to see whether you can recreate the failure?

shazqadeer commented 1 year ago

@RustanLeino : I played with the file d.bpl you attached to this issue. I can explain why the file fails to verify with pruning even though it verifies normally.

The answer lies partially in the description of the :include_dep attribute; see the long comment documenting the pruning feature in CommandLineOptions.cs. Your file is not using the :include_dep attribute; consequently, incoming edges into axioms are added only from triggers that occur in non-nested quantifiers in the axiom. Any incoming edges due to referenced declarations elsewhere in the axiom will not be present (unless the axiom is annotated with :include_dep).

shazqadeer commented 1 year ago

Upon reading the code in AxiomVisitor.VisitExpr, I detected some other potential problems:

It looks like the field pos to track polarity of an expression was added to Expr class just for this use case. This is inappropriate because pos is a temporary piece of tracking state. It should not be added to a core class such as Expr. Rather, we should attempt to track this state in AxiomVisitor itself.

The propagation of pos looks erroneous to me. For example, I see this code

n.Args[0].pos = Expr.NegatePosition(n.Args[0].pos); to update pos of the child when the parent n is a Not expression. I suspect the intention was to do

n.Args[0].pos = Expr.NegatePosition(n.pos); Otherwise, nested negation will not be handled correctly. There are similar problems in handling of other expressions also.

shazqadeer commented 1 year ago

I stumbled upon the helpful comment in CommandLineOptions.cs that explains the design of the pruning feature. Here are a few thoughts on the design.

The uses declaration seems odd to me for two reasons. First, it may happen that an axiom constrains several functions in which case I would like to add an edge from each of those functions to the axiom. With the current design, we would have to copy the axiom in the uses clause of each of the functions. Second, just as a matter of taste (and Boogie tradition), it is better to use attributes to introduce experimental features into Boogie. Later, when the feature is well understood, we can introduce custom ergonomic syntax for it. The uses clause could have been simulated by adding an attribute on axioms comprising names of functions/constants from which edges to the axiom must be added.

Given the rationale of easy migration, the design of the :include_dep attribute did not make sense to me. Migration concerns would suggest that we preserve existing behavior by default, which would be to do as little pruning as possible. With the current semantics of :include_dep, an unannotated axiom is likely to be dropped because of lack of incoming edges resulting in excessive pruning which would then have to be turned off by suitably adding :include_dep. The alternative of adding references from declarations mentioned in the axiom by default and changing the behavior based on suitable attribute on the axiom would be an alternative design.

keyboardDrummer commented 1 year ago

Second, just as a matter of taste (and Boogie tradition), it is better to use attributes to introduce experimental features into Boogie.

This syntax change was backwards compatible, just like adding an attribute would be. Is there a practical migration related difference between the two? I chose to introduce new syntax because it's prettier and everything else seemed equal.

Are you sure the Boogie tradition is not partly because it's cheaper to add new attributes than to add new syntax?

Given the rationale of easy migration, the design of the :include_dep attribute did not make sense to me. Migration concerns would suggest that we preserve existing behavior by default, which would be to do as little pruning as possible. With the current semantics of :include_dep, an unannotated axiom is likely to be dropped because of lack of incoming edges resulting in excessive pruning which would then have to be turned off by suitably adding :include_dep. The alternative of adding references from declarations mentioned in the axiom by default and changing the behavior based on suitable attribute on the axiom would be an alternative design.

Seems like in both solutions you end up adding an attribute to each axiom once, either {:include_dep} or {:exclude_dep}. I can imagine more advanced migration mechanisms but it's also good not to build too much upfront. I used the current migration feature for migrating Dafny to use /prune.

I stumbled upon the helpful comment in CommandLineOptions.cs that explains the design of the pruning feature. Here are a few thoughts on the design.

Sorry you had to find this information by stumbling. This explanation should be on a documentation page. Do we have a process for adding documentation?

keyboardDrummer commented 1 year ago

First, it may happen that an axiom constrains several functions in which case I would like to add an edge from each of those functions to the axiom. With the current design, we would have to copy the axiom in the uses clause of each of the functions.

An alternative would be having the axioms define a list of "used by", but in the case where the axiom is only used by one declaration the current syntax seems nicer and I believe that's all that was needed for Dafny.

shazqadeer commented 1 year ago

I wanted to understand the behavior of pruning on d.bpl without the :include_dep feature. So I conducted an experiment using the Boogie master branch on d.bpl.

I disabled :include_dep by changing the behavior of DependencyEvaluator.AddIncoming as follows:

protected void AddIncoming(Declaration newIncoming)
    {
      // if (QKeyValue.FindBoolAttribute(declaration.Attributes, "include_dep")) {
        incomingSets.Add(new[] { newIncoming });
      // }
    }

That is, I allow addition of incoming edges into axioms based on all referenced declarations. I believe this is tantamount to ignoring triggers completely and adding incoming edges based simply on referenced declarations, which is the simplest technique I can imagine for handling axioms.

I ran the modified Boogie on d.bpl with all the flags mentioned by @RustanLeino and also added -prune -printPruned:d. Verification was successful. So I compared the number of axioms in d.bpl to each of the pruned files dumped out by -printPruned. There are 540 axioms in d.bpl. The number of axioms in the dumped files range from 129 to 142. In other words, significant pruning is being achieved measured in terms of number of axioms pulled in for each verification condition. The running time with pruning is faster but even without pruning the file verifies quickly.

shazqadeer commented 1 year ago

Sorry you had to find this information by stumbling. This explanation should be on a documentation page. Do we have a process for adding documentation?

The closest thing to documentation for Boogie options is the implementation of /help in CommandLineOptions.cs. Search for /prune in the codebase to see the place where the prune option is documented.

shazqadeer commented 1 year ago

Are you sure the Boogie tradition is not partly because it's cheaper to add new attributes than to add new syntax?

Yes, convenience and prototyping speed are the reasons for the tradition.

shazqadeer commented 1 year ago

Seems like in both solutions you end up adding an attribute to each axiom once, either {:include_dep} or {:exclude_dep}.

I agree. A general pruning feature seems to require the addition of some attribute/syntax on axioms.

In the current design, there is syntax on both functions (uses clause) and axioms (include_dep). A design in which the extra syntax is unified at axioms may be possible using your suggestion of "used by". An axiom may override the default method of inferring incoming edges by including the specific sources of incoming edges in the "used by" declaration.

shazqadeer commented 1 year ago

I am including below the part of the comment in CommandLineOptions.cs that specifies how incoming edges into axioms are generated due to triggers mentioned in the axiom.

Apart from uses clauses and {:include_dep}, incoming edges are also created for axioms 
that contain triggers. If a quantifier with a trigger occurs in an axiom, then no incoming 
edges are determined from the body of the quantifier, regardless of {:include_dep} being 
present. However, for each trigger of the quantifier, each declaration referenced in the 
trigger has an outgoing edge to a merge node, that in turn has an outgoing edge to the 
axiom. The merge node is traversable in the reachability analysis only if each of its 
incoming edges has been reached.

I find this documentation to be ambiguous when I consider that an axiom could be an arbitrary Boolean expression with quantifiers (with or without triggers) occurring in arbitrary places. I would love to get clarification on exactly what is supposed to happen in general for an axiom. This understanding would help me in reviewing the code in AxiomVisitor.cs.

keyboardDrummer commented 1 year ago

Sorry you had to find this information by stumbling. This explanation should be on a documentation page. Do we have a process for adding documentation?

The closest thing to documentation for Boogie options is the implementation of /help in CommandLineOptions.cs. Search for /prune in the codebase to see the place where the prune option is documented.

I intentionally didn't add all the documentation there so as not to spam the command line help.

keyboardDrummer commented 1 year ago

A design in which the extra syntax is unified at axioms may be possible using your suggestion of "used by". An axiom may override the default method of inferring incoming edges by including the specific sources of incoming edges in the "used by" declaration.

Sounds nice indeed.

keyboardDrummer commented 1 year ago

I find this documentation to be ambiguous when I consider that an axiom could be an arbitrary Boolean expression with quantifiers (with or without triggers) occurring in arbitrary places. I would love to get clarification on exactly what is supposed to happen in general for an axiom. This understanding would help me in reviewing the code in AxiomVisitor.cs.

This comment from a while back might contain the information you're looking for: https://github.com/boogie-org/boogie/pull/427#issuecomment-898750866

shazqadeer commented 1 year ago

PR #765 and #767.