Closed ivanperez-keera closed 2 weeks ago
I believe that this is a leftover from back before Copilot.Theorem.What4
was using its old heuristics for determining how to handle inductive-style proofs. In 4ae3f249fbf8c37c0909046150ff341d432bd042, Copilot.Theorem.What4
was re-written to use a k-induction–based heuristic, which is more clever and can handle more inductive proofs out of the box, including some of the Propositional
examples.
That being said, Copilot.Theorem.What4
is still using heuristics, and it's possible to fool the heuristics if you try hard enough. Here is a modification to Example 3
that will cause it to become invalid
instead of valid
.
diff --git a/copilot/examples/what4/Propositional.hs b/copilot/examples/what4/Propositional.hs
index 75dcf55f..05bc7da9 100644
--- a/copilot/examples/what4/Propositional.hs
+++ b/copilot/examples/what4/Propositional.hs
@@ -22,7 +22,7 @@ spec = do
-- An inductively defined flavor of true, which requires induction to prove,
-- and hence is found to be invalid by the SMT solver (since no inductive
-- hypothesis is made).
- let a = [True] ++ a
+ let a = [True] ++ ([True] ++ ([True] ++ a))
void $ prop "Example 3" (forAll a)
-- An inductively defined "a or not a" proposition, which is unprovable by
(Note that let a = [True] ++ ([True] ++ ([True] ++ a))
is the result of appending multiple streams where each stream has a history of length 1. It shouldn't be confused with let a = [True, True, True] ++ a
, which is a single stream with a history of length 3.)
Perhaps we should update the comments in Propositional
, and consider including the more complicated example above as something that the heuristics cannot handle out of the box?
let a = [True] ++ ([True] ++ ([True] ++ a))
is the result of appending multiple streams where each stream has a history of length 1. It shouldn't be confused withlet a = [True, True, True] ++ a
, which is a single stream with a history of length 3.
Should those be different?
The two streams are equivalent in terms of behavior, but Copilot stores each stream differently in its internal representation. (One could imagine an optimization that turns the let a = [True] ++ ([True] ++ ([True] ++ a))
into let a = [True, True, True] ++ a
, but Copilot doesn't currently perform such an optimization.)
Copilot.Theorem.What4
's heuristic is sensitive to this internal representation, as it uses the maximum history length of all streams in the specification to determine how to much work it needs to do an inductive proof. Generally speaking, this means that if you have streams with a longer history, the more work the heuristic will do (and the more likely it is that the proof will cover all of the necessary base cases).
Description
The example included in copilot/examples/what4/Propositional.hs
includes comments indicating the expectations for each of the statements that can/cannot be proven with Z3. Those comments are incorrect wrt. the current implementation.
Type
Additional context
None.
Requester
Method to check presence of bug
Running the example if copilot/examples/what4/Propositional.hs
produces an output that does not match the comments in the code. For example, the third example reads:
https://github.com/Copilot-Language/copilot/blob/068c06dd7ab6e900e2e8728ecb1c3b6e94ba9ccb/copilot/examples/what4/Propositional.hs#L22-L26
but, when running the file, the output is:
*Main> main
Example 1: valid
Example 2: invalid
Example 3: valid
Example 4: valid
Example 5: valid
Example 6: valid
Example 7: valid
Expected result
The comments in and the output of running the file copilot/examples/what4/Propositional.hs
match.
Desired result
The comments in and the output of running the file copilot/examples/what4/Propositional.hs
match.
Proposed solution
Modify examples so that they are consistent with output, potentially duplicating them so that basic cases, which are handled by copilot-theorem
, are shown separately from those that it cannot prove valid.
Further notes
Copilot.Theorem.What4
handles inductive-style proofs, rendering these comments incorrect.Change Manager: Confirmed that the issue exists.
Technical Lead: Confirmed that the issue should be addressed.
Technical Lead: Issue scheduled for fixing in Copilot 4.1.
Fix assigned to: @RyanGlScott .
Implementor: Solution implemented, review requested.
Change Manager: Verified that:
Solution is implemented:
FROM ubuntu:focal
ENV DEBIAN_FRONTEND=noninteractive RUN apt-get update
RUN apt-get install --yes libz-dev RUN apt-get install --yes git
RUN apt-get install --yes wget RUN mkdir -p $HOME/.ghcup/bin RUN wget https://downloads.haskell.org/~ghcup/0.1.19.2/x86_64-linux-ghcup-0.1.19.2 -O $HOME/.ghcup/bin/ghcup
RUN chmod a+x $HOME/.ghcup/bin/ghcup ENV PATH=$PATH:/root/.ghcup/bin/ ENV PATH=$PATH:/root/.cabal/bin/ RUN apt-get install --yes curl RUN apt-get install --yes gcc g++ make libgmp3-dev RUN apt-get install --yes pkg-config RUN apt-get install --yes z3
SHELL ["/bin/bash", "-c"]
RUN ghcup install ghc 9.4 RUN ghcup install cabal 3.2 RUN ghcup set ghc 9.4.8 RUN cabal update
CMD git clone $REPO \ && cd $NAME \ && git checkout $COMMIT \ && cabal v1-sandbox init \ && cabal v1-install alex happy --constraint='happy<2' \ && cabal v1-install copilot**/ \ && cabal v1-exec -- runhaskell copilot/examples/what4/Propositional.hs
Command (substitute variables based on new path after merge):
$ docker run -e "REPO=https://github.com/GaloisInc/copilot-1" -e "NAME=copilot-1" -e "COMMIT=b71e159738b215bb35bf5fbd1e073d4d25c30d00" -it copilot-verify-535
Change Manager: Implementation ready to be merged.
The example in
Propositional
states that some of the definitions are invalid (cannot be proven), especially when defined by induction. However, that is not the case, at least not anymore. A run of that example, produces:The only one that is invalid is one that is
false
.