Here is how to start experimenting with Quint and the model.
Install the quint tool by running npm i @informalsystems/quint -g (more details here). After installing the quint tool, run the REPL by typing quint in the terminal.
Then you can play with the REPL by trying some basic expressions, e.g.
2+3
val x = Set(1,3,4)
x.powerset()
x.map(i => i*2)
As a first step, examine the model operators from within the REPL. Example commands to try out after running quint within the formal_spec folder:
.load nmt.qnt
import basics.*
import nmt_helpers.*
import nmt.*
val c_leaves = [{ value: ("data", 0), namespaceId: 0 }, { value: ("data", 1), namespaceId: 2 }, { value: ("data", 2), namespaceId: 2 }, { value: ("data", 3), namespaceId: 4 }]
val c_tree = BuildTree(c_leaves)
val proofZero = CreateProofNamespace(0, c_tree)
CreateProofNamespace(2, c_tree)
val c_root = c_tree.hashes.get(1)
verifyInclusionProof(proofZero, c_root, 0, [{ value: ("data", 0), namespaceId: 0 }])
MerkleRangeHash(proofZero, [{ value: ("data", 0), namespaceId: 0 }])
verifyInclusionProof(proofZero, c_root, 0, [{ value: ("data", 0), namespaceId: 2 }])
import nmtProofVerification.*
// runs the initialization action
init
// runs one step of the model
step
// runs another step of the model
step
After getting acquainted with all the operators, you can simulate the model behavior by running

quint run --main=nmtProofVerification --max-samples=1 --max-steps=100 nmt.qnt --invariant=verificationAlwaysCorrect

This command will run the module nmtProofVerification, which iteratively generates data, creates a proof for a namespace, and then verifies it. It will run 100 steps of the model and will check that the property verificationAlwaysCorrect always holds. It will output the steps to the terminal. (For options, run quint run --help.)
quint run --main=nmtTest --max-samples=1 --max-steps=100 nmt.qnt --out-itf=ITF_files/out.itf.json

This command will simulate the module nmtTest, which non-deterministically corrupts the generated proof and outputs the simulation steps to the file ITF_files/out.itf.json, to be used in tests.
Once a test file is generated, it is read by simulation_test.go, a regular Go test. (The path to the JSON file to be tested needs to be provided; at the moment it is hardcoded in the code.)
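For orientation, here is a minimal sketch (in Go) of what such an adapter has to do when reading an ITF trace. It assumes the trace exposes a top-level "states" array of per-step variable assignments; the struct fields, file path, and replay logic are illustrative placeholders, not the actual simulation_test.go.

```go
package nmt_test

import (
	"encoding/json"
	"os"
	"testing"
)

// itfTrace mirrors the rough shape of an ITF trace file: a list of variable
// names and a list of states, each state mapping a variable name to its
// JSON-encoded value. Only the fields needed for this sketch are declared.
type itfTrace struct {
	Vars   []string                     `json:"vars"`
	States []map[string]json.RawMessage `json:"states"`
}

// TestFromITFSketch shows the general structure of an ITF-driven test:
// load the trace, then replay every state by translating it into calls
// against the Go implementation (omitted here).
func TestFromITFSketch(t *testing.T) {
	raw, err := os.ReadFile("ITF_files/out.itf.json") // path is hardcoded, as in the PR
	if err != nil {
		t.Skipf("trace file not available: %v", err)
	}
	var trace itfTrace
	if err := json.Unmarshal(raw, &trace); err != nil {
		t.Fatalf("cannot parse ITF trace: %v", err)
	}
	for i, state := range trace.States {
		// A real adapter would decode the model variables (leaves, proof,
		// namespace, ...) from state and call the nmt functions on them.
		_ = state
		t.Logf("replaying state %d of %d", i+1, len(trace.States))
	}
}
```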
Thank you for providing instructions on using Quint and running tests. I am currently reviewing the PR and will provide my comments as I go. However, I have an immediate question:
In the ITF_files directory, there are two JSON files, panicissue.itf.json and runTest.itf.json. Can you please explain how they were generated and how they differ from the output generated by the following command:

quint run --main=nmtTest --max-samples=1 --max-steps=100 nmt.qnt --out-itf=ITF_files/out.itf.json

Another question: what is the expected output when running TestFromITF using the panicIssue.itf.json file?
The runTest.itf.json file was created exactly by running the command that you cite (perhaps with more execution steps, not fully sure about that). Then, when I ran the test, I noticed it caused the panic. So I inspected which steps were the ones causing the panic and moved those into a separate file, panicIssue.itf.json. Now, for panicIssue.itf.json, I expect the test to panic (until the problem is fixed).
Thanks @ivan-gavran for addressing my previous comments. After examining the current Quint model in greater detail and reviewing some of the examples available in the Quint repository on GitHub, there are certain aspects that still require further clarification. I have listed these aspects below and would appreciate your insights on them.
It is a precise description of the expected behavior, and yet it resides on a higher level of abstraction than the code.
It allows for test generation.
How does the process of generating test cases in the Quint model compare to generating them programmatically in Go? Specifically, the Quint model requires us 1) to write the code to generate the test vectors e.g., creating a tree 2) to specify the corruption scenarios, and develop necessary code to produce corrupt inputs. Can you please discuss the advantages of using the Quint model for test vector generation, and how it compares to generating test vectors programmatically in Go?
In the nmtTest module of the nmt.qnt file, the sequences of steps are generated randomly. This means that depending on the value passed in the --max-samples option and the randomness used to pick those step sequences, we may or may not cover all the corruption cases in the output test cases. I am interested to know your perspective on the confidence we can have in the test vectors generated by the Quint model and the coverage it provides. Additionally, do you anticipate that Quint testing will uncover more bugs than traditional Go testing? How do you compare the tests generated by Quint with the existing tests in the nmt repository?
Having specifications written in Quint makes it possible to change tests/specs quickly
Hi @staheri14 , thanks for the questions and for digging deeper into this PR. I am answering your questions one by one below.
- Can you explain the level of abstraction provided by the Quint model? For instance, when looking at the CreateProofNamespace function (https://github.com/ivan-gavran/nmt/blob/c3cc6b7acba34c97a1a4d5e8fa4be1d355535c1e/formal_spec/nmt.qnt#L277 ), the code appears to be just as detailed and complex as the original Go version. It would be very helpful if you could clarify how the Quint model abstracts away the implementation details.
You are right that the model version and Go version of the computation here are at a similar level of abstraction and complexity. This is because NMT is more about computation than communication or interleaving of messages. My quoted comment was referring to general benefits of using models.
Where we do gain in readability for NMT is in the test cases: using the model, we treat test cases explicitly as data (whereas in the original Go code they are intertwined with test execution).
- How does the process of generating test cases in the Quint model compare to generating them programmatically in Go? Specifically, the Quint model requires us 1) to write the code to generate the test vectors e.g., creating a tree 2) to specify the corruption scenarios, and develop necessary code to produce corrupt inputs. Can you please discuss the advantages of using the Quint model for test vector generation, and how it compares to generating test vectors programmatically in Go?
The current test generation (as present in this PR) is the most similar to the existing fuzz_test.go.
With Quint, we could additionally generate particular high-level scenarios. (This is not present in the PR, sending a commit tomorrow to illustrate.) For instance: generate tests that fail, generate tests that fail for a particular reason, or generate tests that succeed. (In the currently existing tests, these cannot be generated programmatically, but are instead hand-written).
The same Quint model can thus be used for tests such as those in nmt_test.go (except that those are explicitly hand-written) and tests such as those in fuzz_test.go.
- In the nmtTest module of the nmt.qnt file, the sequences of steps are generated randomly. This means that depending on the value passed in the --max-samples option and the randomness used to pick those step sequences, we may or may not cover all the corruption cases in the output test cases. I am interested to know your perspective on the confidence we can have in the test vectors generated by the Quint model and the coverage it provides. Additionally, do you anticipate that Quint testing will uncover more bugs than traditional Go testing? How do you compare the tests generated by Quint with the existing tests in the nmt repository?
This touches on the three modes I was describing: a random simulation, symbolic simulation, and full verification.
Random simulation would probably work about as well as the current fuzz_test.go.
The other two modes (still to be added to Quint, by connecting it to the Apalache model checker) provide better coverage: they do not depend on sheer luck to stumble upon interesting specified(!) scenarios, but guarantee to find them if they exist.
I'd like to emphasise that Quint tests are really Go tests. The difference comes from separating the test-case generation and execution. In the "traditional testing approach", the test cases (such as this tabular sequence) are written by hand; in the "Quint testing", the test cases are generated by the tool. This makes it easier to generate a much larger number of test cases (depending on the state-space size, even exhausting all possible test cases up to a certain number of steps) and in that way making it more likely that a rare bug would be found.
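To make the difference concrete, here is a small Go sketch contrasting the two styles: a hand-written tabular case next to the same test loop fed from a generated JSON file. The type, field names, and file path are illustrative assumptions, not the actual nmt test code.

```go
package nmt_test

import (
	"encoding/json"
	"os"
	"testing"
)

// proofCase is an illustrative shape for one proof-verification scenario.
type proofCase struct {
	Name        string `json:"name"`
	NamespaceID byte   `json:"namespaceId"`
	ExpectFound bool   `json:"expectFound"`
}

// Traditional style: the table of cases is written by hand.
var handWrittenCases = []proofCase{
	{Name: "two leaves and found", NamespaceID: 1, ExpectFound: true},
	{Name: "three leaves and not found", NamespaceID: 3, ExpectFound: false},
}

// Quint style: the same execution loop, but the cases come from a file
// generated by the model instead of being written by hand.
func loadGeneratedCases(t *testing.T, path string) []proofCase {
	raw, err := os.ReadFile(path)
	if err != nil {
		t.Skipf("generated cases not available: %v", err)
	}
	var cases []proofCase
	if err := json.Unmarshal(raw, &cases); err != nil {
		t.Fatalf("cannot parse generated cases: %v", err)
	}
	return cases
}

func TestProveNamespaceTableSketch(t *testing.T) {
	cases := handWrittenCases
	// cases := loadGeneratedCases(t, "ITF_files/generated_cases.json")
	for _, tc := range cases {
		t.Run(tc.Name, func(t *testing.T) {
			// A real test would build the tree, prove tc.NamespaceID,
			// and compare the verification result with tc.ExpectFound.
			_ = tc
		})
	}
}
```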
Having specifications written in Quint makes it possible to change tests/specs quickly
- Could you provide an example to clarify this aspect? It seems that for every new feature, we must write code in both the Quint language and the original Go code. Would this dual coding process result in a quicker change process? Specifically, how does the use of the Quint language reduce the overall time and effort required to implement new features or changes to existing features in comparison to only using the Go code?
The point I wanted to make was that as the codebase evolves (e.g., parameters of a certain function are changed, or the logic changes making some test-cases behave differently etc.), we do not have to re-write a large number of manually written test cases. Instead, we could change the model and the go function which reads the generated test cases and executes them, and have again a huge number of test cases re-generated.
Finally, it is fair to say that Quint is a new technology, so best testing practices are still being established and evaluated. What I believe to be the most useful feature of Quint is that it takes care of generating tests through different algorithms, at the price of having to write a model + Go adapter. The overall effect should be that we can write tests much faster (since one never has to hand-write tabular tests and make sure not to forget a case).
Thanks @ivan-gavran for your thorough response!
Please see my followup questions below:
You are right that the model version and Go version of the computation here are at a similar level of abstraction and complexity. This is because NMT is more about computation than communication or interleaving of messages. My quoted comment was referring to general benefits of using models.
Where we do gain in readability for NMT is in the test cases: using the model, we treat test cases explicitly as data (whereas in the original Go code they are intertwined with test execution).
Thanks for clarifying your point. Based on your explanation, the Quint model for nmt simplifies and automates the process of generating test data, which has two primary benefits (echoing what you said essentially): 1) it allows for the identification of hidden corner cases that would otherwise be difficult to discover manually; and 2) it reduces the workload for reproducing test data and refactoring code when new features are added.
However, in the current state of the PR, the test data generation follows random simulation; hence, coverage-wise (referring to the first point above), the generated test data is similar to the data generated in fuzz_test.go, right?
I would also appreciate it if you could clarify whether Quint can be used as a specification for nmt; I would be curious to learn more about how that would work. For example, could the Quint model (compared to the current Go implementation) be leveraged to facilitate the implementation of nmt in a different language? My understanding is that it cannot replace a formal English specification, and we should not rely on Quint as a replacement for specifications. I would like to know your point of view on this.
How does the process of generating test cases in the Quint model compare to generating them programmatically in Go? Specifically, the Quint model requires us 1) to write the code to generate the test vectors e.g., creating a tree 2) to specify the corruption scenarios, and develop necessary code to produce corrupt inputs. Can you please discuss the advantages of using the Quint model for test vector generation, and how it compares to generating test vectors programmatically in Go?
The current test generation (as present in this PR) is the most similar to the existing fuzz_test.go. With Quint, we could additionally generate particular high-level scenarios. (This is not present in the PR, sending a commit tomorrow to illustrate.) For instance: generate tests that fail, generate tests that fail for a particular reason, or generate tests that succeed. (In the currently existing tests, these cannot be generated programmatically, but are instead hand-written). The same Quint model can thus be used for tests such as those in nmt_test.go (except that those are explicitly hand-written) and tests such as those in fuzz_test.go.
I'm interested to know how to generate test data that intentionally fails for specific reasons using Quint and how to incorporate that data into the tests of the nmt_test.go file. I assume that with your upcoming commit, this process will become clearer. My understanding is that, as each test in the nmt_test.go file requires data to be corrupted in a particular way, it seems that we need to add all those corruption scenarios to the Quint model under the nmt_test module. Then, using the Go adapter, we can parse the JSON file containing the test data, examine the type of corruption in each case, and call the corresponding test function from nmt_test.go (or alternatively make each test in nmt_test.go read the JSON file and find the data that is relevant to its test). Is this the intended workflow for using the resulting test data to replace the hardcoded test vectors in nmt_test.go?
Also, I think quint-generated test data are not meant for unit testing of the functions? For example, tests such as https://github.com/celestiaorg/nmt/blob/05017fc627d0cdb0aaba84293141f67010485648/hasher_test.go#L67 (although, I do not see any immediate reason that this would not be possible). Can you please elaborate on this?
In the nmtTest module of the nmt.qnt file, the sequences of steps are generated randomly. This means that depending on the value passed in the --max-samples option and the randomness used to pick those step sequences, we may or may not cover all the corruption cases in the output test cases. I am interested to know your perspective on the confidence we can have in the test vectors generated by the Quint model and the coverage it provides. Additionally, do you anticipate that Quint testing will uncover more bugs than traditional Go testing? How do you compare the tests generated by Quint with the existing tests in the nmt repository?
This touches on the three modes I was describing: a random simulation, symbolic simulation, and full verification. Random simulation would probably work about as well as the current fuzz_test.go. The other two modes (still to be added to Quint, by connecting it to the Apalache model checker) provide better coverage: they do not depend on sheer luck to stumble upon interesting specified(!) scenarios, but guarantee to find them if they exist.
The point I wanted to make was that as the codebase evolves (e.g., parameters of a certain function are changed, or the logic changes making some test-cases behave differently etc.), we do not have to re-write a large number of manually written test cases. Instead, we could change the model and the go function which reads the generated test cases and executes them, and have again a huge number of test cases re-generated.
I see your point, thanks for clarification! Currently, the test data generated in this PR is using the random simulation, is that correct? How can we use the other two modes?
Merging #163 (b6c7c58) into master (6854976) will not change coverage. The diff coverage is n/a.
@@ Coverage Diff @@
## master #163 +/- ##
=======================================
Coverage 95.39% 95.39%
=======================================
Files 5 5
Lines 565 565
=======================================
Hits 539 539
Misses 15 15
Partials 11 11
I like this discussion :)
[...] Based on your explanation, the Quint model for nmt simplifies and automates the process of generating test data, which has two primary benefits (echoing what you said essentially): 1) it allows for the identification of hidden corner cases that would otherwise be difficult to discover manually; and 2) it reduces the workload for reproducing test data and refactoring code when new features are added.
However, in the current state of the PR, the test data generation follows random simulation; hence, coverage-wise (referring to the first point above), the generated test data is similar to the data generated in fuzz_test.go, right?
That is correct!
I would also appreciate it if you could clarify whether Quint can be used as a specification for nmt; I would be curious to learn more about how that would work. For example, could the Quint model (compared to the current Go implementation) be leveraged to facilitate the implementation of nmt in a different language? My understanding is that it cannot replace a formal English specification, and we should not rely on Quint as a replacement for specifications. I would like to know your point of view on this.
I believe Quint could be used as a specification for nmt. The most promising approach would be to interleave English descriptions (for intuition) with Quint expressions (for precision).
For the NMT spec, the description of treating the ignoreMaxNamespace flag (here) caused some confusion for me as a reader. If that part were instead replaced by a Quint function, that would make it clear and would allow me to inspect the spec for different inputs.
One example of such an approach is an (unfinished) intro to the bank module, linking in particular descriptions of ViewKeeper's functions.
More generally, I see Quint code as a replacement for pseudocode - and now you have specs with pseudocode you can simulate, automatically reason about, or connect to test cases.
I'm interested to know how to generate test data that intentionally fails for specific reasons using Quint and how to incorporate that data into the tests of the nmt_test.go file. I assume that with your upcoming commit, this process will become clearer. My understanding is that, as each test in the nmt_test.go file requires data to be corrupted in a particular way, it seems that we need to add all those corruption scenarios to the Quint model under the nmt_test module. Then, using the Go adapter, we can parse the JSON file containing the test data, examine the type of corruption in each case, and call the corresponding test function from nmt_test.go (or alternatively make each test in nmt_test.go read the JSON file and find the data that is relevant to its test). Is this the intended workflow for using the resulting test data to replace the hardcoded test vectors in nmt_test.go? Also, I think quint-generated test data are not meant for unit testing of the functions? For example, tests such as https://github.com/celestiaorg/nmt/blob/05017fc627d0cdb0aaba84293141f67010485648/hasher_test.go#L67 (although, I do not see any immediate reason that this would not be possible). Can you please elaborate on this?
In the latest commit, I wrote the spec and the Go adapter that mimic the tests from TestNamespacedMerkleTree_ProveNamespace_Ranges_And_Verify.
You will notice that the way in which I generate different scenarios (following the scenarios from the original test) is by defining an invariant. For instance, for the test case described by
{
"three leaves and not found but with range", 2,
append(repeat(generateLeafData(2, 0, 1, []byte("_data")), 2), newNamespaceDataPair([]byte{1, 1}, []byte("_data"))),
[]byte{0, 1},
2, 3,
false,
},
the invariant is defined by
val fiveLeavesAndNotFoundButWithinRange =
  val leavesSet = listToSet(leaves_namespace_idx_v)
  and {
    length(leaves_namespace_idx_v) == 5,
    not(leavesSet.contains(namespace_v)),
    min(leavesSet) < namespace_v,
    max(leavesSet) > namespace_v
  }
The difference between the original test description and the Quint description is that the original description is simultaneously describing a scenario and technically achieving it. In the commit, the Quint part only describes what a test scenario should be about, while the technical details (e.g., byte representation) are left to the adapter, and exact data values are generated (and stored in a .json file, to be consumed by the adapter).
After negating the invariant, a test case is found as a violation of that (negated) invariant.
This example illustrates that one does not have to explicitly write desired corruption cases. What we do instead is describe the data by an invariant.
You are right that Quint is not meant primarily for unit tests, but it can be used for them, too (as the example illustrates).
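To make that division of labor tangible, the sketch below shows the kind of translation an adapter might perform from the model's abstract leaves to a byte representation; the struct, the one-byte namespace layout, and the function name are assumptions for illustration, not the actual adapter code.

```go
package nmt_test

// abstractLeaf mirrors the shape in which the Quint model describes a leaf:
// an integer namespace id and a symbolic value (cf. the
// { value: (...), namespaceId: ... } records used in the REPL session above).
type abstractLeaf struct {
	NamespaceID int
	Value       string
}

// toConcreteLeaf turns the abstract description into the byte layout the Go
// implementation expects: a namespace prefix followed by the raw data.
// The one-byte namespace size is an assumption made for this sketch.
func toConcreteLeaf(l abstractLeaf) []byte {
	namespace := []byte{byte(l.NamespaceID)}
	data := []byte(l.Value)
	return append(namespace, data...)
}
```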
The point I wanted to make was that as the codebase evolves (e.g., parameters of a certain function are changed, or the logic changes making some test-cases behave differently etc.), we do not have to re-write a large number of manually written test cases. Instead, we could change the model and the go function which reads the generated test cases and executes them, and have again a huge number of test cases re-generated.
I see your point, thanks for clarification! Currently, the test data generated in this PR is using the random simulation, is that correct? How can we use the other two modes?
Yes, the test data is generated using random simulation. The other two modes are in the coming soon stage: they depend on connecting Quint to the Apalache model checker (here is the tracking issue).
In the previous comment, I mentioned the new commit, which mimics tests from TestNamespacedMerkleTree_ProveNamespace_Ranges_And_Verify, but written using Quint. I'd like to add some more details about how that is done:
Unlike in nmt.qnt, here I do not generate all leaves at once, but generate one leaf per step. (An approach in which I would generate all at once is also possible. This touches on the @staheri14 comment on using Quint for unit tests.)
Where a test case in the .go file is given by a description and exact values, e.g.:
{
"two leaves and found", 1,
append(generateLeafData(1, 0, 1, []byte("_data")), generateLeafData(1, 1, 2, []byte("_data"))...),
[]byte{1},
1, 2,
true,
}
Quint defines it by:
val twoLeavesAndFound =
  and {
    length(leaves_namespace_idx_v) == 2,
    val leavesSet = listToSet(leaves_namespace_idx_v)
    leavesSet.contains(namespace_v)
  }
Finally, we negate the property and use the negated property as an invariant, whose violation is then a test case.
Concretely, to obtain the test case data, we can run quint run --max-samples=1000 --max-steps=10 tests.qnt --invariant=twoLeavesAndFoundTest. The resulting data file, scenario.json, is manually stitched together. (Generation of multiple test cases for multiple properties is in the pipeline.)

Yes, the test data is generated using random simulation. The other two modes are in the coming soon stage: they depend on connecting Quint to the Apalache model checker (https://github.com/informalsystems/quint/issues/12 is the tracking issue).
Would the current quint model for nmt be applicable for the other two simulation modes or would a different quint model be required for those modes?
The same model can be used for all three modes. The difference is only in how they are looking for an invariant violation (and, therefore, what guarantees they can provide):
For full verification, given a bound n, it will make sure that all possible combinations (of different actions and different values) are explored and that, if no violation is found, then there is no violation (of length up to n).

In this commit I corrected a mistake that was introduced while prettifying the spec, which resulted in the generation of empty test data.
I also added comments around Quint functions and did some Go linting. Marking this PR as ready now.
@ivan-gavran Thanks a lot for addressing the comments, they look good to me! The issue with gobencher is currently being looked into. As soon as it's resolved, I'll let you know. At that point, I'll kindly request you merge the master branch into your branch one last time. After that, we should be ready to merge your PR :)
@ivan-gavran Would you mind updating your branch with the master branch one last time? Then we are good to merge this PR (given that the CIs are fine, excluding the gobencher). Thanks!
Overview
This PR introduces a formal specification of the NMT proof/verification logic in Quint.
The benefits of having a Quint specification are threefold:
- It is a precise description of the expected behavior, and yet it resides on a higher level of abstraction than the code. verificationAlwaysCorrect is an example of a property against which a specification can be checked.
- It allows for test generation. The module nmtTest iteratively generates proofs and non-deterministically corrupts them. These generated test runs are exported in a json format (e.g., the file ITF_traces/runTest.itf.json). To be executed as a part of the standard suite, an adapter simulation_test.go is necessary. (The adapter iterates through the json-represented execution states and translates them into function calls.) The generation of the tests happens through simulation. In that sense, it is similar to the existing tests in fuzz_test.go, except that it also adds corruption of the data.
- Having specifications written in Quint makes it possible to change tests/specs quickly.

Current limitations:
- The treatment of the ignoreMaxNamespace option is not modelled. Modelling it correctly depends on the decision of what the desired behaviour is (issue #148).

Checklist