Closed robin-aws closed 2 years ago
> Invoke the dafny-reportgenerator tool with a configured maximum cost, which will fail the build if any task's cost crosses that threshold.

Naive question: why is the bounding done by analyzing the CSV after the fact, rather than just passing `/timeLimit`? Is that because `/timeLimit` doesn't work at the task level? Or because `/timeLimit` sometimes just doesn't work?
I have a related question. If the report-generator is something customers should always use for CI, then isn’t it better to have it be part of the `dafny` CLI?
> > Invoke the dafny-reportgenerator tool with a configured maximum cost, which will fail the build if any task's cost crosses that threshold.
>
> Naive question: why is the bounding done by analyzing the CSV after the fact, rather than just passing `/timeLimit`? Is that because `/timeLimit` doesn't work at the task level? Or because `/timeLimit` sometimes just doesn't work?
I tried to address that in the README for the tool here (https://github.com/dafny-lang/dafny-reportgenerator): "This is better than setting a more aggressive verification cost bound through options like /timeLimit directly, as it allows users to know that their code is still correct, but still blocks code changes that are too expensive to verify and hence likely to break in the future."
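The "verify fully, then gate on recorded cost" approach described in the README can be pictured with a short Python sketch. Note this is illustration only, not the tool's implementation: the CSV column names and the plain-seconds duration format below are assumptions, and `dafny-reportgenerator` is the supported way to run this check.

```python
import csv
import io

# Rows shaped like /verificationLogger:csv output; the column names and
# duration format (plain seconds) are assumptions for illustration.
SAMPLE = """TestResult.DisplayName,TestResult.Outcome,TestResult.Duration
Module.Foo (correctness),Passed,2.31
Module.Bar (correctness),Passed,11.07
"""

def tasks_over_limit(csv_text, max_seconds):
    """Return (task, seconds) pairs whose duration exceeds max_seconds."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [(row["TestResult.DisplayName"], float(row["TestResult.Duration"]))
            for row in reader
            if float(row["TestResult.Duration"]) > max_seconds]

offenders = tasks_over_limit(SAMPLE, max_seconds=10.0)
for name, seconds in offenders:
    print(f"over budget: {name} ({seconds:.2f}s)")
```

In CI you would exit with a nonzero status when `offenders` is non-empty, failing the build while the code itself still verified successfully.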
> I have a related question. If the report-generator is something customers should always use for CI, then isn’t it better to have it be part of the `dafny` CLI?
Fair question, and especially with the limited functionality the tool is providing so far, you could imagine it just being yet more options on the `dafny` CLI. But we expect that it will gain more functionality over time, and features of the kind that can be strongly decoupled from the core functionality of the `dafny` tool. It's also very useful to record the results of verification tasks once and then analyze them in multiple ways after the fact, whereas if you had to invoke `dafny` repeatedly you could get different non-deterministic results (at least in terms of duration).
For precedent, the .NET platform has a similar architecture, where data is written out as a result of options like `--logger` (for test results) and `--collect` (for things like test coverage), and other tools are used to report on that data in various ways.
> But we expect that it will gain more functionality over time, and features of the kind that can be strongly decoupled from the core functionality of the `dafny` tool.
What would those features be? When I think of a feature like detecting proofs with a high verification time variability, that seems like something useful to have in the Boogie CLI.
> It's also very useful to record the results of verification tasks once and then analyze them in multiple ways after the fact, whereas if you had to invoke `dafny` repeatedly you could get different non-deterministic results (at least in terms of duration).
What do you mean by analyse in multiple ways? Do you mean take the measurements from different proof runs and apply different statistics to them? I imagine in the general case it would be fine to have Boogie apply a common statistic after which you wouldn't want to do any further analysis, which you'd use in your CI.
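One concrete reading of "analyze in multiple ways": the same recorded durations can be gated on a hard budget in one pass and mined for high-variability proofs in another, without re-running the verifier. The task names, durations, and thresholds below are all made up for illustration.

```python
from statistics import mean, stdev

# Hypothetical per-task durations (seconds) collected from three
# recorded verification runs of the same codebase.
runs = {
    "Module.Foo (correctness)": [2.1, 2.3, 2.2],
    "Module.Bar (correctness)": [4.0, 9.5, 6.2],
}

def over_budget(durations_by_task, max_seconds):
    """Tasks whose worst observed duration exceeds the budget."""
    return [t for t, ds in durations_by_task.items() if max(ds) > max_seconds]

def high_variability(durations_by_task, max_cv):
    """Tasks whose coefficient of variation (stdev/mean) exceeds max_cv."""
    return [t for t, ds in durations_by_task.items()
            if stdev(ds) / mean(ds) > max_cv]

print(over_budget(runs, 8.0))        # Bar's worst run is 9.5s
print(high_variability(runs, 0.25))  # Bar's durations swing widely
```

A CI job might fail only on `over_budget`, while `high_variability` feeds a report that flags proofs worth stabilizing before they start timing out.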
> For precedent, the .NET platform has a similar architecture, where data is written out as a result of options like `--logger` (for test results) and `--collect` (for things like test coverage), and other tools are used to report on that data in various ways.
I'm all for other tools using Dafny's output, but I think that any Dafny related features customers should use in their CI should be in the Dafny CLI.
This change adds a best practice we recommend for all Dafny projects: measuring the cost of each individual verification task and setting an upper bound in CI, to guard against future verification instability (see https://github.com/dafny-lang/dafny/issues/1582).
This is done in two steps:

1. Passing the `/verificationLogger:csv` parameter when invoking `dafny`, which outputs a CSV file with verification results.
2. Invoking the `dafny-reportgenerator` tool with a configured maximum cost, which will fail the build if any task's cost crosses that threshold.

There are more details on the latter tool here: https://github.com/dafny-lang/dafny-reportgenerator
The good news is that (at least after #36), this codebase is in good shape! :) All tasks take less than 5 seconds across a few runs, so I've set 10 seconds as the upper bound for now. I would prefer to bound the resource count instead since that is a more predictable metric, but Dafny doesn't record the resource count when splitting happens until the upcoming 3.5 release.
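Gating on resource count instead of duration, once it is recorded, would look much the same as the duration check. The `TestResult.ResourceCount` column name and the numbers below are assumptions for illustration.

```python
import csv
import io

# Rows shaped like /verificationLogger:csv output; the ResourceCount
# column name and values are assumptions for illustration.
SAMPLE = """TestResult.DisplayName,TestResult.ResourceCount
Module.Foo (correctness),120000
Module.Bar (correctness),2400000
"""

def over_resource_budget(csv_text, max_resources):
    """Tasks whose solver resource count exceeds a fixed budget.

    Unlike duration, resource count does not vary with machine load,
    which makes it the more predictable metric to bound in CI.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row["TestResult.DisplayName"] for row in reader
            if int(row["TestResult.ResourceCount"]) > max_resources]

print(over_resource_budget(SAMPLE, 1_000_000))
```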
Note that because the `/verificationLogger` options cause extra output, I've also changed the lit configuration to no longer assert the exact output of `dafny` when verifying everything. This will make the build more stable by not depending on the exact console output, as all of the code here is always expected to verify successfully.

By submitting this pull request, I confirm that my contribution is made under the terms of the MIT license.