dafny-lang / dafny

Dafny is a verification-aware programming language
https://dafny.org

measure-complexity non-deterministic? #5126

Open hmijail opened 4 months ago

hmijail commented 4 months ago

Dafny version

4.4

Code to produce this issue

No response

Command to run and resulting output

No response

What happened?

I ran the exact same command twice in a row, with no changes to source code: dafny measure-complexity --random-seed 1 --iterations 100 --log-format csv int.dfy.

One of the runs finished correctly, the other reported a verification error: int.dfy(738,40): Error: result of operation might violate newtype constraint for 'i256'

I guess the file doesn't matter; I'm just surprised because I expected that, given a random seed, the iterations' result would be the same. And this seemed to hold when I did small test runs with few iterations: the TestResult.ResourceCount values in the CSV were the same in each run.

So, are different runs with the same random seed expected to yield the same results?

By the way, I found this while trying to make sense of the verification coverage reports vs the resource usage variability. Given that measure-complexity is experimental, that dafny-reportgenerator seems disabled and untouched since Dafny 3.7, and projects like Dafny64, can you share whether there are any concrete plans for brittleness debugging?

What type of operating system are you experiencing the problem on?

Mac

keyboardDrummer commented 4 months ago

I'm just surprised because I expected that, given a random seed, the iterations' result would be the same.

Me too. This shouldn't happen, even if you do not specify --random-seed, since it should default to 0 then. @atomb thoughts?

Given that measure-complexity is experimental, that dafny-reportgenerator seems disabled and untouched since Dafny 3.7, and projects like Dafny64, can you share whether there are any concrete plans for brittleness debugging?

My intention is to move dafny-reportgenerator into measure-complexity. I would like it, when passed a Dafny project and no further arguments, to point out which Dafny verifiable symbols are brittle, according to the default values of the brittleness threshold parameters.

However, I can't comment on when exactly this change will be made.

atomb commented 4 months ago

I'm just surprised because I expected that, given a random seed, the iterations' result would be the same.

Me too. This shouldn't happen, even if you do not specify --random-seed, since it should default to 0 then. @atomb thoughts?

I also find it surprising. I've heard the hypothesis that Z3 treats the seed 0 specially, with the interpretation that the seed should really be derived from the clock. However, in looking at the source code of Z3 and experimenting with a seed of 0 on various real programs, I haven't been able to confirm that hypothesis. And, even if that were true, any other seed should result in truly deterministic behavior.

My first thought for debugging this is to run something like the command @hmijail posted above, twice, with the --solver-log flag, and then compare the (:random-seed ...) entries in the file. They should all be the same. If they're not, the non-determinism is coming from Dafny. If they are (and the files are otherwise the same, too), then it's coming from Z3.
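The log comparison described above is easy to script. Here is a minimal Python sketch, assuming the solver log sets the seed via entries like `(set-option :random-seed N)` (the exact log syntax is an assumption; adjust the regex to whatever the log actually contains):

```python
import re

# Matches the seed value in entries such as "(set-option :random-seed 1)".
SEED_RE = re.compile(r":random-seed\s+(\d+)")

def seeds_in(log_text: str) -> list[str]:
    """Collect every :random-seed value in a solver log, in order."""
    return SEED_RE.findall(log_text)

def compare_logs(log_a: str, log_b: str) -> bool:
    """True if both logs set the same seeds in the same order."""
    return seeds_in(log_a) == seeds_in(log_b)
```

If `compare_logs` returns False for two runs of the same command, the non-determinism is on the Dafny side; if the seeds (and the rest of the files) match, it points at Z3.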

One other thing that occurred to me is that it's possible, when specifying a time limit, that a given query could succeed or fail non-deterministically based on the load of other processes on the system. But the example above doesn't give a time limit.

hmijail commented 4 months ago

A while ago I tried multiple batches of --random-seed 0 --iterations 5 because, as @atomb mentioned and as happens elsewhere, I thought that seed 0 might mean a randomly initialized seed. However, the resource usage results were always the same across batches, so it clearly is an actual 0 seed. (Maybe this could be mentioned in the docs.)
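That cross-batch check can be scripted. A minimal Python sketch, assuming the CSV columns are named `TestResult.DisplayName` and `TestResult.ResourceCount` as mentioned elsewhere in this thread (the exact header names are an assumption):

```python
import csv
import io

def resource_counts(csv_text: str) -> dict[str, list[str]]:
    """Map each verified symbol's display name to its resource counts."""
    counts: dict[str, list[str]] = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        counts.setdefault(row["TestResult.DisplayName"], []).append(
            row["TestResult.ResourceCount"])
    return counts

def runs_match(run_a: str, run_b: str) -> bool:
    """True if both runs report identical resource counts per symbol."""
    return resource_counts(run_a) == resource_counts(run_b)
```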

Additionally, I instrumented Dafny and its "internal" Boogie to print out the sequence of random numbers generated and sent to the solver. In my tests of small batches of 5 iterations the generated numbers seemed to always be the same, even when setting --cores to 1 or 4.

Also, I tried using the equivalent --boogie options instead of Dafny's random seed and iterations. Again, I could trigger the rare verification failure that way.

keyboardDrummer commented 4 months ago

@hmijail could you use --solver-log to see if the logs are different for two differently behaving executions?

Or otherwise, could you provide a reproduction, so I can try to do that?

hmijail commented 4 months ago

The solver logs differ even between non-failing verifications, so I have no idea what exactly I should be looking for.

The file that causes the problem is this: https://github.com/ConsenSys/evm-dafny/blob/1a89c03a5932441bba4f90aa1df669446fc7ae6d/src/dafny/util/int.dfy . I'm currently using this call; with about 20 repetitions you should get the error.

dafny measure-complexity --log-format csv --random-seed 1 --iterations 50 --cores 4 src/dafny/util/int.dfy

In case it helps, in a 200-iterations batch I got an even rarer error:

Unhandled exception: System.NullReferenceException: Object reference not set to an instance of an object.
   at Microsoft.Boogie.ReadOnlyVisitor.VisitVariable(Variable node)
   at Microsoft.Boogie.ReadOnlyVisitor.VisitVariableSeq(List`1 variableSeq)
   at Microsoft.Boogie.ReadOnlyVisitor.VisitImplementation(Implementation node)
   at Microsoft.Boogie.ReadOnlyVisitor.VisitDeclarationList(List`1 declarationList)
   at Microsoft.Boogie.ReadOnlyVisitor.VisitProgram(Program node)
   at Microsoft.Boogie.BasicTypeVisitor..ctor(Absy absy)
   at Microsoft.Boogie.Checker.Target(Program prog, ProverContext ctx, Split split)
   at VC.CheckerPool.PrepareChecker(Program program, Split split, Checker checker)
   at VC.CheckerPool.FindCheckerFor(ConditionGeneration vcgen, Split split, CancellationToken cancellationToken)
   at VC.SplitAndVerifyWorker.DoWork(Int32 iteration, Split split, CancellationToken cancellationToken)
   at VC.SplitAndVerifyWorker.DoWorkForMultipleIterations(Split split, CancellationToken cancellationToken)
   at VC.SplitAndVerifyWorker.WorkUntilDone(CancellationToken cancellationToken)
   at VC.VCGen.VerifyImplementation(ImplementationRun run, VerifierCallback callback, CancellationToken cancellationToken, IObserver`1 batchCompletedObserver)
   at VC.ConditionGeneration.VerifyImplementation(ImplementationRun run, IObserver`1 batchCompletedObserver, CancellationToken cancellationToken)
   at Microsoft.Boogie.ExecutionEngine.<>c__DisplayClass50_0.<<VerifyImplementationWithoutCaching>b__0>d.MoveNext()
--- End of stack trace from previous location ---
   at Microsoft.Boogie.ExecutionEngine.<>c__DisplayClass41_0.<<VerifyEachImplementation>b__1>d.MoveNext()
--- End of stack trace from previous location ---
   at Microsoft.Boogie.ExecutionEngine.<>c__DisplayClass41_0.<<VerifyEachImplementation>b__1>d.MoveNext()
--- End of stack trace from previous location ---
   at Microsoft.Boogie.ExecutionEngine.VerifyEachImplementation(TextWriter output, ProcessedProgram processedProgram, PipelineStatistics stats, String programId, ErrorReporterDelegate er, String requestId, Implementation[] stablePrioritizedImpls)
   at Microsoft.Boogie.ExecutionEngine.InferAndVerify(TextWriter output, Program program, PipelineStatistics stats, String programId, ErrorReporterDelegate er, String requestId)
   at Microsoft.Dafny.DafnyMain.BoogiePipelineWithRerun(DafnyOptions options, TextWriter output, ExecutionEngine engine, Program program, String bplFileName, String programId)
   at Microsoft.Dafny.DafnyMain.BoogieOnce(DafnyOptions options, TextWriter output, ExecutionEngine engine, String baseFile, String moduleName, Program boogieProgram, String programId)
   at Microsoft.Dafny.CompilerDriver.BoogieOnceWithTimerAsync(TextWriter output, DafnyOptions options, String baseName, String programId, String moduleName, Program program)
   at Microsoft.Dafny.CompilerDriver.<>c__DisplayClass8_0.<<BoogieAsync>b__0>d.MoveNext()
--- End of stack trace from previous location ---
   at Microsoft.Dafny.CompilerDriver.<>c__DisplayClass8_0.<<BoogieAsync>b__0>d.MoveNext()
--- End of stack trace from previous location ---
   at Microsoft.Dafny.CompilerDriver.BoogieAsync(DafnyOptions options, String baseName, IEnumerable`1 boogiePrograms, String programId)
   at Microsoft.Dafny.CompilerDriver.ProcessFilesAsync(IReadOnlyList`1 dafnyFiles, ReadOnlyCollection`1 otherFileNames, DafnyOptions options, ProofDependencyManager depManager, Boolean lookForSnapshots, String programId)
   at Microsoft.Dafny.CompilerDriver.RunCompiler(DafnyOptions options)
   at Microsoft.Dafny.DafnyCli.<>c__DisplayClass9_0.<<SetHandlerUsingDafnyOptionsContinuation>g__Handle|0>d.MoveNext()
--- End of stack trace from previous location ---
   at System.CommandLine.Invocation.AnonymousCommandHandler.InvokeAsync(InvocationContext context)
   at System.CommandLine.Invocation.InvocationPipeline.<>c__DisplayClass4_0.<<BuildInvocationChain>b__0>d.MoveNext()
--- End of stack trace from previous location ---
   at System.CommandLine.Builder.CommandLineBuilderExtensions.<>c__DisplayClass17_0.<<UseParseErrorReporting>b__0>d.MoveNext()
--- End of stack trace from previous location ---
   at System.CommandLine.Builder.CommandLineBuilderExtensions.<>c__DisplayClass12_0.<<UseHelp>b__0>d.MoveNext()
--- End of stack trace from previous location ---
   at Microsoft.Dafny.DafnyCli.<>c__DisplayClass20_0.<<AddDeveloperHelp>b__0>d.MoveNext()
--- End of stack trace from previous location ---
   at System.CommandLine.Builder.CommandLineBuilderExtensions.<>c__DisplayClass22_0.<<UseVersionOption>b__0>d.MoveNext()
--- End of stack trace from previous location ---
   at System.CommandLine.Builder.CommandLineBuilderExtensions.<>c__DisplayClass19_0.<<UseTypoCorrections>b__0>d.MoveNext()
--- End of stack trace from previous location ---
   at System.CommandLine.Builder.CommandLineBuilderExtensions.<>c.<<UseSuggestDirective>b__18_0>d.MoveNext()
--- End of stack trace from previous location ---
   at System.CommandLine.Builder.CommandLineBuilderExtensions.<>c__DisplayClass16_0.<<UseParseDirective>b__0>d.MoveNext()
--- End of stack trace from previous location ---
   at System.CommandLine.Builder.CommandLineBuilderExtensions.<>c.<<RegisterWithDotnetSuggest>b__5_0>d.MoveNext()
--- End of stack trace from previous location ---
   at System.CommandLine.Builder.CommandLineBuilderExtensions.<>c__DisplayClass8_0.<<UseExceptionHandler>b__0>d.MoveNext()
keyboardDrummer commented 4 months ago

Thanks, having a reproduction is great. We'll schedule diving into this.

In case it helps, in a 200-iterations batch I got an even rarer error:

That's a completely different bug that we've left unsolved for too long. We'll also schedule solving that.

hmijail commented 4 months ago

While we're at it, what would you think of adding some more direct control of the random number being used, so that one can jump back into a given position of the random stream?

My use case: currently I am generating big batches of verification results (e.g. 100 iterations) to look for the extreme values of Resource Usage. However, if a given batch does contain an interesting extreme, I have no way to reproduce the particular random number that created that extreme: even though I currently extract that number, I can't inject it back in, because from a PRNG result I can't get a seed that would generate it.

So the only way I have to find interesting cases is by running single iterations with new random seeds. But running 1 iteration, with ancillary startup costs, is about 5-10x slower than running 10 iterations.

This could be solved by e.g. making the seed the first "random" number to be used, and generating the next random number by hashing the previous one.
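This scheme can be sketched in a few lines of Python (`seed_chain` is a hypothetical helper, not Dafny code; SHA-256 stands in for whatever hash would actually be used):

```python
import hashlib

def seed_chain(initial_seed: int, iterations: int) -> list[int]:
    """Derive each iteration's seed by hashing the previous one, so any
    single iteration can be reproduced from its own seed alone."""
    seeds = [initial_seed]
    for _ in range(iterations - 1):
        digest = hashlib.sha256(str(seeds[-1]).encode()).digest()
        # Keep the seed in a non-negative 32-bit range the solver accepts.
        seeds.append(int.from_bytes(digest[:4], "big") % (2 ** 31))
    return seeds
```

The key property is that re-running with iteration i's seed as the initial seed replays the rest of the chain, so a single interesting iteration from a large batch can be reproduced in isolation.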

(please let me know if this should be a different issue)

hmijail commented 4 months ago

(If this change would be welcome I think I could contribute it myself in a PR)

keyboardDrummer commented 4 months ago

I think a user-friendly way of reproducing a certain iteration would be for each iteration to emit the --random-seed argument that reproduces it.

In Dafny master there's currently a bug with --iteration, and I think we need to fix that before it's possible to add the above feature. I've created an issue to do both: https://github.com/dafny-lang/dafny/issues/5134

hmijail commented 4 months ago

Maybe connected: I just noticed that running single iterations vs multiple iterations yields very different Resource Count values in the CSV file.

I had a 1000-iteration CSV log showing that a given DisplayName reached a range of Resource Counts between 8M and 24M. So I ran 5k single iterations (with randomized seed) to find a 8M case. The Resource Count never went under 19M. Then I tried running with --iterations 2, and now Resource Count dipped under 10M in about 10 runs.

The file was this one, function Precompiled.CallModExp.

keyboardDrummer commented 4 months ago

Maybe connected: I just noticed that running single iterations vs multiple iterations yields very different Resource Count values in the CSV file.

I had a 1000-iteration CSV log showing that a given DisplayName reached a range of Resource Counts between 8M and 24M. So I ran 5k single iterations (with randomized seed) to find a 8M case. The Resource Count never went under 19M. Then I tried running with --iterations 2, and now Resource Count dipped under 10M in about 10 runs.

The file was this one, function Precompiled.CallModExp.

I will update the iteration and random seed mechanism in this PR: https://github.com/dafny-lang/dafny/pull/5138

Let's wait for that one to land before we gather more data.

keyboardDrummer commented 4 months ago

While we're at it, what would you think of adding some more direct control of the random number being used, so that one can jump back into a given position of the random stream?

This should be implemented now through https://github.com/dafny-lang/dafny/pull/5138

Here's what it looks like:

rwillems@bcd0745419f2 evm-dafny % localDafny measure-complexity --random-seed 1 --iterations 100 --log-format csv src/dafny/util/int.dfy 
Starting verification of iteration 0 with seed 1
Starting verification of iteration 1 with seed 534011718
Starting verification of iteration 2 with seed 237820880
Starting verification of iteration 3 with seed 1002897798
Starting verification of iteration 4 with seed 1657007234

The random seed is also shown in the CSV file for each line.

I'm currently using this call; with about 20 repetitions you should get the error.

I tried that, but I don't get the error on Dafny 4.4. However, it could be that I just need to try for longer. Would you mind if I don't debug this issue further, since the upcoming Dafny release reworks how --iterations is processed in dafny measure-complexity?

Additionally, I instrumented Dafny and its "internal" Boogie to print out the sequence of random numbers generated and sent to the solver. In my tests of small batches of 5 iterations the generated numbers seemed to always be the same, even when setting --cores to 1 or 4.

Note that --solver-log doesn't work well with measure-complexity, in the sense that it only outputs log files for one iteration (I'm guessing the last).

If I look at the logs of solver inputs, then the --random-seed value significantly changes what is sent to the solver, including the values of :smt.random_seed and :sat.random_seed

I just noticed that running single iterations vs multiple iterations yields very different Resource Count values in the CSV file.

I see the same. @atomb, would you mind helping debug that? It would be quite bad if the resource count values cannot be relied on.

hmijail commented 4 months ago

Would you mind if I do not debug this issue further

Yes, of course. You surely know the general status and plan better than I do.

Regarding the different resource usages in single-iteration vs. multiple-iteration runs: actually, beware, I might have been mis-measuring that. (I have been writing scripts to process the CSVs, and I suspect the CSVs I was processing at that point had a different row cardinality than I expected.)

On the other hand, I just found yet another related thing: --random-seed X --iterations 1 causes a weird error that disappears with --random-seed X --iterations 2.

❯ dafny44 measure-complexity  --random-seed 2597730344 --iterations 1 /Users/mija/programming/ConsenSys/evm-dafny/src/dafny/core/precompiled.dfy
Prover error: line 120 column 29: Expected values for parameter random_seed is an unsigned integer. It was given argument '-1697236952'

Prover error: line 313 column 29: Expected values for parameter random_seed is an unsigned integer. It was given argument '-1697236952'

Prover error: line 314 column 29: Expected values for parameter random_seed is an unsigned integer. It was given argument '-1697236952'

Prover error: line 121 column 29: Expected values for parameter random_seed is an unsigned integer. It was given argument '-1697236952'

Prover error: line 291 column 29: Expected values for parameter random_seed is an unsigned integer. It was given argument '-1697236952'

Prover error: line 292 column 29: Expected values for parameter random_seed is an unsigned integer. It was given argument '-1697236952'

Prover error: line 958 column 29: Expected values for parameter random_seed is an unsigned integer. It was given argument '-1697236952'

Prover error: line 959 column 29: Expected values for parameter random_seed is an unsigned integer. It was given argument '-1697236952'

^C
❯ dafny44 measure-complexity  --random-seed 2597730344 --iterations 2 /Users/mija/programming/ConsenSys/evm-dafny/src/dafny/core/precompiled.dfy

Dafny program verifier finished with 17 verified, 0 errors
keyboardDrummer commented 4 months ago

causes a weird error that disappears with

Is that with the released Dafny 4.4, not with the nightly?

hmijail commented 4 months ago

(my last message was sent by email and got corrupted, so I deleted it)

This is with released 4.4.

atomb commented 3 months ago

Maybe connected: I just noticed that running single iterations vs multiple iterations yields very different Resource Count values in the CSV file.

I had a 1000-iteration CSV log showing that a given DisplayName reached a range of Resource Counts between 8M and 24M. So I ran 5k single iterations (with randomized seed) to find a 8M case. The Resource Count never went under 19M. Then I tried running with --iterations 2, and now Resource Count dipped under 10M in about 10 runs.

The file was this one, function Precompiled.CallModExp.

I've been experimenting with this using today's latest commit, and I'm finding that verify and measure-complexity --iterations 1 always give the same resource counts, but that measure-complexity --iterations 2 (or any higher number) gives a slightly different resource count.

I looked at the generated SMT files, and the difference comes down to naming. With --iterations 1 the SMT query has names of the form $generated@@<n> but with --iterations 2 the query has names of the form random<n1>@@<n2>. I expect that making those names consistent across all usage modes should solve the problem.
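In the meantime, one could normalize both generated-name forms before diffing the SMT files, to confirm that naming is the only difference. A minimal Python sketch based on the two shapes quoted above (the exact name grammar is an assumption):

```python
import re

# Matches both observed forms: "$generated@@<n>" and "random<n1>@@<n2>".
GENERATED = re.compile(r"\$generated@@\d+|random\d+@@\d+")

def normalize(smt_text: str) -> str:
    """Replace every generated name with one fixed placeholder so queries
    from different usage modes can be compared meaningfully."""
    return GENERATED.sub("$name", smt_text)
```

If the normalized queries from the two modes are identical, the resource-count difference comes down to naming alone.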

keyboardDrummer commented 3 months ago

I expect that making those names consistent across all usage modes should solve the problem.

I agree, it would be good to do that to prevent confusion.