dafny-lang / dafny

Dafny is a verification-aware programming language
https://dafny.org

Bring new CLI options up to par with legacy options #3468

Closed: atomb closed this issue 8 months ago

atomb commented 1 year ago

Summary

Currently, the new top-level commands for Dafny provide most, but not all, of the frequently-used features of the tool. This issue tracks the remaining options to fill in before 4.0.

Background and Motivation

Several of the most useful options are missing, particularly those used for proof debugging and solver performance investigation.

Proposed Feature

DRC: Anything available for verify needs to be available on translate/build/run as well, in case those verify.

DRC: Others to consider:

- --random-seed
- --help-attr
- --deprecation
- something like /noCheating:0, to allow a mix of assert, expect, and assume
- something to easily enable /extractCounterexample

I fear that if we don't supply what users need to tinker with proof performance, they may just not use the new CLI. So maybe we should also have --legacy="... string of old options in old format ..." just in case.
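
A sketch of that escape hatch, assuming a hypothetical --legacy flag that just forwards a string of legacy-format options (the file name and the forwarded options here are illustrative):

```
# hypothetical: run the new CLI but pass old-format options through verbatim
dafny verify --legacy="/trace /vcsLoad:0.5" MyProof.dfy
```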

Alternatives

No response

davidcok commented 1 year ago

My add: --use-basename-for-filename should not be just a developer option -- I would find it essential for testing as a user.

Also --error-limit, --random-seed
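
For instance, something like the following would keep diagnostic output stable across machines, which is what makes it useful in test suites (the file name is illustrative):

```
# report file locations using only the base name, so errors don't depend
# on the absolute path of the working tree
dafny verify --use-basename-for-filename MyProof.dfy
```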

keyboardDrummer commented 1 year ago

/vcsLoad is already covered by --cores

I would suggest making --solver-log a hidden option

/trace is poorly defined. The documentation reads "blurt out various debug trace information". We should think about what it's used for and how to expose that with a good UI. It also seems quite similar to --log-format

What does /proverDll add to the already existing --solver-path?

What's the use case of error-limit?

How do you feel about letting customers use measure-complexity for the options you identified that aren't in verify (--log-format, --log-file, --isolate-assertions), so that measure-complexity would become the command for performance debugging?
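
Roughly this, if the performance-debugging options lived only on measure-complexity (spellings as discussed in this thread; the exact syntax may differ):

```
# performance debugging: one SMT query per assertion, results logged as CSV
dafny measure-complexity --isolate-assertions --log-format csv MyProof.dfy
```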

davidcok commented 1 year ago

It appears to me that there is a whole set of options whose point is to print out various kinds of debug information (/trace among them). They could be combined into a hidden option --debug.

atomb commented 1 year ago

Oh, does --cores allow a fraction to use that fraction of the available cores? I didn't realize that.

It's true that /trace is poorly defined, but it's very useful as a quick mechanism to see how things are progressing. I typically wouldn't use it in an automated build, or write anything automated that depended on its contents, but I very frequently use it when running things during development. I use it so often that I'd probably never use dafny verify if it weren't available. I also tend to use it in addition to --log-format. But I am happy to have it as a hidden option, and perhaps as an argument to --debug, as @davidcok suggests.

We could make it so that, if --solver-path points to a plugin DLL rather than an executable file, it would be loaded as a plugin. I kind of like that approach, but we'd need to add code to make it work. I'm torn about --solver-log. I wouldn't strongly oppose making it hidden, but enough people would probably use it and want to learn about it from the help that I'd slightly prefer it not be hidden (unless maybe we had a --show-hidden or something that gives you descriptions of all the options).

For error-limit: the default error limit is 5, and sometimes you want to see all of the errors if there are more than that. At the very least, some of the tests depend on being able to report more than 5 errors per definition.
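
For example (flag spelling as discussed in this thread; the file name is illustrative):

```
# report up to 50 errors per definition instead of the default 5
dafny verify --error-limit 50 MyProof.dfy
```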

I think I'd prefer to have --log-format, --log-file and --isolate-assertions in both verify and measure-complexity, though they are more strictly necessary in measure-complexity.

davidcok commented 1 year ago

Aaron - are you using /trace to show progress? I've thought before that on large projects I'd want a --progress option to show progress through the files and methods as they are verified.

atomb commented 1 year ago

Yeah, that's generally what I'm using it for, along with frequently wanting to see time and resource use statistics as they go by. Ultimately, what we might want is a --progress option in Dafny that's similar to, but perhaps not identical to, /trace, with a more focused purpose. In the meantime, --debug trace, or something like that, which enables exactly what /trace does, would be nice.
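
Something like this, where the new-CLI spellings are only the hypothetical names floated above:

```
# legacy: progress plus per-proof time/resource statistics as they go by
dafny /trace MyProof.dfy

# hypothetical new-CLI spellings from this discussion
dafny verify --progress MyProof.dfy      # focused progress reporting
dafny verify --debug trace MyProof.dfy   # exactly what legacy /trace does
```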

keyboardDrummer commented 1 year ago

--cores=50% will use half of the available cores.
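
For example (the integer form is an assumption based on the option's name):

```
dafny verify --cores=50% MyProof.dfy   # use half of the available cores
dafny verify --cores=4 MyProof.dfy     # presumably: use exactly four cores
```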

I don't understand /proverDll. If you want to supply some custom C# code to use as a solver, then wouldn't you use --plugin=<assemblyPath>? Why would you also need to use /proverDll ?

Maybe /trace could be --verbosity=debug ?

Could you elaborate on the use case of --log-format? What information does that give when you're doing dafny verify that you wouldn't get from /trace?

> For error-limit: the default error limit is 5, and sometimes you want to see all of the errors if there are more than that. At the very least, some of the tests depend on being able to report more than 5 errors per definition.

👍

atomb commented 1 year ago

Boogie has a separate plugin mechanism, different from the Dafny plugin mechanism, which it specifically uses for solver plugins. It would be possible, however, for Dafny to introspect a plugin to figure out what kind it is, which I think could be a better user experience. We could support both with --plugin.
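
From the user's side it might look like this (the solver-plugin use of --plugin is only the idea above, not existing behavior; file names are illustrative):

```
# today: load a Dafny plugin assembly
dafny verify --plugin=MyDafnyPlugin.dll MyProof.dfy

# idea: the same flag accepts a Boogie solver plugin, with Dafny
# introspecting the assembly to figure out which kind it is
dafny verify --plugin=MySolverPlugin.dll MyProof.dfy
```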

The /verificationLogger:csv (--log-format csv) option produces (would produce) a machine-readable CSV file containing all the statistics about verification time and resource use. It's what dafny-reportgenerator uses as input. That information is mostly available in /trace, but in a hard-to-parse (and sometimes impossible-to-parse) form.

The /verificationLogger:text (--log-format text) option provides (would provide) both the outcome of each proof and the list of all assertions contained within each proof. It's particularly useful along with /vcsSplitOnEveryAssert (--isolate-assertions).
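
Side by side, the legacy spellings and their proposed new-CLI equivalents (new spellings per this thread; the exact syntax may differ):

```
# legacy: machine-readable timing/resource data, one SMT query per assertion
dafny /verificationLogger:csv /vcsSplitOnEveryAssert MyProof.dfy

# proposed new-CLI equivalent
dafny verify --log-format csv --isolate-assertions MyProof.dfy
```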

I like the idea of using --verbosity for /trace. I was just talking with @davidcok about adding modes to make Boogie quiet (only errors and warnings, no "Dafny program verifier finished with ..." message) or silent (no output at all, except for a return code), and having these be --verbosity arguments in Dafny, too, seems like a good idea. I think I might call the three modes trace, quiet, and silent, reserving debug for a (future) mode with additional debug output, although I'm not strongly attached to that.
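
That is, roughly the following, with all three mode names tentative as noted:

```
dafny verify --verbosity trace MyProof.dfy   # progress output, like legacy /trace
dafny verify --verbosity quiet MyProof.dfy   # errors and warnings only
dafny verify --verbosity silent MyProof.dfy  # no output; return code only
```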

keyboardDrummer commented 1 year ago

> Boogie has a separate plugin mechanism, different from the Dafny plugin mechanism, which it specifically uses for solver plugins. It would be possible, however, for Dafny to introspect a plugin to figure out what kind it is, which I think could be a better user experience. We could support both with --plugin.

Ah, I was thinking Dafny users could use a Dafny plugin to configure a different solver back-end, but I guess you're saying Dafny users should be able to use Boogie plugins as well, so they don't need two separate .dlls. While this sounds like a nice UX, it also sounds a little complicated to me. If a Boogie plugin was built against a particular Boogie version, how do you know whether it works with a particular Dafny version?

That said, I'm OK with having a different option, like --boogie-plugin, for supporting plugins with a Boogie plugin format.

keyboardDrummer commented 1 year ago

> The /verificationLogger:csv (--log-format csv) option produces (would produce) a machine-readable CSV file containing all the statistics about verification time and resource use. It's what dafny-reportgenerator uses as input. That information is mostly available in /trace, but in a hard-to-parse (and sometimes impossible-to-parse) form.

I'm skeptical about providing an 'API' through the CLI that is not meant for user consumption, because I feel it clutters the CLI and there are good alternatives. If customers want their software to integrate with Dafny, they can consume one of Dafny's .NET packages and integrate with a C# API. Alternatively, the language server has verification-related APIs as well. We could consider providing an api command that provides one-shot uses of the language server API.

One thing is that measure-complexity is still marked as experimental, while verify is not. How about we first evolve the API before we decide what to put into verify? For example, I think we're still planning on moving the functionality of dafny-reportgenerator into measure-complexity. Would that affect the need for the CSV format?

Regarding the CSV format, things are grouped by assertion batches, but it's not clear which assertions are in each batch, so I don't understand what a user can do to evaluate these batches. To me it seems like assertion batches can only be defined by the assertions in them. If you don't provide that definition, it seems better to group by named verifiable (method/function/constant/datatype), so the user doesn't have to aggregate the information for each batch in a verifiable.

> The /verificationLogger:text (--log-format text) option provides (would provide) both the outcome of each proof and the list of all assertions contained within each proof. It's particularly useful along with /vcsSplitOnEveryAssert (--isolate-assertions).

Regarding the text format, one of the tests (https://github.com/dafny-lang/dafny/blob/master/Test/logger/TextLogger.dfy#L14) produces an output where many of the verification batches have no assertion in them. What does that mean? There are also many repeated occurrences of the same batch, which is because multiple iterations have been specified, but it would be nice if the iteration number is shown so it's clear why there are multiple occurrences.

atomb commented 1 year ago

> The /verificationLogger:csv (--log-format csv) option produces (would produce) a machine-readable CSV file containing all the statistics about verification time and resource use. It's what dafny-reportgenerator uses as input. That information is mostly available in /trace, but in a hard-to-parse (and sometimes impossible-to-parse) form.

> I'm skeptical about providing an 'API' through the CLI that is not meant for user consumption, because I feel it clutters the CLI and there are good alternatives. If customers want their software to integrate with Dafny, they can consume one of Dafny's .NET packages and integrate with a C# API. Alternatively, the language server has verification-related APIs as well. We could consider providing an api command that provides one-shot uses of the language server API.

Textual formats are a lot more portable than programmatic APIs. Using Dafny as a library is only really feasible from another .NET application. CSV files are supported by a nearly endless range of tools. I sometimes load the output of this analysis into a spreadsheet, for example, and it can be used with tools like csvkit.
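
For example, something along these lines works with any CSV-consuming tool (the column name is illustrative; check the actual header of the generated file):

```
# sort verification results by resource use, highest first, and view the top
csvsort -r -c resource_count verification-log.csv | csvlook | head -n 12
```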

> One thing is that measure-complexity is still marked as experimental, while verify is not. How about we first evolve the API before we decide what to put into verify? For example, I think we're still planning on moving the functionality of dafny-reportgenerator into measure-complexity. Would that affect the need for the CSV format?

I don't think it would remove the need for CSV output. The functionality we provide, whether in an external executable or an internal command, will always be somewhat limited. If anyone wants to query the data in a way that we haven't implemented, it's nice for that to be an easy process.

More significantly, though, we're actively using all of this functionality in practice. If we don't make these options available in the new CLI, I don't think even a single one of my regular uses of Dafny could be migrated away from using the old-style options.

> Regarding the CSV format, things are grouped by assertion batches, but it's not clear which assertions are in each batch, so I don't understand what a user can do to evaluate these batches. To me it seems like assertion batches can only be defined by the assertions in them. If you don't provide that definition, it seems better to group by named verifiable (method/function/constant/datatype), so the user doesn't have to aggregate the information for each batch in a verifiable.

This format exists primarily for performance analysis, and therefore I think it makes sense to group things according to the unit for which performance data is available, which is at the level of SMT queries.

> The /verificationLogger:text (--log-format text) option provides (would provide) both the outcome of each proof and the list of all assertions contained within each proof. It's particularly useful along with /vcsSplitOnEveryAssert (--isolate-assertions).

> Regarding the text format, one of the tests (https://github.com/dafny-lang/dafny/blob/master/Test/logger/TextLogger.dfy#L14) produces an output where many of the verification batches have no assertion in them. What does that mean? There are also many repeated occurrences of the same batch, which is because multiple iterations have been specified, but it would be nice if the iteration number is shown so it's clear why there are multiple occurrences.

Regarding the first issue, the implementation of splitting that's currently in place frequently (or maybe always?) generates a split that has no assertions in it, which then shows up in the verification results. I think we should fix that, for a number of reasons, but it seems to me that it's somewhat separate from the question of adding these command-line options.

Thanks for pointing out the lack of iteration number in the text output. That's something we should fix.

keyboardDrummer commented 1 year ago

> Textual formats are a lot more portable than programmatic APIs. Using Dafny as a library is only really feasible from another .NET application.

Yes, but I do wonder whether it would be better to expose output that is optimised for machine rather than human consumption (JSON, for example, or the assertion-batch-grouped CSV) through a separate api command, and whether to give that the same structure as the language server API to prevent duplication. Thinking more about it, though, I'm not so sure. I'll drop this point : )

> This [CSV] format exists primarily for performance analysis, and therefore I think it makes sense to group things according to the unit for which performance data is available, which is at the level of SMT queries.

That sounds sensible, but what is a user supposed to do with the batch grouping, not knowing which assertions belong to a particular batch? One thing they can do is aggregate the batches per named thing (symbol), and then try to improve the performance of one of those symbols. Is there anything else they can do?

I guess the report generator determines a standard deviation per assertion batch, so then the grouping per batch makes sense.

However, you could also expose the lowest level information: have a row for each assertion, and a column that mentions the batch number. Assertions belonging to the same batch will be in consecutive rows that contain the same data except for the assertion's source location and possibly a code snippet.
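
A tiny sketch of that lowest-level layout, with entirely made-up column names and values:

```
assertion_location,description,batch,batch_duration_ms,batch_resource_count
Foo.dfy(12,5),divisor is always non-zero,1,420,173005
Foo.dfy(14,9),index in range,1,420,173005
Foo.dfy(20,3),postcondition holds,2,118,52011
```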

To me it seems (too) difficult to design an output format that will be used by downstream tools, without knowing what those tools will be. Currently we have the report-generator as a downstream tool but we're planning to replace that (right?).

> I think we should fix that, for a number of reasons, but it seems to me that it's somewhat separate from the question of adding these command-line options.

Sounds good. Just wanted to point it out since I looked at the output of the options. Another thing I would consider is to add a 'code snippet' for each assertion, in addition to the description that's already there, like x / 3 in addition to 'divisor is always non-zero'.

Also, it seems like users, when reading this output, would have to switch back and forth between their editor and the output a lot. It might be interesting to define a problem matcher for the Dafny IDE that works in conjunction with measure-complexity, so users can measure the complexity of their proofs while in the IDE and see the output of that measurement inline in their code.

> More significantly, though, we're actively using all of this functionality in practice. If we don't make these options available in the new CLI, I don't think even a single one of my regular uses of Dafny could be migrated away from using the old-style options.

These options are already available in measure-complexity, right? When using them to get data for the report-generator, using measure-complexity seems best anyways.

atomb commented 8 months ago

With #5150, #5185, and #5186, I think we'll have most of what we need. I'm going to close this issue now that we have more fine-grained tracking in those issues.