Discussion : best practice for handling (command line) options

martin-cs commented 7 years ago

This has come up in reviewing a number of PRs recently and I thought it might be good to have a documented consensus on this. To achieve one, we possibly need a discussion. So this is my understanding of the current state-of-affairs / how things are (but not necessarily how I, or anyone else) thinks they should be.

There has been a long-term goal of minimising the number of command-line options and making sure that the default options Do The Right Thing. Verification systems seems to grow vast numbers of options, for comparison, another system I have worked with has over 300 flags, another requires a sophisticated config file (it has it's own include system!) and comes with around 75 sample configuration. In these cases it is easy for configuration to become a skill in itself, creating a steep barrier to entry and meaning the majority of users do not use the majority of the functionality of the system. There is also the obvious QA and testing nightmare.
There have always been undocumented options. These are experimental features, features that only work / help in restricted / limited use-cases, features for specific customers, etc. This is not really an ideal state of affairs but it allows some functionality to be maintained centrally but only really used by some users.
The command line gets transferred to a cmdlinet object owned by the appropriate child of parse_options_baset (which contains the fantastic "doit" method, the entry point to our binaries). This is the unfiltered, lightly parsed command-line and shouldn't (there are exceptions but there probably shouldn't be new exceptions) really be used apart from parsing to options.
The parse_options methods should include code that checks cmdlinet, validates that the (combination of) flags make sense and sets values (including defaults, if applicable) in an optionst. This tends to be a local variable of a top-level method and is passed (as a const reference) to parts of the system that need config information. This should, preferably, but not necessarily exclusively be in the directory associated with the relevant binary.
Then there is configt. This holds details of the language and platform for which the code is being verified. It is auto-detected or loaded from the goto-program where possible but can be over-ridden from the command-line (and example of where a cmdline is used for something other than building options). This structure is global because many of these options are used all over the place in weird corners. It should only really contain options which are about the language / platform being analyzed.

So to add an option, one should:

A. Work out if it is needed and if so, which tool(s) needs it. Is there any sane way of automatically configuring this information?

B. Add it to the ($BINARY)_OPTIONS macro in $BINARY/$BINARY_parse_options.h. Things that are used by multiple tools should probably get a macro to reduce duplication. See OPT_GOTO_CHECK. If it is going to be fully / properly supported then add it to the help text at the end of $BINARY/$BINARY_parse_options.cpp , again, macros to reduce duplication.

C. If it is an option for what language version / platform we are emulating, then it should get the appropriate field in configt and be parsed from the command line by configt::set.

D. If not then code should be added to $BINARY/$BINARY_parse_options.cpp to recognise the flag and set the correct values in the optionst, including default values.

E. Ideally, somewhere in $BINARY/* , we should make use of the optionst entry to set-up the relevant objects. The API on objects should be sufficient to do this. If you find yourself threading individual arguments or const optionst & through a series of interfaces then there is probably something wrong and there is probably an easier way.

@kroening @tautschnig @peterschrammel Does this match your view of the intention behind the design.

@smowton You were running into issues with some of this in the CEGIS code; what should we improve?

Everyone : Questions, thoughts, opinions? Particularly backed with examples of things we have to do / have done.

smowton commented 7 years ago

Wasn't me. @pkesseli ? I did move a few option definitions e.g. to here (https://github.com/diffblue/cbmc/blob/develop/src/java_bytecode/java_bytecode_language.h#L24) to enable sharing them across drivers however.

pkesseli commented 7 years ago

Our "problem" in CEGIS was that we wanted to be able to use the same language front-end as CBMC and JBMC to run synthesis problems. We solved this by aggregating both cbmc_parse_optionst and jbmc_parse_optionst.

martin-cs commented 7 years ago

Thanks @smowton and @pkesseli , comments from @kroening @tautschnig and @peterschrammel ?

LAJW commented 7 years ago

I'd just like to leave this here:

Disclaimer: This is my opinion, If I am completely wrong, let me know.

I'd like to make a distinction between command line and function arguments (which include setters and constructor arguments).

Command line arguments

Command line arguments are a very special case in any program - they're not typed, have to be verified for different kinds of errors than function arguments.

There exists a way of handling command line options that achieves following goals:

All arguments are declared once. Description of an argument is forced
All arguments are listed with -h
All arguments are verified at the start of the program
Type safety
Sharing options definitions with other applications using the library
No macros
Automatic formatting of --help

Namely, this:

CLIOptions cli_parser;
cli_parser.option<bool>({ "property", "p" }, "Property description", false);
cli_parser.option<vector<int>>({ "array", "a" }, "Array description");
std::unorderd_map<string, any> options = cli_parser.parse(argv, argc);
// Obtaining a value
const auto array = any_cast<vector<int>>(map.at("array"));

First, a command line parser is configured. Then it's fed with argv argument array. The result of it, is an unordered_map of any type. Anyone that wants to add an option to the program, has to add it here. It will never be forgotten. Extra specific parsers (ranged comparisons, BigInt, etc.) can be supplied with lambdas. It can be even deserialised from a file.

This cli_options object however is not to be used for passing arguments to every class. It's to act as a bridge between CLI strings and strict C++ world.

Function arguments

The discussion was about function arguments for large classes that have to obtain some arguments from the command line. That's very specific. I'd like to generalise that problem.

There are lots of classes (and functions) in our codebase that accept 5-20 arguments (sometimes even more). Many classes exist for the sole purpose of slightly reducing amount of typing. These classes do not have any semantic meaning. They're just throwaway containers, for using this to refer to a group of questionably related data. The fact they're instantiated just once during the whole program just proves my point.

I'd argue that we should prefer using function arguments and return values. When number of arguments/return values exceeds 3, they should be packed into a struct, so as not to confuse the caller which argument means what.

The advantage of functionless structs is that they can be mixed and grouped locally. There would be no need for elaborate mechanisms of communication between classes and deep inheritance chains, as the tight coupling between a method and unrelated sinks in its neighbouring classes wouldn't exist.

Don't get me wrong, I like classes, when they represent specific objects - eg. expression, symbol, namespace, stream. What I dislike is the present "Object Oriented Assembler" where program is using globals slightly disguised in a single instance of a god class.

I'm not calling for a total rewrite of the universe, merely for a different state of mind going forward. For example, given a choice between adding new member to a class, or adding an argument, latter option should be chosen. If number of arguments is too large, they should be packed into a struct. That's pretty easy, right?

tautschnig commented 6 years ago

While I ack all that has been said here: what's the goal of this discussion? I believe @martin-cs has accurately captured the current state of affairs. @LAJW has described some general design principles. What's next?

martin-cs commented 6 years ago

Someone (probably me) documents that state of affairs.
We have space to discuss changing these things, which judging by the response, is not something that is widely desired.

I'll close the issue once I've done 1.

diffblue / cbmc

Discussion : best practice for handling (command line) options #1521

Command line arguments

Function arguments