Command-line flag handling on plugin-granularity in parser

Ericsson / CodeCompass

CodeCompass is a software comprehension tool for large scale software written in C/C++ and Java

https://codecompass.net

GNU General Public License v3.0

512 stars 101 forks source link

Command-line flag handling on plugin-granularity in parser #570

Open bruntib opened 2 years ago

bruntib commented 2 years ago

There are some command-line options that are common among plugins and can also be given individually to specific plugins. For example --input can be given compilation database JSON files (that are handled by C++ plugin) and directories (handled by Search and Python plugins.)

Currently the plugins can provide a set of plugin-specific command-line flags, for example Search plugin has --search-skip-directory and C++ plugin has --skip-doccomment.

Currently there are two flags that could be both common and plugin-specific: --input and --search-skip-directory. I think we should introduce a flag format for this kind of parameters where the user can specify whether he wants to provide these as common or plugin-specific parameter. My suggestion is this:

# Every parser gets this.
CodeCompass_parser -i compile_commands.json

# Only cppparser plugin gets this.
CodeCompass_parser -i cppparser:compile_commands.json

# Both search parser and C++ parser get this, but not the others.
CodeCompass_parser --skip-path cppparser:searchparser:/path/to/directory

If we implement this then we should remove --search-skip-directory flag. Small side-note: we should make sure that plugin names don't contain : in their names so we can use the format above.

In this ticket we could discuss this approach. Do you have notes to this approach?

For example I can't see a meaningful use-case for providing some option to two plugins. I mean why would anyone skip the analysis of some sub-directory in C++ and Python parsers, but not in Search parser? Anyway, this format supports this if we'll need it some time in the future. The main goal of this format for now is that the user can choose whether only one or all plugins should get a specific parameter.

whisperity commented 2 years ago

I am strongly against convoluting the command-line interface by having the users remember complex grammars (which are also complex to parse and easy to break) just to achieve a purpose, and against the festering of weird constructs like it happened in CodeChecker...

Moreover, hardcoding the cppplugin: prefix would also mean that no input path containing this will ever work, even if the user would like to parse a wholly unrelated "cppplugin"-named construct.

My idea is, instead of the proposed:

--input should stay as it is, and everything supplied to it will be handled by every plugin.
The generic plugin loader interface should define, for each plugin that may receive an input, a plugin-specific --input-cpp and --input-search which arguments will only be passed (unioned with the overall input set) to the specific plugin.

This way, people reading, and writing, will immediately know what an argument means, because the semantics of the argument is encoded in the argument name. And no additional parsing of the user-specified value is needed, which would only complicate matters.

andocz commented 2 years ago

I think having the plugin name on the right side of the option name-value pair is counter-intuitive. Normally --foo bar means "set foo to bar", which means -i cppparser:compile_commands.json reads like "set input to cppparser's compile_commands.json" or "set input to cppparser and compile_commands.json", both of which are obviously wrong. So I prefer whisperity's syntax for the most part. However:

I think the plugin name should come first (cpp-input, not input-cpp). This is consistent with the convention we currently have (search-skip-directory, dummy-arg) and results in option names that flow better ("skip-directory-search" sounds like an option for skipping something called "directory search".)
This --<plugin_name>-<argument_name> syntax can lead to name conflicts. For example, we have an --incremental-threshold option right now, which prevents someone from creating a plugin called incremental with an option named threshold. To avoid this, we could reserve a prefix to only be used for passing plugin options, like --plugin-arg-, shortened as -P. E.g. -Pcpp-input compile_commands.json. This also makes it obvious at a glance which options in a command are plugin-specific.

mcserep commented 2 years ago

In my opinion it is already complicated to use the command-line interface of CodeCompass, often even the developers of the project has to look up the documentation about the various arguments. Further complicating it with any of the previous proposals will not help on this.

Maybe the command-line interface could support only the most important parameters, but for finer configuration, a config file could be used. In this case a sample config file could be easily distributed with the binaries, so the users would know about the possible options, their default value, etc.

andocz commented 2 years ago

I would argue that moving some of the configuration to config files would actually increase the complexity of using the CLI. Most of these arguments will likely change from project to project, so the user would have to keep littering the file system with config files created for different runs. Scripts invoking CodeCompass_parser with advanced arguments would also require an extra config file to run.

I'm not against having config files as an option, but I think every argument should be available through the CLI as well.

By the way, I think the difficulty of using the CLI is mostly a documentation problem. The --help screen could be improved by making it clear which arguments are mandatory, and adding an example invocation.

mcserep commented 2 years ago

The used configurations should also be persisted in the workspace directory anyway, even if given through the CLI. We already store some project information in the workspace directory in JSON files, it is not littering the file system, but persisting the configurations. It is also an inconvenience that upon reparsing a project I have to remember what configuration options were given on the CLI probably weeks or months ago.

I agree that the help screen could be improved and better documented. Maybe it is just for my preference, but browsing some obscure syntax in a very long list on how to pass some special parameters to CodeCompass_parser is against my taste 😄

andocz commented 2 years ago

Storing the config does sound like a good idea. What I meant was having to prepare a config file when setting up a new project with advanced args (this is before the project directory has been created) is inconvenient. But again, if CLI args remain usable as an alternative then I'm OK with it.

mcserep commented 2 years ago

In conclusion, implementing both approaches would be the best 😄

A typical solution is that CLI arguments override config-file defined arguments.

bruntib commented 2 years ago

Thank you, guys for your comments, I found them very useful. It was nice to read some concerns which I didn't consider earlier. Actually, I raised this ticket, because there were two issues in the not too far past which could be solved with such a feature.

One is the fact that unfortunately the language parsers need to select their own items from a compilation database JSON file (both cpp and java plugins get the same JSON files through --input flag). The other one is skipping files from parsing. While reviewing Python parser I saw that the author used a different terminology for skipping files (it was called "file exception"). It wouldn't be good if the different parsers would introduce different terminology for same things.

So these were my main motivations on the introduction of dynamic parts to some flags so they can be transferred directly to specific plugins. As far as I can see, the format of this is not finalized yet. Also, let me add two things that we probably know implicitly, but could be used during this discussion:

The set of transferable flags should be introduced globally (i.e. not the plugins should add them). So in case a plugin writer intends to introduce a new flag which (s)he thinks would be useful for other plugins too, then (s)he should convince the community about the introduction of this common flag and the other plugin maintainers about the new flag's usage.
We should check if it's technically possible to have dynamic flag names in the boost library if we decide to use this format. I mean we obviously wouldn't like to put all combinations of --cpp-input, --java-input, --cpp-java-input, etc. to the documentation.

andocz commented 2 years ago

It was mentioned in the last meeting that putting all the plugin-specific flags in the same "namespace" is a bad idea -- this can lead to name collisions if two plugins define a flag with the same name by mere chance. To solve this, every flag should be transferable to specific plugins, not just the globally introduced ones.