Open johann-petrak opened 6 years ago
In addition to YAML and JSON, TOML (https://github.com/toml-lang/toml) may be a useful alternative for us.
The easiest way to also support command line use may be the dotted nested dictionary convention of key1.key2.key3 = value
corresponding to { "key1": { "key2": { "key3": value } } }
which could be passed as -Dkey1.key2.key3=value
or similar.
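A minimal sketch of how such a dotted key could be expanded into nested dictionaries. The helper names `set_dotted` and `parse_define` are made up for illustration, and values are kept as plain strings here:

```python
def set_dotted(config, dotted_key, value):
    """Store value in config under a dotted path such as 'key1.key2.key3'."""
    *parents, leaf = dotted_key.split(".")
    node = config
    for name in parents:
        # Create intermediate dictionaries on demand.
        node = node.setdefault(name, {})
    node[leaf] = value
    return config


def parse_define(argument):
    """Split one 'key1.key2=value' string into (dotted_key, value)."""
    key, sep, value = argument.partition("=")
    if not sep or not key:
        raise ValueError(f"expected key=value, got {argument!r}")
    return key, value


config = {}
key, value = parse_define("key1.key2.key3=value")
set_dotted(config, key, value)
print(config)  # {'key1': {'key2': {'key3': 'value'}}}
```

Converting the string values to int/float/bool would be a separate step that this sketch leaves out.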
With Python argparse there are two possible ways to pass on additional config settings:
nargs='+'
which allows something like -C key1=val1 key2=val2
and we get ["key1=val1", "key2=val2"] as the value of the argument, which we could then parse into a dictionary ourselves.
nargs=argparse.REMAINDER
which will collect all remaining arguments into a list that can be parsed by another instance of argparse (or ourselves).
The first method requires that we do the key=value parsing ourselves, but has the advantage that it is easier to use the trick where arbitrary dot-structured keys are used, e.g. layer2.lstm.nhidden=200.
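A rough sketch of the first variant (nargs='+'). The option name -C is the one suggested above; the `settings` destination and the `--module` option are only assumptions for the example:

```python
import argparse

parser = argparse.ArgumentParser(allow_abbrev=False)
parser.add_argument("--module")
# -C collects one or more KEY=VALUE strings into a list.
parser.add_argument("-C", dest="settings", nargs="+", default=[],
                    metavar="KEY=VALUE")

args = parser.parse_args(
    ["--module", "MyModel", "-C", "key1=val1", "layer2.lstm.nhidden=200"]
)
print(args.settings)  # ['key1=val1', 'layer2.lstm.nhidden=200']

# Parse the collected strings ourselves; split only on the first '='.
settings = dict(item.split("=", 1) for item in args.settings)
print(settings)  # {'key1': 'val1', 'layer2.lstm.nhidden': '200'}
```

Note that -C swallows everything up to the next option flag, so it has to come after any positional arguments.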
So overall, the best approach may be:
-C key1=value1 some.other.key=value2 ...
for all actual config settings
--config filename.{yaml,toml,json}
to set the actual config from a file (where the command line still overrides the file, which in turn overrides the defaults).
This is especially important for modules like the TextClassCnnSingleElmo module, so we can configure all the details about the Elmo model, the CNN model etc.
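The override order (defaults, then file, then command line) could be sketched as a recursive merge of nested dictionaries. The file content is assumed to be already loaded into a dict, and all keys and values below are invented for illustration:

```python
import copy


def deep_merge(base, override):
    """Return a new dict in which values from override win, recursively."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are dicts: merge them instead of replacing.
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Hypothetical settings; real key names would come from the modules.
defaults = {"elmo": {"dropout": 0.5}, "cnn": {"kernels": [3, 4, 5], "nfilters": 100}}
from_file = {"cnn": {"nfilters": 200}}
from_cli = {"elmo": {"dropout": 0.1}}

config = deep_merge(deep_merge(defaults, from_file), from_cli)
print(config)
# {'elmo': {'dropout': 0.1}, 'cnn': {'kernels': [3, 4, 5], 'nfilters': 200}}
```

Lists are replaced as a whole rather than merged element by element, which seems the least surprising choice for overrides.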
For this it would also be necessary to have module-specific --help
options so once a module has been selected we can still query the option settings that module offers, e.g. --module MyModel --help-module
If we expect --help
to do this automatically, then the top-level argparser needs to know which options set something that in turn will have their own options parser.
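One possible way to get a separate --help-module, as a sketch only: the top-level parser knows just --module and --help-module, and the selected module supplies its own parser. `build_module_parser` and the --nhidden option are hypothetical stand-ins for whatever a module would really register:

```python
import argparse


def build_module_parser(name):
    """Hypothetical lookup: return the option parser of the named module."""
    parser = argparse.ArgumentParser(prog=name, add_help=False,
                                     allow_abbrev=False)
    if name == "MyModel":
        parser.add_argument("--nhidden", type=int, default=100,
                            help="number of hidden units (made-up option)")
    return parser


top = argparse.ArgumentParser(allow_abbrev=False)
top.add_argument("--module")
top.add_argument("--help-module", action="store_true",
                 help="show the options of the selected module")

# Options the top-level parser does not know end up in `unknown`.
args, unknown = top.parse_known_args(["--module", "MyModel", "--help-module"])
if args.help_module:
    help_text = build_module_parser(args.module).format_help()
    print(help_text)
```

This way the top-level parser does not need to know the options of any module in advance.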
The bottom line:
The sequence of actions could be:
The easiest way to do this may be to use argparser.parse_known_args(arglist)
which returns an args object and a list of unknown options. Nested classes/modules could then get the parent args and the list of unparsed args as their config and process the unparsed args in the same way, falling back to the parent args. Their subclasses would then get the args object parsed in the class, the unparsed options and also the args object parsed at the top. So we could generalise by representing the whole thing as a path of 0 to n args objects, followed by the options list.
NOTE: make sure to turn off the default prefix matching for options, use argparse.ArgumentParser(..., allow_abbrev=False)
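The "path of args objects plus leftover options" idea could look roughly like this. `parse_level` and `lookup` are invented helper names, and the options are only examples:

```python
import argparse


def parse_level(option_specs, arglist, parents=()):
    """Parse the options one component knows about.

    Returns (path of namespaces from outermost to innermost, unparsed args).
    """
    parser = argparse.ArgumentParser(allow_abbrev=False, add_help=False)
    for flag, kwargs in option_specs:
        parser.add_argument(flag, **kwargs)
    args, unparsed = parser.parse_known_args(arglist)
    return (*parents, args), unparsed


def lookup(path, name, default=None):
    """Look a setting up in the innermost namespace first, then the parents."""
    for namespace in reversed(path):
        if hasattr(namespace, name):
            return getattr(namespace, name)
    return default


arglist = ["--module", "MyModel", "--nhidden", "200", "--lr", "0.01"]

# Top level: knows --module and --lr, leaves the rest unparsed.
path, rest = parse_level([("--module", {}), ("--lr", {"type": float})], arglist)
print(rest)  # ['--nhidden', '200']

# Nested component: parses what is left and keeps the parent args.
path, rest = parse_level([("--nhidden", {"type": int})], rest, path)
print(lookup(path, "nhidden"), lookup(path, "lr"), rest)  # 200 0.01 []
```

Without allow_abbrev=False, a nested option such as --nhid could be consumed too early by a parent parser that happens to define a longer option with the same prefix.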
Since parse_args
can take a namespace object we could also parse everything into one huge global namespace. We could also follow a convention for setting the namespace values from a file for each component, e.g. --modulename.configfile file.yaml
where the simplest approach would be to read in key-value pairs and convert them into a sequence of option/value elements for the standard argparse method.
To use a dictionary instead of the argparse namespace use vars(namespaceobject)
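A sketch of the shared-namespace variant: key-value pairs read from a file are turned into option/value elements, parsed first, and the command line is then parsed into the same namespace. The dict `file_settings` stands in for a parsed config file, and the option names are made up:

```python
import argparse


def pairs_to_arglist(pairs):
    """Turn {'nhidden': 200} into ['--nhidden', '200']."""
    arglist = []
    for key, value in pairs.items():
        arglist += [f"--{key}", str(value)]
    return arglist


parser = argparse.ArgumentParser(allow_abbrev=False)
parser.add_argument("--nhidden", type=int, default=100)
parser.add_argument("--lr", type=float, default=0.001)

# Stand-in for the key-value pairs read from e.g. file.yaml.
file_settings = {"nhidden": 200, "lr": 0.01}
namespace = parser.parse_args(pairs_to_arglist(file_settings))

# Second pass: command line options go into the same namespace. Defaults
# are only applied to attributes that are still missing, so the values
# from the file survive unless they are given again here.
namespace = parser.parse_args(["--lr", "0.1"], namespace=namespace)

config = vars(namespace)
print(config)  # {'nhidden': 200, 'lr': 0.1}
```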
For now, try and see if the configsimple package can be used for everything we need!
Currently we use argparse to pass on parameters but this needs the main program to know all the option names. We need a way to just pass on arbitrary key/value pairs from the command line and/or a config file.
In the main program this should get parsed into a dictionary which then gets passed around to all parts which may need configuration.
Maybe the easiest way to do this is to not use the command line and instead set it all in a yaml file. This has the added benefit that we can nest dictionaries and lists arbitrarily, allowing for more complex configs.