Find a way to pass arbitrary parameters/configuration from the command line and/or from a file

johann-petrak commented 6 years ago

Currently we use argparse to pass on parameters, but this requires the main program to know all the option names. We need a way to just pass on arbitrary key/value pairs from the command line and/or a config file.

In the main program this should get parsed into a dictionary which then gets passed around to all parts which may need configuration.

Maybe the easiest way to do this is to not use the command line at all and instead set everything in a YAML file. This has the added benefit that we can nest dictionaries and lists arbitrarily, allowing for more complex configs.
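A minimal sketch of the idea: the whole config is one nested dictionary, and each component only receives its own sub-dictionary. To stay dependency-free it parses JSON text with the standard library; with PyYAML installed, `yaml.safe_load` on the equivalent YAML should give the same nested structure. The section names (`elmo`, `cnn`) and the `build_component` helper are hypothetical.

```python
import json

# Hypothetical config; a YAML file with the same nesting would load into
# the same dicts and lists (e.g. via PyYAML's yaml.safe_load).
CONFIG_TEXT = """
{
  "elmo": {"dropout": 0.5, "layers": [0, 1, 2]},
  "cnn": {"nfilters": 100, "kernel_sizes": [3, 4, 5]}
}
"""

def build_component(name: str, cfg: dict) -> str:
    """Hypothetical component factory: it only ever sees its own sub-config."""
    settings = ", ".join(f"{k}={v}" for k, v in sorted(cfg.items()))
    return f"{name}({settings})"

config = json.loads(CONFIG_TEXT)  # nested dicts/lists, arbitrarily deep
components = {name: build_component(name, sub) for name, sub in config.items()}
print(components["cnn"])  # cnn(kernel_sizes=[3, 4, 5], nfilters=100)
```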

johann-petrak commented 6 years ago

In addition to YAML and JSON, TOML (https://github.com/toml-lang/toml) may be a useful alternative for us.
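If all three formats are supported, one option is to pick the loader from the file extension, since all of them produce the same kind of nested dict. This is a sketch, not an existing project function: `load_config` is a hypothetical name, YAML assumes the third-party PyYAML package, and TOML assumes Python 3.11+ for the standard-library `tomllib` module. The optional imports happen only when that format is actually requested.

```python
import json
from pathlib import Path

def load_config(path: str) -> dict:
    """Load a nested config dict from a .json, .yaml/.yml or .toml file."""
    p = Path(path)
    suffix = p.suffix.lower()
    if suffix == ".json":
        with p.open(encoding="utf-8") as f:
            return json.load(f)
    if suffix in (".yaml", ".yml"):
        import yaml  # third-party PyYAML, only needed for YAML files
        with p.open(encoding="utf-8") as f:
            return yaml.safe_load(f) or {}  # an empty file loads as None
    if suffix == ".toml":
        import tomllib  # standard library since Python 3.11
        with p.open("rb") as f:  # tomllib requires a binary file object
            return tomllib.load(f)
    raise ValueError(f"unsupported config format: {suffix!r}")
```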

Perhaps the easiest way to also support command-line use is the dotted nested-dictionary convention, where key1.key2.key3=value corresponds to { "key1": { "key2": { "key3": value } } } and could be passed as -Dkey1.key2.key3=value or similar.
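The dotted convention can be turned into the nested dictionary with a few lines. A minimal sketch, assuming the `-D` prefix suggested above; `set_dotted` and `parse_define` are hypothetical helper names, and values are kept as strings (type conversion is left to whoever consumes the setting).

```python
from typing import Any, Dict, Tuple

def set_dotted(config: Dict[str, Any], dotted_key: str, value: Any) -> None:
    """Set config[k1][k2]...[kn] = value for a key 'k1.k2...kn',
    creating the intermediate dicts as needed."""
    *parents, last = dotted_key.split(".")
    node = config
    for key in parents:
        node = node.setdefault(key, {})
        if not isinstance(node, dict):
            raise ValueError(f"{key!r} in {dotted_key!r} already holds a non-dict value")
    node[last] = value

def parse_define(arg: str) -> Tuple[str, str]:
    """Split a hypothetical '-Dkey1.key2=value' argument into ('key1.key2', 'value')."""
    body = arg[2:] if arg.startswith("-D") else arg
    key, sep, value = body.partition("=")
    if not sep or not key:
        raise ValueError(f"expected key=value, got {arg!r}")
    return key, value

config: Dict[str, Any] = {}
for arg in ["-Dkey1.key2.key3=value", "-Dkey1.key2.other=42"]:
    set_dotted(config, *parse_define(arg))
# config == {"key1": {"key2": {"key3": "value", "other": "42"}}}
```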

johann-petrak commented 6 years ago

With Python argparse there are two possible ways to pass on additional config settings:

use nargs='+', which allows something like -C key1=val1 key2=val2, and we get `["key1=val1", "key2=val2"]` as the value of the argument, which we could then parse into a dictionary ourselves.
use nargs=argparse.REMAINDER, which collects all remaining command-line arguments (everything after that option) into a list that can be parsed by another instance of argparse (or by ourselves).

The first method requires us to use our own parsing strategy, but has the advantage that it is easier to use arbitrary dot-structured keys, e.g. layer2.lstm.nhidden=200.
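A sketch of the first method with standard argparse. The `--epochs` option is only a placeholder for a "normal" option. It uses `action="extend"` (Python 3.8+) together with `nargs="+"` so that repeated `-C` groups accumulate instead of overwriting each other; that choice is ours, not something decided in this issue.

```python
import argparse

def parse_cli(argv):
    parser = argparse.ArgumentParser(allow_abbrev=False)
    parser.add_argument("--epochs", type=int, default=10)  # placeholder "normal" option
    # -C takes one or more key=value strings, e.g. -C a=1 layer2.lstm.nhidden=200
    parser.add_argument("-C", dest="config", nargs="+", action="extend",
                        default=[], metavar="KEY=VALUE")
    args = parser.parse_args(argv)
    overrides = {}
    for item in args.config:
        key, sep, value = item.partition("=")
        if not sep:
            parser.error(f"-C expects key=value, got {item!r}")
        overrides[key] = value  # values stay strings; dotted keys are kept flat here
    return args, overrides

args, overrides = parse_cli(["--epochs", "3", "-C", "key1=val1", "layer2.lstm.nhidden=200"])
```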

So overall, the best approach may be:

use argparse mainly for options that influence the behavior of the command, and possibly for very important config settings
provide -C key1=value1 some.other.key=value2 ... for all the actual config
provide --config filename.{yaml,toml,json} to set the actual config from a file (where the command line still overrides the file, which in turn overrides the defaults)
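The precedence "defaults < file < command line" can be expressed as a recursive merge of nested dicts. A minimal sketch: `deep_merge` is a hypothetical helper, the three dicts are made-up example values, and the command-line dict is assumed to have already had its dotted -C keys expanded into nested dicts.

```python
import copy

def deep_merge(base: dict, override: dict) -> dict:
    """Return a new dict where override wins; nested dicts are merged recursively."""
    result = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = deep_merge(result[key], value)
        else:
            result[key] = copy.deepcopy(value)
    return result

# Hypothetical values: defaults < config file < command line
DEFAULTS = {"lstm": {"nhidden": 100, "nlayers": 1}, "lr": 0.01}
from_file = {"lstm": {"nhidden": 200}}
from_cli = {"lr": 0.001}

config = deep_merge(deep_merge(DEFAULTS, from_file), from_cli)
# config == {"lstm": {"nhidden": 200, "nlayers": 1}, "lr": 0.001}
```

Merging recursively (rather than with `dict.update`) matters here: a file that only sets `lstm.nhidden` should not wipe out the default for `lstm.nlayers`.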

johann-petrak commented 5 years ago

This is especially important for modules like the TextClassCnnSingleElmo module, so that we can configure all the details of the Elmo model, the CNN model, etc.

For this it would also be necessary to support module-specific --help options, so that once a module has been selected we can still query the option settings that module offers, e.g. --module MyModel --help-module

If we expect --help to do this automatically, then the top-level argparser needs to know which options select something (a module or class) that in turn has its own option parser.

The bottom line:

we need something like argparse, but it should ignore dotted option names and instead pass them on to the right Python class/module
that parser should also be able to delegate --help to one or all of the chosen modules/classes
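One way the `--module MyModel --help-module` idea could work with plain argparse: the top-level parser only knows `--module` and `--help-module`, and everything else is handed to a parser owned by the selected module. A sketch under assumptions: `MyModel`, its `make_parser` method, the `--nhidden` option and the `MODULES` registry are hypothetical, and `main` returns the help text instead of printing it so it is easy to test.

```python
import argparse

class MyModel:
    """Hypothetical module that declares its own options."""
    @staticmethod
    def make_parser() -> argparse.ArgumentParser:
        p = argparse.ArgumentParser(prog="MyModel", allow_abbrev=False)
        p.add_argument("--nhidden", type=int, default=100, help="hidden units")
        return p

MODULES = {"MyModel": MyModel}  # hypothetical registry of selectable modules

def main(argv):
    top = argparse.ArgumentParser(allow_abbrev=False)
    top.add_argument("--module", choices=sorted(MODULES), required=True)
    top.add_argument("--help-module", action="store_true",
                     help="show the options of the selected module")
    args, rest = top.parse_known_args(argv)  # unknown options are kept in rest
    module_parser = MODULES[args.module].make_parser()
    if args.help_module:
        return module_parser.format_help()  # delegate help to the module
    return module_parser.parse_args(rest)   # the module parses what is left
```

A real entry point would print the help text and exit; the top-level `--help` still only describes the top-level options.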

The sequence of actions could be:

the top-level argparser only knows about the top-level options, but processes dotted options differently (maybe it just puts those strings aside)
when a class that has options is initialised, the args object is passed to it and the options for that class are processed. If an option of a class is identical to one of its parent caller, we could allow the class to inherit the parent setting as a default (depending on an argparse setting).
the final config data structure could be a nested map/object in which an entry points to "our" own map, with parent and child entries.
once a class has been initialized, the top argparse object contains a child entry for it, and the map of the class contains a parent entry pointing to the top argparse object
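The parent/child lookup in this sequence can be approximated with the standard library's `collections.ChainMap`. That is our suggestion, not something this issue settles on. Each class gets a child map whose lookups fall back to its parent's settings, and `.parents` gives the chain without the class's own map; the child entries are kept in a separate, hypothetical `children` dict. All setting names and values are made up.

```python
from collections import ChainMap

# Top-level settings, e.g. vars() of the top-level argparse namespace.
top = ChainMap({"device": "cpu", "seed": 1})
children = {}  # hypothetical registry holding the top level's child entries

# When a class is initialised, it gets its own map layered over its parent's.
elmo = top.new_child({"dropout": 0.5})
children["elmo"] = elmo
cnn = elmo.new_child({"dropout": 0.2, "nfilters": 100})

print(cnn["dropout"])          # 0.2 (the class's own setting wins)
print(cnn["device"])           # cpu (inherited from the top level)
print(cnn.parents["dropout"])  # 0.5 (.parents skips cnn's own map)
cnn["seed"] = 2                # writes only go into cnn's own map
print(top["seed"])             # 1
```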

johann-petrak commented 5 years ago

The easiest way to do this may be to use parser.parse_known_args(arglist), which returns an args object and a list of the unrecognized options. Nested classes/modules could then get the parent args and the list of unparsed options as their config, and process the unparsed options in the same way, falling back to the parent args. Their subclasses would then get the args object parsed in the class, the remaining unparsed options, and also the args object parsed at the top level. So we could generalise this by representing the whole thing as a path of 0 to n args objects, followed by the options list.
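A sketch of the "path of args objects followed by the options list" idea with `parse_known_args`. Assumptions: `parse_component` is a hypothetical helper, and a component signals "inherit from a parent" by declaring the option with `default=None`; the option names are made up.

```python
import argparse
from typing import List, Sequence, Tuple

def parse_component(parser: argparse.ArgumentParser,
                    parents: Sequence[argparse.Namespace],
                    argv: List[str]) -> Tuple[List[argparse.Namespace], List[str]]:
    """Parse the options this component declares and pass the rest on.

    An option that was not given (value None) falls back to the nearest
    parent namespace holding a non-None value for the same name.
    Returns the extended path of namespaces and the still-unparsed options.
    """
    ns, rest = parser.parse_known_args(argv)
    for name, value in list(vars(ns).items()):
        if value is None:
            for parent in reversed(parents):  # nearest parent first
                inherited = getattr(parent, name, None)
                if inherited is not None:
                    setattr(ns, name, inherited)
                    break
    return [*parents, ns], rest

# Top level: only knows --device; everything else is kept for later.
top = argparse.ArgumentParser(allow_abbrev=False)
top.add_argument("--device", default="cpu")
path, rest = parse_component(top, [], ["--device", "cuda", "--nhidden", "200", "--extra", "x"])

# Hypothetical model class: declares --nhidden and re-declares --device
# with default None so that it inherits the top-level value.
model = argparse.ArgumentParser(allow_abbrev=False)
model.add_argument("--nhidden", type=int, default=100)
model.add_argument("--device", default=None)
path, rest = parse_component(model, path, rest)
```

After the last component has run, anything still in `rest` (here `--extra x`) was claimed by nobody, so the caller should report it as an error.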

johann-petrak commented 5 years ago

NOTE: make sure to turn off the default prefix matching for options by using argparse.ArgumentParser(..., allow_abbrev=False)
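Why this matters when unknown options are passed on: with the default prefix matching, an option meant for some other component can be silently swallowed if it happens to be a prefix of a known option. A small demonstration; `--layers` and `--lay` are made-up option names.

```python
import argparse

def make_parser(allow_abbrev: bool) -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(allow_abbrev=allow_abbrev)
    parser.add_argument("--layers", type=int, default=1)
    return parser

argv = ["--lay", "3"]  # meant for some other component, not for --layers

# Default prefix matching: "--lay" is taken as an abbreviation of "--layers".
ns, rest = make_parser(allow_abbrev=True).parse_known_args(argv)
print(ns.layers, rest)   # 3 []

# With allow_abbrev=False the option is left alone and can be passed on.
ns, rest = make_parser(allow_abbrev=False).parse_known_args(argv)
print(ns.layers, rest)   # 1 ['--lay', '3']
```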

johann-petrak commented 5 years ago

Since parse_args can take a namespace object, we could also parse everything into one huge global namespace. We could also follow a convention for setting the namespace values of each component from a file, e.g. --modulename.configfile file.yaml, where the simplest approach would be to read in the key-value pairs and convert them into a sequence of option/value elements for the standard argparse method.
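A minimal sketch of the "convert file key-value pairs into option/value elements" approach, parsing into one shared namespace. `kv_to_argv` is a hypothetical helper and the settings dict stands in for whatever was read from a component's config file. Putting the file-derived options before the real command line makes the command line win, because argparse's default `store` action keeps the last occurrence.

```python
import argparse
from typing import Dict, List

def kv_to_argv(settings: Dict[str, object]) -> List[str]:
    """Turn {'nhidden': 200} into ['--nhidden', '200'] for argparse."""
    argv: List[str] = []
    for key, value in settings.items():
        argv += [f"--{key}", str(value)]
    return argv

parser = argparse.ArgumentParser(allow_abbrev=False)
parser.add_argument("--nhidden", type=int, default=100)
parser.add_argument("--dropout", type=float, default=0.0)

# Hypothetical settings read from a per-component config file.
file_settings = {"nhidden": 200, "dropout": 0.5}
cli_argv = ["--dropout", "0.3"]

shared = argparse.Namespace()  # one global namespace, reusable across parsers
parser.parse_args(kv_to_argv(file_settings) + cli_argv, namespace=shared)
# shared.nhidden == 200 (from the file), shared.dropout == 0.3 (command line wins)
```

This simple conversion only fits options that take a single value; flags such as `store_true` options or nested values would need extra handling.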

johann-petrak commented 5 years ago

To use a dictionary instead of the argparse namespace, use vars(namespaceobject).
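One detail worth knowing: `vars()` returns the namespace's own attribute dictionary, not a copy, so changes to the dict show up on the namespace. A short illustration with made-up settings:

```python
import argparse

ns = argparse.Namespace(nhidden=200, dropout=0.5)
config = vars(ns)           # the namespace's own __dict__, not a copy
config["dropout"] = 0.1
print(ns.dropout)           # 0.1
snapshot = dict(vars(ns))   # take a copy if the two should stay independent
back = argparse.Namespace(**snapshot)  # and back from a dict to a namespace
```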

johann-petrak commented 5 years ago

For now, try and see if the configsimple package can be used for everything we need!

GateNLP / gate-lf-pytorch-json #26