GateNLP / gate-lf-pytorch-json

PyTorch wrapper for the LearningFramework GATE plugin
Apache License 2.0

Find a way to pass arbitrary parameters/configuration from the command line and/or from a file #26

Open johann-petrak opened 6 years ago

johann-petrak commented 6 years ago

Currently we use argparse to pass on parameters but this needs the main program to know all the option names. We need a way to just pass on arbitrary key/value pairs from the command line and/or a config file.

In the main program this should get parsed into a dictionary which then gets passed around to all parts which may need configuration.

Maybe the easiest way to do this is to not use the command line and instead set it all in a yaml file. This has the added benefit that we can nest dictionaries and lists arbitrarily, allowing for more complex configs.
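A minimal sketch of the file-based idea, assuming the main program reads the file once and hands the resulting dictionary (or sub-dictionaries) to the components. JSON from the standard library stands in for YAML here; with PyYAML installed, `yaml.safe_load` would play the same role. The file contents and key names are hypothetical.

```python
import json
import os
import tempfile


def load_config(path):
    """Read a nested configuration file into a plain dictionary."""
    with open(path, encoding="utf-8") as infile:
        return json.load(infile)


# Hypothetical config content; the key names are for illustration only.
EXAMPLE = '{"module": "MyModel", "layer2": {"lstm": {"nhidden": 200}}, "seeds": [1, 2, 3]}'


def demo():
    """Write the example to a temporary file, then load it back."""
    with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as tmp:
        tmp.write(EXAMPLE)
        path = tmp.name
    try:
        return load_config(path)
    finally:
        os.remove(path)


config = demo()
# A component only needs the part of the dictionary that concerns it.
lstm_config = config["layer2"]["lstm"]
```

Nested dictionaries and lists survive the round trip unchanged, which is the property the YAML suggestion relies on.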

johann-petrak commented 6 years ago

In addition to YAML and JSON, TOML (https://github.com/toml-lang/toml) may be a useful alternative for us.

The easiest way to also support command line use may be the dotted nested-dictionary convention, where key1.key2.key3 = value corresponds to { "key1": { "key2": { "key3": value } } }. This could be passed as -Dkey1.key2.key3=value or similar.
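The dotted-key convention could be sketched roughly as follows. The helper names `set_dotted` and `parse_defines` are hypothetical, and the `-D` prefix is just the one suggested above; values are kept as strings because the thread does not say how they should be typed.

```python
def set_dotted(config, dotted_key, value):
    """Store value in a nested dict under a key like 'key1.key2.key3'."""
    *parents, last = dotted_key.split(".")
    node = config
    for name in parents:
        # Create intermediate dictionaries on demand.
        node = node.setdefault(name, {})
    node[last] = value
    return config


def parse_defines(argv, prefix="-D"):
    """Turn arguments like -Dkey1.key2=value into a nested dict.

    Arguments that do not match the prefix are returned untouched so that
    another parser (e.g. argparse) can still process them.
    """
    config, rest = {}, []
    for arg in argv:
        if arg.startswith(prefix) and "=" in arg:
            key, _, value = arg[len(prefix):].partition("=")
            set_dotted(config, key, value)
        else:
            rest.append(arg)
    return config, rest


config, rest = parse_defines(
    ["-Dkey1.key2.key3=value", "--module", "MyModel", "-Dkey1.other=5"]
)
```

The same `set_dotted` helper could also be used to merge flat key/value pairs read from a file into the nested dictionary.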

johann-petrak commented 6 years ago

With Python argparse there are two possible ways to pass on additional config settings:

The first method requires that we use a different parsing strategy ourselves, but has the advantage that it makes it easier to use the trick where arbitrary dot-structured keys are used, e.g. layer2.lstm.nhidden=200.

So overall, the best approach may be:

johann-petrak commented 5 years ago

This is especially important for modules like the TextClassCnnSingleElmo module so we can configure all the details about the Elmo model, the CNN model etc.

For this it would also be necessary to have module-specific --help options, so that once a module has been selected we can still query the option settings that module offers, e.g. --module MyModel --help-module

If we expect --help to do this automatically, then the top-level argparser needs to know which options select something that in turn has its own options parser.

The bottom line:

The sequence of actions could be:

johann-petrak commented 5 years ago

The easiest way to do this may be to use argparser.parse_known_args(arglist), which returns an args object and a list of unknown options. Nested classes/modules could then get the parent args and the list of unparsed args as their config and process the unparsed args in the same way, falling back to the parent args. Their subclasses would then get the args object parsed in the class, the unparsed options, and also the args object parsed at the top. So we could generalise by representing the whole thing as a path of 0 to n args objects, followed by the options list.
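A rough sketch of the "path of args objects plus unparsed options" idea, using only `argparse.ArgumentParser.parse_known_args`. The option names (`--nhidden`, `--dropout`) and the helpers `parse_top`, `parse_module` and `lookup` are made up for illustration.

```python
import argparse


def parse_top(arglist):
    """Top-level parser: knows only its own options, passes the rest on."""
    parser = argparse.ArgumentParser(allow_abbrev=False)
    parser.add_argument("--module", default=None)
    args, unparsed = parser.parse_known_args(arglist)
    return [args], unparsed


def parse_module(path, unparsed):
    """Hypothetical module-level parser, run on what the parent left over."""
    parser = argparse.ArgumentParser(allow_abbrev=False, add_help=False)
    parser.add_argument("--nhidden", type=int, default=100)
    args, still_unparsed = parser.parse_known_args(unparsed)
    # Extend the path: parent args objects first, own args object last.
    return path + [args], still_unparsed


def lookup(path, name, default=None):
    """Look a setting up in the innermost args object first,
    then fall back to the parents."""
    for args in reversed(path):
        if hasattr(args, name):
            return getattr(args, name)
    return default


path, unparsed = parse_top(
    ["--module", "MyModel", "--nhidden", "200", "--dropout", "0.5"]
)
path, unparsed = parse_module(path, unparsed)
```

After both steps, `path` holds two args objects and `unparsed` still contains the options that neither parser knew, ready to be handed to the next nested component.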

johann-petrak commented 5 years ago

NOTE: make sure to turn off the default prefix matching for options, use argparse.ArgumentParser(..., allow_abbrev=False)
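To illustrate why this matters when unknown options are passed on to nested parsers, here is a small comparison. The `--mod` option is hypothetical and stands for an option meant for some nested component.

```python
import argparse


def parse(arglist, allow_abbrev):
    """Parse with a top-level parser that only knows --module."""
    parser = argparse.ArgumentParser(allow_abbrev=allow_abbrev)
    parser.add_argument("--module")
    return parser.parse_known_args(arglist)


# With prefix matching on (the default), "--mod" is taken as an
# abbreviation of "--module" and never reaches the nested parser.
abbrev_args, abbrev_rest = parse(["--mod", "x"], allow_abbrev=True)

# With prefix matching off, "--mod" stays in the list of unknown options.
strict_args, strict_rest = parse(["--mod", "x"], allow_abbrev=False)
```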

johann-petrak commented 5 years ago

Since parse_args can take a namespace object we could also parse everything into one huge global namespace. We could also follow a convention for setting the namespace values from a file for each component, e.g. --modulename.configfile file.yaml where the simplest approach would be to read in key-value pairs and convert them into a sequence of option/value elements for the standard argparse method.
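One way this could look, as a sketch only: two parsers fill the same `argparse.Namespace`, and key-value pairs that would come from a component's config file are converted into an option/value list first. The helper `pairs_to_options`, the option names, and the choice to let command line options override file settings are all assumptions.

```python
import argparse


def pairs_to_options(pairs):
    """Convert key-value pairs into an option/value list for argparse,
    e.g. {"nhidden": 200} becomes ["--nhidden", "200"]."""
    options = []
    for key, value in pairs.items():
        options.extend(["--" + key, str(value)])
    return options


# One global namespace shared by all parsers.
shared = argparse.Namespace()

top = argparse.ArgumentParser(allow_abbrev=False)
top.add_argument("--module")
_, rest = top.parse_known_args(
    ["--module", "MyModel", "--lr", "0.1"], namespace=shared
)

# Stand-in for pairs read from a per-component config file.
file_pairs = {"nhidden": 200, "lr": 0.5}

module = argparse.ArgumentParser(allow_abbrev=False)
module.add_argument("--nhidden", type=int, default=100)
module.add_argument("--lr", type=float, default=0.01)
# File options come first so that later command line options win.
module.parse_known_args(pairs_to_options(file_pairs) + rest, namespace=shared)

settings = vars(shared)
```

Putting the file-derived options before the command line remainder means that a repeated option on the command line overrides the file, since argparse keeps the last value it sees for a plain store action.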

johann-petrak commented 5 years ago

To use a dictionary instead of the argparse namespace, use vars(namespaceobject).

johann-petrak commented 5 years ago

For now, try and see if the configsimple package can be used for everything we need!