GeoscienceAustralia / uncover-ml

Machine Learning system for Geoscience Australia uncover project
Apache License 2.0

Find a better solution for configuration #100

Open brenmous opened 4 years ago

brenmous commented 4 years ago

Currently UncoverML is controlled by a YAML file that gets read into a Config object, with the various key: value pairs set as attributes.

This object has gotten pretty complex and there are a lot of dependencies between the attributes. It's also the biggest cause of tests breaking: new attributes get added or existing ones get modified, and then they no longer exist on the config object in certain execution paths where they were previously being read or checked. It would be great to streamline this.

YAML is also an issue. It's really easy to make mistakes with: it's very syntax sensitive, and small typos can lead to confusing errors. It also makes it hard to verify that the user has provided the correct values for the desired workflow. And the biggest issue (in my opinion) is that parameter name typos aren't handled. The user might think they've provided an optional parameter to activate a feature, but the key is misspelled. Because the YAML file is parsed by looking up parameters by key, that parameter won't get set and the related processing won't occur. If that doesn't cause any errors (often the case with optional features/parameters), the user won't realise.
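To make the silent-typo problem concrete, here is a minimal sketch of a check that rejects unknown keys instead of ignoring them. It works on an already-parsed dictionary, so it is independent of the YAML loader. The key names in `KNOWN_KEYS` and the helper `check_keys` are hypothetical and are not taken from UncoverML's actual schema.

```python
import difflib

# Hypothetical parameter names for illustration only; not UncoverML's real schema.
KNOWN_KEYS = {"learning", "features", "output", "validation", "prediction"}


def check_keys(parsed, known=KNOWN_KEYS):
    """Raise ValueError if `parsed` contains a top-level key not in `known`."""
    problems = []
    for key in parsed:
        if key not in known:
            # Suggest the closest known name, if any is reasonably similar.
            guess = difflib.get_close_matches(key, sorted(known), n=1)
            hint = f" (did you mean '{guess[0]}'?)" if guess else ""
            problems.append(f"unknown parameter '{key}'{hint}")
    if problems:
        raise ValueError("; ".join(problems))
    return parsed
```

With a check like this, a misspelled optional key such as `validaton` would stop the run with an explicit error rather than quietly disabling the feature.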

Another concern is that the Config object contains state - it owns the FeatureSet and TransformSet objects. These contain the paths to covariate data and covariate statistics that are used for applying transforms.
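As a rough sketch of what separating state from configuration could look like, the example below keeps user-supplied values in an immutable object and puts anything computed during a run into a separate one. `Settings`, `RunState` and `build_state` are hypothetical stand-ins, not the real Config, FeatureSet or TransformSet classes, and the statistics are placeholders.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass(frozen=True)
class Settings:
    """Plain values as the user provided them; never mutated after loading."""
    covariate_paths: Tuple[str, ...]
    transforms: Tuple[str, ...] = ()


@dataclass
class RunState:
    """Mutable data produced during a run (hypothetical covariate statistics)."""
    covariate_stats: Dict[str, float] = field(default_factory=dict)


def build_state(settings: Settings) -> RunState:
    # Placeholder: a real implementation would compute statistics from the data.
    return RunState(covariate_stats={path: 0.0 for path in settings.covariate_paths})
```

Under this assumption, code that only needs parameters takes a `Settings`, and code that needs computed statistics takes a `RunState`, so tests can construct either one without the other.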

I've been considering the Python module route. That is, have a config.py module the user is expected to modify. The parameter names are baked in as attributes so there's no concern about parameter name typos. It also gets around a lot of YAML's annoying syntax issues. However I'm open to any solutions that keep things simple and solve the mentioned issues.
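A minimal sketch of the Python module idea, assuming the parameters are declared as fields of a frozen dataclass rather than as bare module-level variables (a misspelled bare variable would simply create a new name without any error). All field names and defaults here are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Config:
    # Hypothetical parameters; every one has a default, so every attribute
    # exists on every execution path.
    algorithm: str = "randomforest"
    covariate_paths: Tuple[str, ...] = ()
    cross_validate: bool = False
    output_dir: str = "out"


# The part a user would edit in config.py:
config = Config(algorithm="svr", cross_validate=True)
```

Here `Config(cros_validate=True)` raises a `TypeError` for the unexpected keyword, and reading `config.cros_validate` raises an `AttributeError`, so a typo fails immediately in either direction.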

This is a laborious task as just about everything in UncoverML touches the Config object. It also means extracting the stateful FeatureSet and TransformSet.

bluetyson commented 4 years ago

Yes, this one is hard. As we have seen, it's very easy to make mistakes, even for the experienced. :)

bluetyson commented 3 years ago

A thought would be a GUI, or a website, that makes config files, or at least the important skeletons. Any config.py things you can automate would of course be good, too.