Closed brenmous closed 4 years ago
Yes. I also discovered that the .model contains detailed config file info when I wrote the metadata profiling module script. The design is problematic and needs to be reviewed.
I think the best solution is to pass the config file as an argument to predict
, and from the output directory parameter (or some new parameter specific for the model) it can locate the trained model.
This cuts down on the chance of a user providing a model but a modified config with different algorithm etc (I assume this was the reason for the original design) but still allows the config to be parsed individually at each step.
Need to figure out the dependencies of .model on config params. The model can be used with certain constraints such as you said algorithm, and what else?
Looking at the top level script:
mask
: this is read from the config but can also be provided from the command lineretain
: same as abovename
: gets used in naming output filesalgorithm
: gets used in naming output filesoutbands
: bands to write output maps forThere's possibly other dependencies on the config. My concern is that saving the config in this manner has been done to preserve state - but if so much state needs to be preserved then learn
and predict
shouldn't be separate steps. We need to work on not only fixing this config issue but also the fundamental problem of predict
relying on state that isn't included in the trained model.
Working on this now - solution is to add a specific model
field under the output
block. This will specify where it gets saved as part of the learn
and cluster
commands, and where it will be read from as part of the predict
command.
This unlocks the config from the model, allowing prediction-specific changes to be made before predict
is rerun. It also allows precomputed models to be passed around - the user just needs to ensure they change that paths in the config (environment variables help with this).
The '.model' output of the
learn
step is a pickle file containing the trained model object and the config object.If the config file contains environment variables, these will be parsed during the
learn
step and then provided to thepredict
step.This causes an issue as I want to test
predict
against a precomputed model. When this test is run on another platform (e.g. CircleCi), predict loads the config from the model which contains paths for the platform that the config was parsed on and so files can't be found.Another issue this may potentially cause is that if a user wants to modify parameters before running
predict
(e.g. prediction mask), they have to rerunlearn
and retrain the model unnecessarily so that the config can be repackaged with the model.