GeoscienceAustralia / uncover-ml

Machine Learning system for Geoscience Australia uncover project
Apache License 2.0

Rationalise outputs #87

Closed brenmous closed 4 years ago

brenmous commented 4 years ago

Output in UncoverML is very ad hoc. The documentation does not make clear which outputs are produced or where they are stored. Some are controlled by config parameters, others are implicit. These need to be fixed and documented.

brenmous commented 4 years ago

Output is now reduced to:

Always generated:
- *_rawcovariates.csv - intersected targets and covariates
- *_rawcovariates_mask.csv - mask array for the above
- *_transformed_targets.csv - list of target values pre and post scaling/transformation
- *.model - the model, produced by the learn command and consumed by the predict command

Generated if cross validation enabled:
- *_crossval_results.csv - target value with corresponding prediction from cross validation step
- *_crossval_scores.json - metrics from cross validation e.g. r^2 score

Generated if feature ranking enabled:
- *_featureranks.json - covariates ordered by cross validation scoring, grouped by metric

Generated if covariate/target pickling turned on:
- features.pk (can be given any name in the config)
- targets.pk (can be given any name in the config)

With the exception of the pickle files and the .model file, all files are written to the output directory specified in the config. The pickle files and the .model file can be given specific paths because they need to be read as well as written.
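
For illustration only, the rules above can be sketched as a small helper that lists the files a run should produce. This is not UncoverML code: the function `expected_outputs`, its parameters, and the `name` prefix standing in for `*` are all hypothetical, and placing the .model file in the output directory when no explicit path is given is an assumption.

```python
from pathlib import Path


def expected_outputs(output_dir, name, crossval=False, feature_rank=False,
                     model_path=None, pickle_paths=()):
    """Return the output paths implied by the rules listed in this issue.

    All names here are hypothetical; `name` stands in for the `*` prefix.
    """
    out = Path(output_dir)

    # Always generated, written to the configured output directory.
    files = [
        out / f"{name}_rawcovariates.csv",
        out / f"{name}_rawcovariates_mask.csv",
        out / f"{name}_transformed_targets.csv",
    ]

    # The model may be given its own path; the fallback location is an assumption.
    files.append(Path(model_path) if model_path else out / f"{name}.model")

    # Only generated when cross validation is enabled.
    if crossval:
        files.append(out / f"{name}_crossval_results.csv")
        files.append(out / f"{name}_crossval_scores.json")

    # Only generated when feature ranking is enabled.
    if feature_rank:
        files.append(out / f"{name}_featureranks.json")

    # Pickle files use whatever paths the config gives them.
    files.extend(Path(p) for p in pickle_paths)
    return files


if __name__ == "__main__":
    for path in expected_outputs("out", "run1", crossval=True,
                                 pickle_paths=("features.pk", "targets.pk")):
        print(path)
```

A check like this could be used in a test to confirm that a run produced exactly the documented files and nothing implicit.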