GeoscienceAustralia / uncover-ml

Machine Learning system for Geoscience Australia uncover project
Apache License 2.0

Break up rawcovariates.csv output #102

Open brenmous opened 4 years ago

brenmous commented 4 years ago

One output of the learning step is a rawcovariates.csv file, originally intended to show the untransformed value of each covariate at each position. It now contains several extra fields, such as target values, cross-validation predictions, and user-defined fields.

It is overloaded and currently in a bad state: it is first written when covariates are intersected with targets, then reopened and written to again after cross-validation. It might be better to break this file up and write several files instead, or to carry a pandas DataFrame through the workflow, adding results to it as they are produced, and output it once as a single results table.
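A minimal sketch of the DataFrame option, assuming positions and covariates are available as NumPy arrays at intersection time. The function names (`start_results_table`, `add_column`, `write_outputs`), the column names (`x`, `y`, `target`, `cv_prediction`) and the `results.csv` file name are hypothetical and not part of uncover-ml. The table is built once, later stages attach their results by name, and everything is written in a single pass at the end, so no file is reopened mid-workflow.

```python
import os

import numpy as np
import pandas as pd


def start_results_table(positions, covariates, covariate_names, targets):
    """Create the table once, when covariates are intersected with targets.

    positions: (n, 2) array of x/y coordinates (hypothetical layout)
    covariates: (n, k) array of untransformed covariate values
    """
    positions = np.asarray(positions)
    covariates = np.asarray(covariates)
    table = pd.DataFrame({"x": positions[:, 0], "y": positions[:, 1]})
    for i, name in enumerate(covariate_names):
        table[name] = covariates[:, i]
    table["target"] = np.asarray(targets)
    return table


def add_column(table, name, values):
    """Attach a later result (e.g. cross-validation predictions) by name."""
    values = np.asarray(values)
    if len(values) != len(table):
        raise ValueError(f"{name}: expected {len(table)} values, got {len(values)}")
    table[name] = values  # modifies the table in place
    return table


def write_outputs(table, covariate_names, out_dir):
    """Write once at the end: a narrow covariate-only file plus the full table."""
    raw_path = os.path.join(out_dir, "rawcovariates.csv")
    results_path = os.path.join(out_dir, "results.csv")  # hypothetical name
    table[["x", "y", *covariate_names]].to_csv(raw_path, index=False)
    table.to_csv(results_path, index=False)
    return raw_path, results_path
```

This also covers the "multiple files" option: the covariate-only file keeps its original meaning, while the extra fields live in a separate results table derived from the same in-memory frame.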

If embarking on this, be aware that many diagnostics.py functions (mostly plotting) read this file and rely on its column ordering.
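One way to loosen that coupling is for the plotting code to look columns up by header name rather than by position. The sketch below uses only the standard library `csv.DictReader`; the helper name `load_columns` and the column names in the usage are hypothetical, not existing diagnostics.py code. It assumes every requested column holds numeric values.

```python
import csv


def load_columns(path, wanted):
    """Read named numeric columns from a results CSV, independent of column order.

    Raises KeyError listing any requested column the file lacks, so a renamed
    or dropped field fails loudly instead of silently plotting the wrong data.
    """
    with open(path, newline="") as f:
        reader = csv.DictReader(f)
        header = reader.fieldnames or []
        missing = [name for name in wanted if name not in header]
        if missing:
            raise KeyError(f"{path} is missing columns: {missing}")
        columns = {name: [] for name in wanted}
        for row in reader:
            for name in wanted:
                columns[name].append(float(row[name]))
    return columns
```

For example, `load_columns("results.csv", ["target", "cv_prediction"])` would return the same data whether the writer puts those columns first, last, or between user-defined fields.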