gnperdue opened 4 years ago
News from @jasonstjohn
Automated data quality checks are next. Other than a reasonable file size trend,
we don't have any assurance that we're getting all the data we want, and that it's
free of junk values, duplicates, etc. I expect to fill in that 'etc' only with some
careful thought and experimentation (mostly plot-making).
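As a crude first automated stand-in for eyeballing the file size trend, something like the sketch below could flag missing files and files whose size is far from the median. It assumes we have a list of expected file names and a mapping from received file name to size in bytes; the function names and the 0.5 tolerance are hypothetical placeholders, not an existing tool.

```python
import statistics
from typing import Dict, List, Sequence


def missing_files(expected: Sequence[str], received: Sequence[str]) -> List[str]:
    """Return expected file names that never arrived (naming scheme is hypothetical)."""
    got = set(received)
    return [name for name in expected if name not in got]


def size_outliers(sizes: Dict[str, int], tolerance: float = 0.5) -> List[str]:
    """Flag files whose size differs from the median size by more than
    `tolerance`, expressed as a fraction of the median.

    The 0.5 default is an arbitrary placeholder, not a tuned cut.
    """
    if not sizes:
        return []
    median = statistics.median(sizes.values())
    if median == 0:
        # Degenerate case: anything non-empty stands out against empty files.
        return [name for name, s in sizes.items() if s != 0]
    return [
        name for name, s in sizes.items()
        if abs(s - median) / median > tolerance
    ]
```

A median comparison is only a rough proxy for a real trend (it ignores legitimate growth over time), so the plots would still be needed to pick a sensible tolerance.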
We need to build a suite of scripts to analyze the training data, both to be sure it is appropriate for training and to make sure we understand the inputs. What sort of data would we exclude?
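One possible building block for that suite is a record-level screen that reports, per check, which records fail, leaving the exclusion decision to us. The sketch below assumes each record is a flat tuple of floats; `screen_records` and the `low`/`high` bounds are hypothetical placeholders rather than physics-motivated cuts.

```python
import math
from typing import Dict, List, Sequence, Tuple

Record = Tuple[float, ...]


def screen_records(
    records: Sequence[Record],
    low: float = -1e6,
    high: float = 1e6,
) -> Dict[str, List[int]]:
    """Return the indices of records that fail each check.

    `low`/`high` are placeholder bounds; real ranges would come from
    studying the input distributions.
    """
    report: Dict[str, List[int]] = {
        "non_finite": [],
        "out_of_range": [],
        "duplicate": [],
    }
    seen = set()
    for i, rec in enumerate(records):
        if not all(math.isfinite(x) for x in rec):
            report["non_finite"].append(i)
            # NaN != NaN, so duplicate detection is skipped for these records.
            continue
        if any(x < low or x > high for x in rec):
            report["out_of_range"].append(i)
        # Exact duplicates only; near-duplicates would need a tolerance.
        if rec in seen:
            report["duplicate"].append(i)
        else:
            seen.add(rec)
    return report
```

For example, `screen_records([(1.0, 2.0), (float("nan"), 0.0), (5e7, 1.0), (1.0, 2.0)])` reports index 1 as non-finite, index 2 as out of range, and index 3 as a duplicate. Keeping the report separate from the filtering step means which categories actually lead to exclusion stays an explicit, reviewable choice.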