Closed redshiftzero closed 7 years ago
OK comments addressed, ansible-ification of the creation of the models schema and tables is done, Travis builds are passing, and it's rebased on current master. Should be good to go 🌞
What a review process this has been! Thanks for your patience here, @redshiftzero. Given the frequent back-and-forth here, I'm inclined to merge, and we can bite off smaller hunks to discuss in discrete issues going forward.
👍 sounds good - any other outstanding problems we can make issues for and address in smaller PRs
This PR adds an initial machine learning pipeline that takes the features in the database, trains a series of binary classifiers, evaluates how well each classifier performs, and then saves the relevant performance metrics in the database and pickles the trained model objects (for use in future scoring). The work in this PR corresponds to the latter half of this diagram, from the `features` schema on:

A more complete description of our pipeline is in `docs/pipeline.md`, and a (very) brief description of how specialized classifiers might be integrated is stored in `CONTRIB.md`.
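For anyone skimming, here is a rough, stdlib-only sketch of the train / evaluate / store loop described above. It is not the code in this PR: the toy classifiers, the `model_metrics` and `model_pickles` table names, and the in-memory SQLite database are all hypothetical stand-ins chosen so the example runs on its own.

```python
import pickle
import sqlite3


def fit_majority(X, y):
    """Toy baseline (hypothetical): remember the most common training label."""
    return {"kind": "majority", "label": int(2 * sum(y) >= len(y))}


def fit_threshold(X, y):
    """Toy model (hypothetical): cut feature 0 at the midpoint of the class means."""
    pos = [row[0] for row, label in zip(X, y) if label == 1]
    neg = [row[0] for row, label in zip(X, y) if label == 0]
    cut = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
    return {"kind": "threshold", "cut": cut}


def predict(model, X):
    """Return a 0/1 prediction for every row in X."""
    if model["kind"] == "majority":
        return [model["label"]] * len(X)
    return [int(row[0] > model["cut"]) for row in X]


def evaluate(y_true, y_pred):
    """Compute accuracy, precision and recall for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    correct = sum(1 for t, p in pairs if t == p)
    return {
        "accuracy": correct / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


def run_pipeline(conn, train, test, fitters):
    """Train each classifier, then store its metrics and pickled state."""
    # Hypothetical table layout; the real schema lives elsewhere.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS model_metrics "
        "(model_name TEXT, metric TEXT, value REAL)"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS model_pickles (model_name TEXT, pickled BLOB)"
    )
    X_train, y_train = train
    X_test, y_test = test
    for name, fit in fitters.items():
        model = fit(X_train, y_train)
        metrics = evaluate(y_test, predict(model, X_test))
        conn.executemany(
            "INSERT INTO model_metrics VALUES (?, ?, ?)",
            [(name, metric, value) for metric, value in metrics.items()],
        )
        # Pickle the trained model so it can be reloaded for scoring later.
        conn.execute(
            "INSERT INTO model_pickles VALUES (?, ?)", (name, pickle.dumps(model))
        )
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    train = ([[0.1], [0.2], [0.3], [0.8], [0.9]], [0, 0, 0, 1, 1])
    test = ([[0.15], [0.4], [0.7], [0.95]], [0, 0, 1, 1])
    run_pipeline(conn, train, test,
                 {"majority": fit_majority, "threshold": fit_threshold})
    for row in conn.execute("SELECT * FROM model_metrics ORDER BY model_name, metric"):
        print(row)
```

Keeping metrics in one long table (one row per model and metric) means new metrics can be added without a schema change, and reloading a model for scoring is just `pickle.loads` on the stored blob. Only unpickle blobs from a database you trust, since unpickling can execute arbitrary code.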