Add initial machine learning pipeline

freedomofpress / fingerprint-securedrop

A machine learning data analysis pipeline for analyzing website fingerprinting attacks and defenses.

GNU Affero General Public License v3.0


Add initial machine learning pipeline #57

Closed: redshiftzero closed this 7 years ago

redshiftzero commented 8 years ago

This PR adds an initial machine learning pipeline that takes the features in the database, trains a series of binary classifiers, evaluates how well each classifier performs, saves the relevant performance metrics in the database, and pickles the trained model objects (for use in future scoring). The work in this PR corresponds to the latter half of this diagram, from the features schema onward:

[pipeline diagram]

A more complete description of our pipeline is in docs/pipeline.md, and a (very) brief description of how specialized classifiers might be integrated is in CONTRIB.md.
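For readers skimming the thread, here is a rough sketch of the train / evaluate / store loop described above. It is not the code in this PR. The classifier is a toy threshold rule written with the standard library only, the in-memory SQLite database stands in for the project's real database, and the names `fit_threshold`, `run_pipeline`, and `model_results` are all hypothetical.

```python
import pickle
import sqlite3


def fit_threshold(X, y, feature_index):
    """Fit a toy binary classifier: a cut midway between the two class means."""
    pos = [row[feature_index] for row, label in zip(X, y) if label == 1]
    neg = [row[feature_index] for row, label in zip(X, y) if label == 0]
    threshold = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
    # The "trained model" is a plain dict here (an assumption made for brevity).
    return {"feature_index": feature_index, "threshold": threshold}


def predict(model, X):
    """Predict 1 when the chosen feature exceeds the fitted threshold."""
    return [int(row[model["feature_index"]] > model["threshold"]) for row in X]


def evaluate(y_true, y_pred):
    """Compute accuracy, precision, and recall from the confusion counts."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


def run_pipeline(conn, train, test, feature_indices):
    """Train one classifier per feature, then store metrics and pickled model."""
    X_train, y_train = train
    X_test, y_test = test
    # Hypothetical results table; the real schema is not shown in this thread.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS model_results "
        "(name TEXT, accuracy_score REAL, precision_score REAL, "
        "recall_score REAL, model BLOB)"
    )
    for idx in feature_indices:
        model = fit_threshold(X_train, y_train, idx)
        metrics = evaluate(y_test, predict(model, X_test))
        conn.execute(
            "INSERT INTO model_results VALUES (?, ?, ?, ?, ?)",
            (
                f"threshold_f{idx}",
                metrics["accuracy"],
                metrics["precision"],
                metrics["recall"],
                pickle.dumps(model),  # serialized for later scoring
            ),
        )
    conn.commit()


# Made-up feature rows and labels, purely for illustration.
train = ([[1.0, 5.0], [2.0, 4.0], [8.0, 5.0], [9.0, 4.0]], [0, 0, 1, 1])
test = ([[1.5, 4.5], [8.5, 4.5]], [0, 1])

conn = sqlite3.connect(":memory:")
run_pipeline(conn, train, test, feature_indices=[0, 1])
```

A stored model can be loaded back for scoring with `pickle.loads` on the `model` column. As usual with pickle, that should only be done with data from a trusted source.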

coveralls commented 7 years ago

Coverage remained the same at 72.727% when pulling 847e4e61ed2cf968b02163b648ea01a2239e9356 on ml-classifiers into b183c0c623763b1c244b5617f126ba1be7a4bd53 on master.

coveralls commented 7 years ago

Coverage remained the same at 72.727% when pulling 5a474343d0cce044835c069ff5a563255f72ad5e on ml-classifiers into b183c0c623763b1c244b5617f126ba1be7a4bd53 on master.

redshiftzero commented 7 years ago

OK, comments are addressed, the ansible-ification of the models schema and table creation is done, Travis builds are passing, and it's rebased on current master. Should be good to go 🌞

conorsch commented 7 years ago

What a review process this has been! Thanks for your patience here, @redshiftzero. Given the frequent back-and-forth here, I'm inclined to merge, and we can bite off smaller hunks to discuss in discrete issues going forward.

redshiftzero commented 7 years ago

👍 Sounds good. We can make issues for any other outstanding problems and address them in smaller PRs.