aldro61 / kover

Learn interpretable computational phenotyping models from k-merized genomic data
http://aldro61.github.io/kover/
GNU General Public License v3.0
biomarker-discovery genomics k-mer machine-learning phenotypes

2.0


Kover is an out-of-core implementation of rule-based machine learning algorithms that has been tailored for genomic biomarker discovery. It produces highly interpretable models, based on k-mers, that explicitly highlight genotype-to-phenotype associations.
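To make the idea of a k-mer-based rule model concrete, here is a minimal Python sketch. It does not use Kover's own code or file formats. The function names, the toy rules, and the toy sequences are all hypothetical, and the model shown is just one simple kind of rule-based model (a conjunction of k-mer presence/absence rules).

```python
def kmers(sequence, k):
    """Return the set of all length-k substrings (k-mers) of a sequence."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def predict(genome, rules, k):
    """Apply a conjunction of presence/absence rules to one genome.

    `rules` is a list of (kmer, must_be_present) pairs. The genome is
    assigned the positive phenotype (1) only if every rule holds.
    """
    present = kmers(genome, k)
    return int(all((kmer in present) == must_be_present
                   for kmer, must_be_present in rules))


# Hypothetical toy model: "ACGT present AND TTTT absent" => phenotype 1.
toy_rules = [("ACGT", True), ("TTTT", False)]

print(predict("GGACGTCC", toy_rules, k=4))    # contains ACGT, lacks TTTT
print(predict("GGACGTTTTT", toy_rules, k=4))  # contains both ACGT and TTTT
```

Each rule in such a model names a specific k-mer, which is what makes the genotype-to-phenotype association explicit and easy to inspect.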

Introduction

Understanding the relationship between the genome of a cell and its phenotype is a central problem in precision medicine. Nonetheless, genotype-to-phenotype prediction comes with great challenges for machine learning algorithms that limit their use in this setting. The high dimensionality of the data tends to hinder generalization and challenges the scalability of most learning algorithms. Additionally, most algorithms produce models that are complex and difficult to interpret. We alleviate these limitations by proposing strong performance guarantees, based on sample compression theory, for rule-based learning algorithms that produce highly interpretable models. We show that these guarantees can be leveraged to accelerate learning and improve model interpretability. Our approach is validated through an application to the genomic prediction of antimicrobial resistance, an important public health concern. Highly accurate models were obtained for 12 species and 56 antibiotics, and their interpretation revealed known resistance mechanisms, as well as some potential new ones. An open-source disk-based implementation that is both memory and computationally efficient is included with this work. The implementation is turnkey, requires no prior knowledge of machine learning, and is complemented by comprehensive tutorials.

Drouin, A., Letarte, G., Raymond, F., Marchand, M., Corbeil, J., & Laviolette, F. (2019). Interpretable genotype-to-phenotype classifiers with performance guarantees. Scientific Reports, 9(1), 4071.

Drouin, A., Giguère, S., Déraspe, M., Marchand, M., Tyers, M., Loo, V. G., Bourgault, A. M., Laviolette, F., & Corbeil, J. (2016). Predictive computational phenotyping and biomarker discovery using reference-free genome comparisons. BMC Genomics, 17(1), 754.
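As a rough illustration of the rule-based learning described in this introduction, the following Python sketch greedily builds a conjunction of k-mer presence/absence rules, in the spirit of the Set Covering Machine. It is an in-memory toy written under stated assumptions, not Kover's out-of-core implementation. The function name, the `p` trade-off parameter, the utility formula as written here, and the toy genomes are illustrative choices.

```python
def greedy_conjunction(X, y, max_rules=3, p=1.0):
    """Greedily build a conjunction of k-mer presence/absence rules.

    X: list of sets, the k-mers present in each genome.
    y: list of 0/1 phenotype labels.
    p: penalty for each positive example that a rule rejects (assumed form).
    """
    candidates = sorted(set().union(*X))
    rules = [(kmer, present) for kmer in candidates for present in (True, False)]
    negatives = {i for i, label in enumerate(y) if label == 0}
    positives = {i for i, label in enumerate(y) if label == 1}
    model = []
    while negatives and len(model) < max_rules:
        best, best_utility, best_fails = None, float("-inf"), set()
        for kmer, present in rules:
            # Remaining examples for which this rule is false; a conjunction
            # containing the rule would classify them as negative.
            fails = {i for i in negatives | positives
                     if (kmer in X[i]) != present}
            utility = len(fails & negatives) - p * len(fails & positives)
            if utility > best_utility:
                best, best_utility, best_fails = (kmer, present), utility, fails
        if best is None or not (best_fails & negatives):
            break  # no rule rejects any remaining negative example
        model.append(best)
        negatives -= best_fails  # these negatives are now handled
        positives -= best_fails  # these positives are now misclassified
    return model


# Hypothetical toy data: two "resistant" (1) and two "susceptible" (0) genomes.
genomes = ["AACGTT", "CACGTA", "AAGGTT", "CAGGTA"]
labels = [1, 1, 0, 0]
X = [{g[i:i + 4] for i in range(len(g) - 3)} for g in genomes]

model = greedy_conjunction(X, labels)
print(model)
```

On this toy data a single rule is enough to separate the two classes, which reflects the kind of short, readable model that the approach aims for.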

Video lecture

The Set Covering Machine implementation in Kover was featured in the following video lecture:

Interpretable Models of Antibiotic Resistance with the Set Covering Machine Algorithm, Google, Cambridge, Massachusetts (February 2017)


Installation

You can use either of the installation options described in the documentation at http://aldro61.github.io/kover/.

Tutorials

For tutorials on how to use Kover with your data, see: http://aldro61.github.io/kover/doc_tutorials.html

Documentation

The documentation can be found at: http://aldro61.github.io/kover/

Contact

If you need help using Kover, please use Biostars. To report a bug, please create an issue on GitHub.