This is the repository related to our manuscript published in Nature Communications: "Prediction of Klebsiella phage-host specificity at the strain level", authored by Boeckaerts D, Stock M, Ferriol-González C, Jesús O-I, Sanjuan R, Domingo-Calap P, De Baets B and Briers Y.
code: all the final code for the PhageHostLearn system, allowing researchers to train models, reproduce our analyses or make new predictions (see below)
analysis_notebooks: folder including various subfolders related to certain analyses of the work, for informative purposes; _notebooksexploration (various exploratory analyses), _notebooksmodels (previous iterations of the PhageHostLearn system) and _notebooksprocessing (separate old notebooks for processing genome data)

To make new predictions, see the phagehostlearn_inference.ipynb notebook in the code folder.

To train models, see the phagehostlearn_training.ipynb notebook in the code folder.

For typical datasets (up to hundreds of phages and/or hundreds of bacteria), no specialized GPU hardware is strictly needed (although it can speed things up). For comparison, processing our dataset of around 100 phages and 200 bacteria and training a model on it took 5-6 hours on an 8-core Apple M1.
This software has been developed in Python v3.9.7 on an Apple M1 MacBook Air. It requires the following software dependencies: PHANOTATE v1.5.0 (https://github.com/deprekate/PHANOTATE), PhageRBPdetection v2.1.3 (https://github.com/dimiboeckaerts/PhageRBPdetection), Kaptive v2.0.0 (https://github.com/klebgenomics/Kaptive), ESM-2 v1.0.3 (https://github.com/facebookresearch/esm), XGBoost v1.5.0 (https://github.com/dmlc/xgboost), Scikit-learn v0.24.2 (https://scikit-learn.org/stable/), biopython v1.79, joblib v1.1.0, json v4.2.1, matplotlib v3.4.3, numpy v1.20.3, pandas v1.3.4, pickle v0.7.5 and seaborn v0.11.2. All of these dependencies can be conveniently installed with pip (this should only take minutes), apart from Kaptive and PhageRBPdetection, which were downloaded from their GitHub repositories and are incorporated in this repository. Kaptive requires the BLAST+ command-line tools to be installed; see the NCBI website for installation instructions.
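
Before running the notebooks, it can help to confirm that the environment matches the versions listed above. The following is a minimal sketch, not part of the repository: the helper names `check_dependencies` and `blast_available` are hypothetical, and the distribution names used as dictionary keys (for example `fair-esm` for ESM-2 and `phanotate` for PHANOTATE) are assumptions about how these packages are published on PyPI. json and pickle are left out because they ship with the Python standard library, and Kaptive and PhageRBPdetection are left out because they are bundled with this repository. The BLAST+ check only looks for a `blastn` executable on the PATH, which is assumed to be a reasonable proxy for a working BLAST+ installation.

```python
import shutil
from importlib import metadata

# Versions taken from the dependency list above. The keys are ASSUMED PyPI
# distribution names; adjust them if your installation uses different names.
PINNED = {
    "phanotate": "1.5.0",
    "fair-esm": "1.0.3",
    "xgboost": "1.5.0",
    "scikit-learn": "0.24.2",
    "biopython": "1.79",
    "joblib": "1.1.0",
    "matplotlib": "3.4.3",
    "numpy": "1.20.3",
    "pandas": "1.3.4",
    "seaborn": "0.11.2",
}


def check_dependencies(pinned=PINNED):
    """Return {name: (expected, installed_or_None)} for every mismatch."""
    mismatches = {}
    for name, expected in pinned.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != expected:
            mismatches[name] = (expected, installed)
    return mismatches


def blast_available(executable="blastn"):
    """Return True if the given BLAST+ executable is found on the PATH."""
    return shutil.which(executable) is not None


if __name__ == "__main__":
    for name, (expected, installed) in check_dependencies().items():
        print(f"{name}: expected {expected}, found {installed or 'not installed'}")
    if not blast_available():
        print("BLAST+ (blastn) was not found on the PATH; Kaptive needs it.")
```

A version mismatch does not necessarily mean the code will fail; the listed versions are simply the ones the software was developed with.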