Merck / bgc-pipeline

MIT License

Note!

This repository provides data and examples that were used for development of DeepBGC and its evaluation with ClusterFinder and antiSMASH.

See https://github.com/Merck/deepbgc for the DeepBGC tool.

DeepBGC development & evaluation code

Reproducing data

Reproduction and storage of data files are managed using DVC (development version 0.22.0). Each data file has a corresponding .dvc history file that records the command used to generate the output, along with the MD5 hashes of its dependencies.
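As a rough illustration of how recorded MD5 hashes make it possible to tell whether an output is out of date, here is a minimal sketch. It is not DVC's implementation: the `stage` dict, its `deps`/`path`/`md5` keys, and the function names are hypothetical stand-ins for an already-parsed history file, and DVC's own hashing may treat some files differently from a plain byte-level MD5.

```python
import hashlib
from pathlib import Path


def file_md5(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def changed_dependencies(stage, root="."):
    """List dependency paths whose current MD5 differs from the recorded one.

    `stage` is a hypothetical dict with a "deps" list of
    {"path": ..., "md5": ...} entries (an assumption for illustration).
    Missing files are reported as changed.
    """
    changed = []
    for dep in stage.get("deps", []):
        target = Path(root) / dep["path"]
        if not target.is_file() or file_md5(target) != dep["md5"]:
            changed.append(dep["path"])
    return changed
```

An empty result from `changed_dependencies` would suggest the recorded command does not need to be rerun; any listed path indicates a dependency that has changed since the hash was recorded.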

Installation

Downloading a file

High-level overview

Main folders

Training a model

Predicting using trained model

Bootstrap validation on 9 fully annotated genomes

See notebooks/LabelledContigBootstrap.ipynb.

Leave-class-out validation and cross-validation

See data/evaluation/lco-neg-10k (TODO).

See data/evaluation/cv-10fold-neg-10k (TODO).

Random Forest classification

See notebooks/CandidateClassification.ipynb and notebooks/CandidateActivityClassification.ipynb.

Novel BGC candidates generation

See notebooks/NovelCandidates.ipynb.