This repository provides data and examples that were used for development of DeepBGC and its evaluation with ClusterFinder and antiSMASH.
See https://github.com/Merck/deepbgc for the DeepBGC tool.
Reproduction and storage of data files are managed using DVC (development version 0.22.0).
Each data file has a .dvc history file that contains the command used to generate the output, along with the MD5 hashes of its dependencies.
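To illustrate what an MD5 hash of a dependency means in practice, here is a minimal Python sketch that hashes a local file and compares the result to a recorded value. The function names file_md5 and matches_recorded_hash are hypothetical and not part of DVC or this repository. The sketch hashes raw bytes only; DVC may treat some files differently (for example, by normalizing line endings in text files), so a mismatch here does not by itself mean the data is wrong.

```python
import hashlib


def file_md5(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of the raw bytes of the file at `path`.

    The file is read in chunks so that large data files do not have to
    fit in memory.
    """
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops once read() returns empty bytes.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_recorded_hash(path, recorded_md5):
    """Compare a file's raw-byte MD5 with a recorded hash (hypothetical helper).

    `recorded_md5` is assumed to be a hex string copied by hand from the
    corresponding .dvc file; the comparison ignores case and whitespace.
    """
    return file_md5(path) == recorded_md5.strip().lower()
```

For example, matches_recorded_hash("data/path/to/file", "<md5 from the .dvc file>") returns True when the local copy has the same raw-byte hash as the recorded one.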
Run pip install -r requirements.txt to download DVC and other requirements.
Run generate-aws-config --account lab --insecure
Run dvc pull data/path/to/file.dvc to download the required file.
See notebooks/LabelledContigBootstrap.ipynb.
See data/evaluation/lco-neg-10k (TODO).
See data/evaluation/cv-10fold-neg-10k (TODO).
See notebooks/CandidateClassification.ipynb and notebooks/CandidateActivityClassification.ipynb.