lsaBGC - suite for pan-BGC-omics analysis
BSD 3-Clause "New" or "Revised" License

lsaBGC-Pan

lsaBGC-Pan - mine the pan-BGC-ome of a microbial taxon for biosynthetic golden nuggets.

lsaBGC-Pan reconfigures lsaBGC for easier installation via either Bioconda or Docker and features a new workflow bearing the same name as the repo. Besides being easier to use, it adds new analytical modules, e.g. (de-)association testing of BGC ortholog groups and GCFs, and an improved framework for inferring horizontal transfer.

Manuscript:

Evolutionary investigations of the biosynthetic diversity in the skin microbiome using lsaBGC. Microbial Genomics 2023. Rauf Salamzade, J.Z. Alex Cheong, Shelby Sandstrom, Mary Hannah Swaney, Reed M. Stubbendieck, Nicole Lane Starr, Cameron R. Currie, Anne Marie Singh, and Lindsay R. Kalan

Key Highlights:

✔️ Works for both fungi & bacteria

✔️ Allows for joint analysis of antiSMASH & GECCO BGC predictions (in an analysis of Streptomyces olivaceus, this leads to a 42.4% increase in distinct GCFs relative to using antiSMASH alone)

✔️ Better consideration for incomplete BGCs due to assembly fragmentation

✔️ New analytical features including: (1) genome-wide association testing of orthogroups with GCF co-occurrence & (2) improved assessment of horizontal transfer for BGC-associated orthogroups

✔️ Improved consolidated spreadsheet that is easier to assess

✔️ Support for small-scale (< 30 genomes) analysis on a laptop with minimal databases (~5 GB).

✔️ Bioconda installation tested on both macOS & Linux; an easy-to-use Docker image & wrapper script are also coming soon!

✔️ BSD-3 License & no uploading of data to webservers = support for industry research

Documentation:

Documentation and three separate tutorials showing example applications can be found on the wiki at: https://github.com/Kalan-Lab/lsaBGC-Pan/wiki

Example Commands:

Perform analysis using a directory of antiSMASH results as input:

lsaBGC-Pan -a AntiSMASH_Results/ -o Pan_Results/ -c 10

Provide a directory of antiSMASH results as input and incorporate GECCO BGC predictions as well:

lsaBGC-Pan -a AntiSMASH_Results/ -o Pan_Results/ -c 10 -rg

Provide a directory of genomes in FASTA format for GECCO-based BGC predictions and analysis (only works for bacteria):

lsaBGC-Pan -g Directory_of_Genomes_in_FASTA/ -o Pan_Results/ -c 10
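
The three invocation patterns above differ only in the input flag (`-a` vs. `-g`) and the optional `-rg` switch, so they can be wrapped in a small dry-run helper. The sketch below is not part of lsaBGC-Pan: `build_lsabgc_pan_cmd` and its mode names (`antismash`, `antismash+gecco`, `genomes`) are hypothetical, and it only prints the command rather than running it. It uses only the flags shown in the examples above and passes the `-c` value through unchanged.

```shell
#!/usr/bin/env bash
# Dry-run helper: prints one of the three lsaBGC-Pan commands shown above.
# build_lsabgc_pan_cmd and the mode names are hypothetical (illustration only).
set -euo pipefail

build_lsabgc_pan_cmd() {
  local mode="$1"          # antismash | antismash+gecco | genomes
  local input_dir="$2"     # antiSMASH results directory or genome FASTA directory
  local output_dir="$3"    # results directory passed to -o
  local c_value="${4:-10}" # value passed to -c (10 in the examples above)

  case "$mode" in
    antismash)
      echo "lsaBGC-Pan -a $input_dir -o $output_dir -c $c_value" ;;
    antismash+gecco)
      # Same as above, plus -rg to incorporate GECCO BGC predictions.
      echo "lsaBGC-Pan -a $input_dir -o $output_dir -c $c_value -rg" ;;
    genomes)
      # GECCO-based prediction from genome FASTA files (bacteria only).
      echo "lsaBGC-Pan -g $input_dir -o $output_dir -c $c_value" ;;
    *)
      echo "unknown mode: $mode" >&2
      return 1 ;;
  esac
}

# Example: print the command for an antiSMASH-only analysis.
build_lsabgc_pan_cmd antismash AntiSMASH_Results/ Pan_Results/
```

The printed line can be inspected and then pasted into a shell where lsaBGC-Pan is installed. Directory names containing spaces are not quoted by this sketch and would need extra handling.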