trans-AT PKS Annotation and Comparison Tool
transPACT is a collaboration between the University of Wisconsin-Madison, ETH Zurich, and Wageningen University.
EJN Helfrich, R Ueoka, MG Chevrette*, F Hemmerling, X Lu, S Leopold-Messer, AY Burch, SE Lindow, J Handelsman, J Piel†, MH Medema†. Evolution of combinatorial diversity in trans-acyltransferase polyketide synthase assembly lines across bacteria. 2021. Nature Communications 12, 1422. 10.1038/s41467-021-21163-x
* equal contributions
† to whom correspondence should be addressed; JP: jpiel (at) ethz.ch | MHM: marnix.medema (at) wur.nl
Trans-acyltransferase polyketide synthases (trans-AT PKSs) are multimodular enzymes that biosynthesize diverse pharmaceutically and ecologically important natural products. Here, we developed and applied a phylogenomic algorithm, transPACT (trans-AT PKS Annotation and Comparison Tool), to perform a global computational analysis of trans-AT PKS gene clusters, identifying hundreds of evolutionarily conserved module blocks. Network analysis of their exchange patterns reveals a widespread diversification mechanism for these enzymes. This repository contains the transPACT implementation for assigning substrate specificity to the ketosynthase (KS) domains of trans-AT PKSs, as well as the helper scripts used to generate the global trans-AT PKS network. transPACT is typically run independently, but it is built within the antiSMASH 4.x architecture.
Dependencies are listed in conda_packages.txt. Users are strongly encouraged to create their own conda environment from this file, e.g.:
conda create --name transPACT --file conda_packages.txt
This creates a new environment called transPACT with all dependencies installed. The environment can then be activated with:
conda activate transPACT
Install/setup time on a "normal" desktop computer should be less than 5 minutes. In tests, setup completed in 26 seconds with:
date && git clone https://github.com/chevrm/transPACT.git && cd transPACT && conda create --name transPACTtest --file conda_packages.txt && conda activate transPACTtest && date
python2 transPACT_substrate_from_faa.py <protein_fasta_of_KS_domains.faa>
An example input file is provided at example/test.faa. To run it and time the run:
date && python2 transPACT_substrate_from_faa.py example/test.faa && date
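Because the script expects a protein FASTA of KS domains, it can help to sanity-check the input before running it. The following is a minimal sketch, not part of transPACT: the function names read_fasta and check_ks_fasta are hypothetical, and the 90% threshold used to flag likely nucleotide input is an arbitrary assumption.

```python
STANDARD_AA = set("ACDEFGHIKLMNPQRSTVWY")


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def check_ks_fasta(lines):
    """Return a list of human-readable problems found in a protein FASTA."""
    problems = []
    seen = set()
    for header, seq in read_fasta(lines):
        seq = seq.upper()
        if header in seen:
            problems.append("duplicate header: " + header)
        seen.add(header)
        if not seq:
            problems.append("empty sequence: " + header)
            continue
        # Anything outside the 20 standard residues (plus X) is reported.
        bad = sorted(set(seq) - STANDARD_AA - {"X"})
        if bad:
            problems.append(
                "unexpected characters in %s: %s" % (header, "".join(bad))
            )
        # Assumed heuristic: mostly A/C/G/T/N suggests a nucleotide sequence.
        elif sum(seq.count(c) for c in "ACGTN") > 0.9 * len(seq):
            problems.append("looks like a nucleotide sequence: " + header)
    return problems
```

To check a file, pass an open handle, e.g. check_ks_fasta(open("example/test.faa")); an empty list means no problems were detected by these checks.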
python2 ./data/dendrogram20200320/generate_dendrogram_userweights.py <Jaccard_weight> <DSS_weight> <AdjacencyIndex_weight>
Annotation files: data/dendrogram20200227/itol_bin.txt, for denoting whether a BGC lies on a contig edge, and data/dendrogram20200227/itol_dom.txt, for annotating the KS-domain clades of the pathway.
mkdir data/dendrogram_test && cd data/dendrogram_test && cp ../dendrogram20200320/generate_dendrogram_userweights.py ./
diamond makedb -d all --in data/dendrogram20190514/KS_precomputed_1405_hmmalign_trimmed_renamed.fasta && diamond blastp -d all -q data/dendrogram20190514/KS_precomputed_1405_hmmalign_trimmed_renamed.fasta -o full.dbp
Edit generate_dendrogram_userweights.py to point to the absolute path of full.dbp above.
python2 ./data/dendrogram20190829/generate_dendrogram_userweights.py 0 0.32 0.68
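The three command-line weights apply to the Jaccard, DSS, and adjacency index terms. As a rough illustration of how such weights can shape a pairwise distance, the sketch below assumes a normalized weighted sum of the three similarities, in the style of BiG-SCAPE; this is an assumption and is not taken from generate_dendrogram_userweights.py. DSS is taken here to mean domain sequence similarity and is passed in as a precomputed number. All function names and the clade labels are hypothetical.

```python
def jaccard(a, b):
    """Jaccard index of two collections, treated as sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / float(len(a | b))


def adjacency_index(a, b):
    """Jaccard index over ordered adjacent pairs (a simplification)."""
    pairs_a = set(zip(a, a[1:]))
    pairs_b = set(zip(b, b[1:]))
    return jaccard(pairs_a, pairs_b)


def combined_distance(jaccard_sim, dss_sim, ai_sim, w_jaccard, w_dss, w_ai):
    """Distance = 1 - weighted mean of the three similarities (assumed form)."""
    total = w_jaccard + w_dss + w_ai
    if total <= 0:
        raise ValueError("at least one weight must be positive")
    similarity = (
        w_jaccard * jaccard_sim + w_dss * dss_sim + w_ai * ai_sim
    ) / total
    return 1.0 - similarity


# Two hypothetical pathways described by their ordered KS clade labels.
pathway_a = ["Clade_1", "Clade_2", "Clade_3"]
pathway_b = ["Clade_1", "Clade_2", "Clade_4"]

distance = combined_distance(
    jaccard(pathway_a, pathway_b),
    0.8,  # placeholder DSS value, not computed here
    adjacency_index(pathway_a, pathway_b),
    0, 0.32, 0.68,  # the weights used in the example command
)
print(distance)
```

With a Jaccard weight of 0, as in the example command, shared clade content alone does not influence the distance under this assumed form; only the DSS and adjacency terms do.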
The core transPACT algorithm is found at antismash/specific_modules/nrpspks/nrpspksdomainalign/substrate_from_faa.py. It is symbolically linked at transPACT_substrate_from_faa.py for user convenience. Each query ketosynthase domain (supplied as a protein FASTA) is aligned to a reference alignment of a core set of 647 experimentally characterized KS domains with MUSCLE (see align_ks_domains(); invoked on line 533). This alignment is used to place the query sequence onto a reference phylogeny (placement with pplacer; see run_pipeline_pplacer(); invoked on line 534), and query sequences are assigned to a clade and functional classification based on monophyly (see parse_pplacer()).
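The monophyly-based assignment in the last step can be illustrated with a toy example. This is a simplified sketch under stated assumptions, not the logic of parse_pplacer(): the tree is a nested tuple with string leaves, the query is already placed as a leaf, and the query receives a clade only if every reference leaf under its nearest ancestor carries the same annotation. The names assign_by_monophyly, leaves, and path_to, and all labels, are hypothetical.

```python
def leaves(node):
    """Return all leaf names under a node (a leaf is a plain string)."""
    if isinstance(node, str):
        return [node]
    found = []
    for child in node:
        found.extend(leaves(child))
    return found


def path_to(node, target):
    """Return the nodes from this node down to the target leaf, or None."""
    if isinstance(node, str):
        return [node] if node == target else None
    for child in node:
        sub = path_to(child, target)
        if sub is not None:
            return [node] + sub
    return None


def assign_by_monophyly(tree, query, clade_of):
    """Return the clade shared by the query's nearest reference relatives,
    or None if those relatives belong to more than one clade."""
    path = path_to(tree, query)
    if path is None:
        raise ValueError("query not found in tree")
    # Walk from the query's parent toward the root.
    for ancestor in reversed(path[:-1]):
        refs = [leaf for leaf in leaves(ancestor) if leaf != query]
        if not refs:
            continue
        clades = set(clade_of[leaf] for leaf in refs)
        return clades.pop() if len(clades) == 1 else None
    return None


# Hypothetical reference annotations and a tree with the query placed.
clade_of = {
    "KS_a1": "clade_A", "KS_a2": "clade_A",
    "KS_b1": "clade_B", "KS_b2": "clade_B",
}
tree = ((("KS_a1", "query"), "KS_a2"), ("KS_b1", "KS_b2"))
print(assign_by_monophyly(tree, "query", clade_of))
```

In this sketch a query whose nearest relatives span several clades is left unassigned rather than given an ambiguous label.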