AbeelLab / ptolemy

GNU General Public License v3.0

Overview

Ptolemy is a reference-free approach for analysing microbial genome architectures, in particular gene and structural diversity. In a nutshell, it uses a "top-down" approach to align multiple genomes via synteny analysis. The output is a gene-based population genome graph describing the genes and structural variants that are unique to, or shared across, a population. It requires a set of FASTA-formatted assemblies and the corresponding GFF-formatted annotations.

You can read more about it in our publication.

Experimental branch: bacterial-phage metagenomics

This is an experimental branch, part of an ongoing collaborative project for studying the genome architectures of bacteriophages.

Aside from some optimizations, this branch adds an experimental, standalone aligner for (noisy) long reads. In essence: use Ptolemy to build gene-based population genome graphs of the available bacteriophage genomes, then align long reads from a metagenomic sequencing run to identify existing and new architectures.

As an example, here is a graph of all available Pseudomonas genomes (146) from NCBI, with a barcoded sample from a metagenomic nanopore sequencing run, generated by undergraduate students, aligned to it:

[Figure: Pseudomonas population genome graph with aligned nanopore reads]

Executable JAR

Executable jar files are available under releases.

Dependencies

Ptolemy requires minimap2, which it uses for pairwise gene alignment during database creation and syntenic anchoring.
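
Before starting a long run, it can help to confirm that the external tools are installed. The sketch below assumes that Ptolemy looks for minimap2 on the PATH, which this README does not state explicitly; `require_tool` is a hypothetical helper name, not part of Ptolemy.

```shell
# Hypothetical helper (not part of Ptolemy): succeed if the named
# tool can be found on PATH, otherwise print an error and fail.
require_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        return 0
    fi
    echo "error: '$1' not found on PATH" >&2
    return 1
}
```

For example, `require_tool java && require_tool minimap2` would fail early if either program is missing.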

Running Ptolemy

Ptolemy requires a tab-delimited file that lists, for each genome, a unique sample identifier, the path to its assembly, and the path to its gene annotations. For example:

Genome1 path/to/assembly/genome1.fa path/to/annotations/genome1.gff
Genome2 path/to/assembly/genome2.fa path/to/annotations/genome2.gff
Genome3 path/to/assembly/genome3.fa path/to/annotations/genome3.gff
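
A file in this layout could be generated and sanity-checked with a short script. This is only a sketch under stated assumptions: it supposes that each assembly `<name>.fa` has an annotation called `<name>.gff`, and that the file name (without extension) is an acceptable sample identifier. The function names and the directory layout are illustrative and not part of Ptolemy.

```shell
# Assumption: every <name>.fa in ASM_DIR has a matching <name>.gff in GFF_DIR.
# Prints one tab-delimited line per genome: identifier, assembly, annotation.
make_genome_list() {
    asm_dir=$1
    gff_dir=$2
    for fa in "$asm_dir"/*.fa; do
        [ -e "$fa" ] || continue      # glob matched nothing
        id=$(basename "$fa" .fa)
        gff="$gff_dir/$id.gff"
        if [ -f "$gff" ]; then
            printf '%s\t%s\t%s\n' "$id" "$fa" "$gff"
        else
            echo "warning: no annotation for $id, skipping" >&2
        fi
    done
}

# Succeed only if every line has exactly three tab-separated fields
# and no identifier (first column) occurs twice.
check_genome_list() {
    awk -F '\t' '
        NF != 3    { printf "line %d: expected 3 fields, got %d\n", NR, NF; bad = 1 }
        seen[$1]++ { printf "line %d: duplicate identifier %s\n", NR, $1; bad = 1 }
        END        { exit bad + 0 }
    ' "$1" >&2
}
```

A possible use is `make_genome_list path/to/assembly path/to/annotations > genome_list.txt`, followed by `check_genome_list genome_list.txt`.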

There are three main steps in Ptolemy:

  1. Database creation ( java -jar ptolemy.jar extract ... )
  2. Multiple-genome alignment via syntenic anchoring ( java -jar ptolemy.jar syntenic-anchors ... )
  3. Canonical graph construction ( java -jar ptolemy.jar canonical-quiver ... )

The experimental steps:

  1. Index canonical quiver ( java -jar ptolemy.jar index-graph ... )
  2. Long-read alignment ( java -jar ptolemy.jar align-reads ... )

A typical workflow:

#graph construction
java -jar ptolemy.jar extract -g genome_list.txt -o ptolemy_db
java -jar ptolemy.jar syntenic-anchors --db ptolemy_db -o .
java -jar ptolemy.jar canonical-quiver -s syntenic_anchors.txt --db ptolemy_db -o .

#long-read alignment
java -jar ptolemy.jar index-graph -c canonical_quiver.gfa --db db/
java -jar ptolemy.jar align-reads -r reads.fa -c canonical_quiver.gfa --db db/ -o . -p alignment
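
The same workflow can be wrapped in two small shell functions so that a failing step stops the run. This is a sketch rather than an official wrapper: it uses only the subcommands and flags shown above, passes one database directory to every step, and assumes (as the commands above imply) that `syntenic_anchors.txt` and `canonical_quiver.gfa` are written to the output directory `.`. The `PTOLEMY` variable and the function names are hypothetical.

```shell
# Command used to launch Ptolemy; override it for a dry run, e.g. PTOLEMY=echo.
PTOLEMY=${PTOLEMY:-java -jar ptolemy.jar}

# Steps 1-3: database creation, syntenic anchoring, canonical quiver.
# $PTOLEMY is left unquoted on purpose so it splits into command and arguments.
build_graph() {
    genome_list=$1
    db=$2
    $PTOLEMY extract -g "$genome_list" -o "$db" &&
    $PTOLEMY syntenic-anchors --db "$db" -o . &&
    $PTOLEMY canonical-quiver -s syntenic_anchors.txt --db "$db" -o .
}

# Experimental steps: index the graph, then align long reads to it.
align_reads() {
    reads=$1
    db=$2
    $PTOLEMY index-graph -c canonical_quiver.gfa --db "$db" &&
    $PTOLEMY align-reads -r "$reads" -c canonical_quiver.gfa --db "$db" -o . -p alignment
}
```

For instance, `build_graph genome_list.txt ptolemy_db && align_reads reads.fa ptolemy_db` runs all five steps in order; with `PTOLEMY=echo` the commands are only printed.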

The graph is stored as a GFA-formatted file and can be visualized with graph visualizers such as Bandage.
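
For a quick look at the graph without opening a visualizer, the GFA file can be summarized on the command line. The sketch below assumes that the file uses the standard GFA record types, where lines starting with `S` are segments (nodes) and lines starting with `L` are links (edges); `gfa_summary` is a hypothetical helper name.

```shell
# Count segment (S) and link (L) records in a GFA file.
# The record type is the first tab-separated field of each line.
gfa_summary() {
    awk -F '\t' '
        $1 == "S" { segments++ }
        $1 == "L" { links++ }
        END { printf "segments\t%d\nlinks\t%d\n", segments, links }
    ' "$1"
}
```

Running `gfa_summary canonical_quiver.gfa` prints the two counts, one per line.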

Test data are available in the 'testing_data' directory, which contains full PacBio assemblies of a single yeast chromosome from three genomes.