This repository provides Python and Rust libraries to facilitate pangenomics analysis. Several algorithms and data structures used in the Peregrine Genome Assembler are useful for pangenomics analysis as well. This repo takes those algorithms and data structures and combines them with other handy third-party tools to expose them as a Python library (with Rust code for the computing parts that need performance).
Research Preprint:
PGR-TK provides pangenome assembly management, query, and Minimizer Anchored Pangenome (MAP) graph generation.
With the MAP graph, we can use the "principal bundle decomposition" to study complicated structural variants and genome rearrangements in human populations.
Command Line Tools:
PGR-TK provides the following tools:
pgr-mdb: create a pgr minimizer database with an AGC backend
pgr-make-frgdb: create a PGR-TK fragment minimizer database with a frg format backend
pgr-query: query a PGR-TK pangenome sequence database, output the hit summary, and generate FASTA files from the target sequences
pgr-pbundle-decomp: generate the principal bundle decomposition through the MAP graph from a FASTA file
pgr-pbundle-bed2svg: generate an SVG from a principal bundle bed file
pgr-pbundle-bed2sorted: generate an annotation file with a sorting order from the principal bundle decomposition
pgr-pbundle-bed2dist: generate alignment scores between sequences using the bundle decomposition from a principal bundle bed file
For each command, command --help provides detailed usage information.
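The `--help` check can be scripted to see which tools are installed. The following is a minimal sketch, assuming the tools are on your PATH. The names `TOOLS`, `find_tools`, and `show_help` are illustrative helpers and not part of PGR-TK; only the tool names and the `--help` flag come from the list above.

```python
import shutil
import subprocess

# Tool names copied from the command line tool list.
TOOLS = [
    "pgr-mdb",
    "pgr-make-frgdb",
    "pgr-query",
    "pgr-pbundle-decomp",
    "pgr-pbundle-bed2svg",
    "pgr-pbundle-bed2sorted",
    "pgr-pbundle-bed2dist",
]


def find_tools(names=TOOLS):
    """Map each tool name to its path on PATH, or None if it is not installed."""
    return {name: shutil.which(name) for name in names}


def show_help(name):
    """Run `<name> --help` and return the combined stdout/stderr text."""
    result = subprocess.run([name, "--help"], capture_output=True, text=True)
    return result.stdout + result.stderr


if __name__ == "__main__":
    for name, path in find_tools().items():
        if path is None:
            print(f"{name}: not found on PATH")
        else:
            # Print only the first line of the help text as a quick summary.
            lines = show_help(name).splitlines()
            print(f"{name}: {lines[0] if lines else '(no output)'}")
```

Tools that are missing are reported rather than raising an error, so the script can also serve as a quick installation check.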
The API documentation is at https://genedx.github.io/pgr-tk/
A collection of Jupyter Notebooks is at https://github.com/genedx/pgr-tk-notebooks/
Check https://github.com/genedx/pgr-tk/releases
See docker/Dockerfile.build_env-20.04 for a build environment under Ubuntu 20.04.
With the proper build environment, just run bash build.sh to build everything.
For example, on macOS with Docker installed, you can clone the repository and build a Linux binary within an Ubuntu 20.04 Linux distribution as follows:
git clone --recursive git@github.com:cschin/pgr-tk.git # clone the repo
cd pgr-tk/docker
ln -s Dockerfile.build_env-20.04 Dockerfile
docker build -t pgr-tk-build .
From the root directory of the pgr-tk repository, execute:
docker run -it --rm -v $PWD:/wd/pgr-tk pgr-tk-build /bin/bash
Then build pgr-tk inside the docker container started from the image pgr-tk-build:
cd /wd/pgr-tk
bash build.sh
The built Python wheels will be in target/wheels and can be installed on an Ubuntu 20.04 Python 3.8 distribution. You can also install them in the pgr-tk-build image to test them out.
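Installing the freshly built wheels could look something like the following. This is a sketch under the assumption that `build.sh` leaves one or more `.whl` files directly in `target/wheels`; the function name `find_wheels` is illustrative and not part of PGR-TK.

```python
import subprocess
import sys
from pathlib import Path


def find_wheels(wheel_dir="target/wheels"):
    """Return the .whl files found directly in wheel_dir, sorted by name."""
    return sorted(Path(wheel_dir).glob("*.whl"))


if __name__ == "__main__":
    wheels = find_wheels()
    if not wheels:
        print("no wheels found in target/wheels; run `bash build.sh` first")
    for wheel in wheels:
        # Install with the same interpreter that runs this script.
        subprocess.run(
            [sys.executable, "-m", "pip", "install", str(wheel)], check=True
        )
```

Run it from the repository root so that the relative path `target/wheels` resolves correctly.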
If you have conda installed, you can try the following to build a conda environment to use pgr-tk v0.3.6 (on Linux only):
conda create -n pgr-tk python=3.8
conda activate pgr-tk
conda install -c bioconda -c conda-forge python_abi libstdcxx-ng=12 libclang13 pgr-tk=0.3.6
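After the install, a quick check that the Python module is visible to the active environment can help. The sketch below assumes the module is importable as `pgrtk`; that name is an assumption here and should be confirmed against the API documentation. The helper `is_importable` is illustrative.

```python
import importlib.util


def is_importable(module_name):
    """Return True if module_name can be found by the current interpreter."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # "pgrtk" is an assumed module name; adjust it if the package differs.
    name = "pgrtk"
    status = "found" if is_importable(name) else "NOT found"
    print(f"{name}: {status}")
```

Run this inside the activated conda environment so that the check uses that environment's interpreter.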