ccgproject / ccgp_assembly

CCGP Repository for the genome assembly working group.

California Conservation Genomics Project (CCGP)

This repository contains scripts used for the reference genome assembly efforts of the CCGP.

CCGP reference genomes are assembled following a protocol adapted from Rhie et al. (2021). Assemblies are built from PacBio HiFi long reads and scaffolded with proximity ligation/chromatin conformation capture data (HiC or OmniC) (Dovetail Genomics). Our minimum target reference genome quality is 6.7.Q40, and in most cases we expect to reach 7.C.Q50 or better (see Table 1 in Rhie et al. (2021)).
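To make the quality targets concrete, here is a minimal Python sketch that decodes a code such as 6.7.Q40 into numeric thresholds. It assumes the usual reading of the notation in Table 1 of Rhie et al. (2021): the first field is log10 of the minimum contig NG50 in bp, the second is log10 of the minimum scaffold NG50 in bp (with "C" standing for chromosome-scale scaffolds), and Q is a Phred-scaled base accuracy. The function names `phred_to_error_rate` and `parse_quality_code` are hypothetical and are not part of this repository.

```python
import re


def phred_to_error_rate(qv: float) -> float:
    """Per-base error probability implied by a Phred-scaled quality value."""
    return 10 ** (-qv / 10)


def parse_quality_code(code: str) -> dict:
    """Decode a code such as '6.7.Q40' or '7.C.Q50' into thresholds.

    Assumed field meanings (see Table 1 in Rhie et al. 2021):
      field 1 -> log10 of minimum contig NG50 (bp)
      field 2 -> log10 of minimum scaffold NG50 (bp), or 'C' for
                 chromosome-scale scaffolds
      Qnn     -> minimum Phred-scaled base accuracy
    """
    match = re.fullmatch(r"(\d+)\.(\d+|[Cc])\.Q(\d+)", code)
    if match is None:
        raise ValueError(f"unrecognized quality code: {code!r}")
    contig_exp, scaffold_field, qv = match.groups()
    chromosome_scale = scaffold_field.lower() == "c"
    return {
        "min_contig_ng50_bp": 10 ** int(contig_exp),
        # No numeric scaffold threshold applies when the field is 'C'.
        "min_scaffold_ng50_bp": None if chromosome_scale else 10 ** int(scaffold_field),
        "chromosome_scale_scaffolds": chromosome_scale,
        "min_qv": int(qv),
        "max_error_rate": phred_to_error_rate(int(qv)),
    }


if __name__ == "__main__":
    for target in ("6.7.Q40", "7.C.Q50"):
        print(target, parse_quality_code(target))
```

Under these assumptions, 6.7.Q40 corresponds to a contig NG50 of at least 1 Mb, a scaffold NG50 of at least 10 Mb, and at most about one base error per 10,000 bp. Q50 corresponds to about one error per 100,000 bp.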

Here is an overview of our current pipeline:

CCGP: Overview of our current pipeline

Pipeline overview

The pipeline has gone through multiple versions since the project began. The following is an overview of how it has evolved.

CCGP: Evolution of the assembly pipeline

Color blocks in the figure correspond to the following pipeline stages:


Pre-assembly quality control and data validation

de novo assembly (contigging)

Purge haplotigs: haplotypic duplications and contig overlaps


Checking for misassemblies

Gap closing

Mitochondrial assembly

Contamination screening

Metrics / stats / Others


Learn more

Shaffer HB, Toffelmier E, Corbett-Detig RB, Escalona M, Erickson B, Fiedler P, Gold M, Harrigan RJ, Hodges S, Luckau TK, Miller C, Oliveira DR, Shaffer KE, Shapiro B, Sork VL, Wang IJ (2022) Landscape genomics to enable conservation actions: the California Conservation Genomics Project. Journal of Heredity, 113 (6): 577–588.
