TraceQC

TraceQC is an R package for quality control (QC) of CRISPR lineage tracing sequence data.

Installation

if(!requireNamespace("devtools", quietly=TRUE)) install.packages("devtools")
devtools::install_github("LiuzLab/TraceQC")

TraceQC also depends on several Python packages. To install them, run:

pip install biopython pandas tqdm pysam
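After installing, it can be useful to confirm that the four packages are importable. The following is a minimal sketch, not part of TraceQC; the helper name `missing_packages` is hypothetical. Note that `biopython` is imported as `Bio`. Run it with the same Python interpreter you intend to use with TraceQC.

```python
import importlib.util

# pip distribution name -> module name used in an `import` statement
REQUIRED = {
    "biopython": "Bio",
    "pandas": "pandas",
    "tqdm": "tqdm",
    "pysam": "pysam",
}


def missing_packages(required=REQUIRED):
    """Return the pip names of packages whose module cannot be found."""
    return [
        pip_name
        for pip_name, module in required.items()
        if importlib.util.find_spec(module) is None
    ]


missing = missing_packages()
if missing:
    print("Missing packages, try: pip install " + " ".join(missing))
else:
    print("All Python dependencies were found.")
```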

Tutorial

The tutorial for the TraceQC pipeline on bulk DNA sequencing data is here. Its dataset is sampled from the hgRNA dataset.

The tutorial for the TraceQC pipeline on single-cell RNA sequencing data is here. Its dataset is sampled from the Carlin dataset.

TraceQC annotated reference format

The reference is a plain-text file with the following layout:

ATGGACTATCATATGCTTACCG...CCGGTAGACGCACCTCCACCCCACAGTGGGGTTAGAGCTAGAAATA
target 23 140

The first line of the reference file is required and must be the construct sequence. The second line is also required and must specify the target barcode region of the construct. On each region line, the two numbers after the region name give the start and end positions of that region. Positions are 1-based, i.e. the first base of the sequence is position 1. Users can optionally add further regions, such as a spacer region or a PAM region, in the same format. Here is an example of a reference file with additional regions:

ATGGACTATCATATGCTTACCG...CCGGTAGACGCACCTCCACCCCACAGTGGGGTTAGAGCTAGAAATA
target 24 140
spacer 88 107
PAM 108 110
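To make the format concrete, the sketch below reads an annotated reference and extracts a region by its 1-based coordinates. It is an illustration only, not TraceQC's own parser: the function names `parse_reference` and `region_sequence` are hypothetical, the toy sequence is invented, and the end position is assumed to be inclusive (which is consistent with the example above, where the spacer spans 20 bases and the PAM 3 bases). It also assumes the required region is labeled `target`, as in the examples.

```python
from typing import Dict, Tuple


def parse_reference(text: str) -> Tuple[str, Dict[str, Tuple[int, int]]]:
    """Parse an annotated reference: line 1 is the sequence, later lines are regions."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if len(lines) < 2:
        raise ValueError("expected a sequence line and at least a target line")
    sequence = lines[0]
    regions: Dict[str, Tuple[int, int]] = {}
    for ln in lines[1:]:
        name, start_str, end_str = ln.split()
        start, end = int(start_str), int(end_str)
        # Coordinates are 1-based; the end is assumed to be inclusive.
        if not 1 <= start <= end <= len(sequence):
            raise ValueError(f"region {name!r} lies outside the sequence")
        regions[name] = (start, end)
    if "target" not in regions:
        raise ValueError("the target region is required")
    return sequence, regions


def region_sequence(sequence: str, regions: Dict[str, Tuple[int, int]], name: str) -> str:
    """Return the bases of a region, converting 1-based inclusive to a Python slice."""
    start, end = regions[name]
    return sequence[start - 1:end]


# Invented toy reference: a 20-base spacer, a 3-base PAM, then 4 more bases.
toy_reference = """ACGTACGTACGTACGTACGTAGGACGT
target 1 27
spacer 1 20
PAM 21 23
"""

sequence, regions = parse_reference(toy_reference)
print(region_sequence(sequence, regions, "PAM"))  # prints AGG
```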

An example of an annotated hgRNA reference sequence is available here. An example of an annotated Carlin reference sequence is available here.

TraceQC output format

| Column | Description |
| --- | --- |
| character | A mutation identification string. |
| type | The type of mutation (deletion, insertion, or substitution). |
| start | The starting position of the mutation. |
| length | The length of the mutation. |
| alt | The altered sequence. |
| count | The read count of the mutation. |
| cell | The cell IDs that contain this mutation. |
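As an illustration of how these columns might be used, the sketch below sums the read count per mutation type. It is an assumption-laden example: the rows are invented, the comma delimiter and on-disk layout are assumptions (TraceQC's actual output may be an R data frame or use a different delimiter), and the helper name `reads_per_type` is hypothetical. Only the `type` and `count` columns described above are relied on.

```python
import csv
import io
from collections import Counter

# Invented rows for illustration; not real TraceQC output.
example_table = """type,start,length,alt,count
deletion,40,5,,120
insertion,52,2,AT,30
substitution,60,1,G,7
deletion,45,12,,43
"""


def reads_per_type(handle, delimiter=","):
    """Sum the `count` column for each value of the `type` column."""
    totals = Counter()
    for row in csv.DictReader(handle, delimiter=delimiter):
        totals[row["type"]] += int(row["count"])
    return totals


totals = reads_per_type(io.StringIO(example_table))
for mutation_type, reads in totals.most_common():
    print(mutation_type, reads)
```

Pass a different `delimiter` (for example `"\t"`) if the table you export is tab-separated.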

Documentation

The full documentation of TraceQC functions is available here.

References

Kalhor, R., Mali, P., & Church, G. M. (2017). Rapidly evolving homing CRISPR barcodes. Nature Methods, 14(2), 195-200.

Bowling, S., Sritharan, D., Osorio, F. G., Nguyen, M., Cheung, P., Rodriguez-Fraticelli, A., ... & Camargo, F. D. (2020). An engineered CRISPR-Cas9 mouse line for simultaneous readout of lineage histories and gene expression profiles in single cells. Cell, 181(6), 1410-1422.