Author: Anna Saukkonen
anna.saukkonen@gmail.com
See our paper "Highly accurate quantification of allelic gene expression for population and disease genetics" for additional information.
Allele-specific expression (ASE) is the imbalanced expression of the two alleles of a gene. While many genes are expressed equally from both alleles, gene regulatory differences driven by genetic changes (i.e. regulatory variants) frequently cause the two alleles to be expressed at different levels, resulting in allele-specific expression patterns. Detecting ASE events relies on accurate alignment of RNA-sequencing reads, which remains challenging. This pipeline was created to adjust for computational biases associated with allelic counts. It comprises the following steps:
Install Nextflow:
curl -fsSL get.nextflow.io | bash
Make sure you have Java v8+:
java -version
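The Java check can also be scripted. The sketch below is one possible way to do it and is not part of PAC: `java_major` is a hypothetical helper that extracts the major version from the first line printed by `java -version`, assuming the usual `version "..."` format. It handles both the legacy `1.8.0_x` scheme and the newer `11.0.2` scheme.

```shell
#!/usr/bin/env bash
# java_major is a hypothetical helper, not something PAC or Nextflow provides.
# It prints the major Java version found in a `java -version` banner line.
java_major() {
  local line="$1" ver
  # Pull out the quoted version string, e.g. 1.8.0_281 or 11.0.2
  ver=$(printf '%s\n' "$line" | sed -n 's/.*version "\([^"]*\)".*/\1/p')
  case "$ver" in
    1.*) ver=${ver#1.}; printf '%s\n' "${ver%%.*}" ;;   # legacy scheme: 1.8.x means Java 8
    *)   printf '%s\n' "${ver%%[.-]*}" ;;               # modern scheme: 11.0.2 means Java 11
  esac
}

# `java -version` writes to stderr, so redirect it before parsing.
if command -v java >/dev/null 2>&1; then
  major=$(java_major "$(java -version 2>&1 | head -n 1)")
  if [ "${major:-0}" -ge 8 ] 2>/dev/null; then
    echo "Java ${major} found"
  else
    echo "Java 8 or later is required" >&2
  fi
fi
```

If the banner on your system has a different layout, the helper prints an empty string and the check falls through to the "required" message.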
path_to/nextflow run https://github.com/anna-saukkonen/PAC -r main --genome_version GRCh37/38 --reads "path_to_reads_{1,2}.fq.gz" --variants "path_to_variants" --id ID -profile docker/singularity
The -r option specifies the branch to run (here, main). For --genome_version choose either GRCh37 or GRCh38, and for -profile choose either docker or singularity.
To run from a local clone of the repository instead:
path_to/nextflow run PAC/main.nf --genome_version GRCh37/38 --reads "path_to_reads_{1,2}.fq.gz" --variants "path_to_variants" --id ID -profile docker/singularity
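--reads: both read files must be saved in the same directory and named so that the pattern matches them, e.g. path_to_reads_1.fq.gz and path_to_reads_2.fq.gz
--variants: the VCF file must be phased
--id: the sample ID; this must be the same as the sample name in the VCF file
Output directory (default: "/pac_results")
Number of CPUs (default: 10; we recommend at least 10 for speed)
Memory: depending on the size of the input files you might need up to 128000 MB; the minimum is 64000 MB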
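The input requirements above (paired read files, a phased VCF, and a matching sample ID) can be checked before launching a long run. The following is a minimal sketch under stated assumptions, not part of PAC: the function names are hypothetical, the VCF is assumed to be gzip- or bgzip-compressed, and GT is assumed to be the first FORMAT field, as the VCF convention requires when it is present.

```shell
#!/usr/bin/env bash
# Pre-flight checks for PAC inputs. All function names are hypothetical helpers.

# Succeeds only if <prefix>1.fq.gz and <prefix>2.fq.gz both exist and are non-empty.
check_read_pair() {
  local prefix="$1" mate
  for mate in 1 2; do
    if [ ! -s "${prefix}${mate}.fq.gz" ]; then
      echo "missing or empty: ${prefix}${mate}.fq.gz" >&2
      return 1
    fi
  done
}

# Succeeds only if the #CHROM header line of the VCF lists the given sample ID.
vcf_has_sample() {
  local vcf="$1" id="$2"
  gzip -cd "$vcf" | awk -F'\t' -v id="$id" '
    /^#CHROM/ { for (i = 10; i <= NF; i++) if ($i == id) found = 1; exit }
    END { exit !found }'
}

# Prints how many heterozygous genotypes of the sample use "|" (phased) or "/" (unphased).
het_phase_counts() {
  local vcf="$1" id="$2"
  gzip -cd "$vcf" | awk -F'\t' -v id="$id" '
    /^##/     { next }
    /^#CHROM/ { for (i = 10; i <= NF; i++) if ($i == id) col = i; next }
    col {
      split($col, fmt, ":")              # GT is assumed to be the first FORMAT field
      n = split(fmt[1], al, "[|/]")      # split the genotype into its two alleles
      if (n != 2 || al[1] == al[2] || al[1] == "." || al[2] == ".") next
      if (index(fmt[1], "|"))
        phased++
      else
        unphased++
    }
    END { printf "phased=%d unphased=%d\n", phased, unphased }'
}

# Example (paths and ID are placeholders taken from the command above):
#   check_read_pair "path_to_reads_" &&
#     vcf_has_sample "path_to_variants" ID &&
#     het_phase_counts "path_to_variants" ID
```

A large `unphased` count for the chosen sample suggests the VCF still needs phasing. How PAC itself treats unphased sites is not covered here.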
PAC generates 4 output files:
Haplotype level ASE results columns | Description |
---|---|
contig | chromosome |
start | gene start position |
stop | gene end position |
name | gene name |
aCount | haplotype a coverage |
bCount | haplotype b coverage |
totalCount | total coverage |
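
As an illustration of how the haplotype-level columns can be used, the sketch below computes the haplotype a fraction (aCount / totalCount) per gene. It is not part of PAC. It assumes the results file is tab-separated with a header row that uses the column names listed above; `hap_ratio`, the file name in the usage comment, and the default coverage cutoff of 10 are hypothetical choices made for this example.

```shell
#!/usr/bin/env bash
# hap_ratio FILE [MIN_TOTAL]
# Prints "name<TAB>aCount/totalCount" for genes with totalCount >= MIN_TOTAL.
# Assumes a tab-separated file whose header row names the columns.
hap_ratio() {
  local file="$1" min_cov="${2:-10}"
  awk -F'\t' -v min="$min_cov" '
    NR == 1 { for (i = 1; i <= NF; i++) col[$i] = i; next }   # map header names to positions
    {
      a = $(col["aCount"]) + 0
      t = $(col["totalCount"]) + 0
      if (t > 0 && t >= min + 0)
        printf "%s\t%.3f\n", $(col["name"]), a / t
    }' "$file"
}

# Example (hypothetical file name):
#   hap_ratio haplotype_results.tsv 20
```

A fraction near 0.5 indicates balanced expression of the two haplotypes. This is a descriptive summary only and does not replace a statistical test.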
Single nucleotide level ASE results columns | Description |
---|---|
Chr | chromosome |
Pos | position along chromosome |
RefAl | reference allele |
AltAl | alternative allele |
MapRef | reference allele coverage |
MapAlt | alternative allele coverage |
MapRatio | reference allele ratio |
Mapcov | total coverage at the site |
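
The single-nucleotide table can be screened for strongly imbalanced sites in a similar way. The sketch below is an assumption-laden example, not PAC's own method: it recomputes the reference ratio as MapRef / (MapRef + MapAlt), which may differ from how MapRatio and Mapcov are defined in the output, and it assumes a tab-separated file with a header row. `imbalanced_sites`, the file name in the usage comment, and both thresholds are hypothetical.

```shell
#!/usr/bin/env bash
# imbalanced_sites FILE [MIN_COV] [MIN_DEVIATION]
# Prints Chr, Pos, recomputed reference ratio and coverage for sites where the
# ratio deviates from 0.5 by at least MIN_DEVIATION and coverage >= MIN_COV.
imbalanced_sites() {
  local file="$1" min_cov="${2:-20}" min_dev="${3:-0.2}"
  awk -F'\t' -v min="$min_cov" -v delta="$min_dev" '
    NR == 1 { for (i = 1; i <= NF; i++) col[$i] = i; next }   # map header names to positions
    {
      ref = $(col["MapRef"]) + 0
      alt = $(col["MapAlt"]) + 0
      cov = ref + alt                     # assumption: only ref and alt reads are counted
      if (cov == 0 || cov < min + 0) next
      ratio = ref / cov
      dev = ratio - 0.5
      if (dev < 0) dev = -dev
      if (dev >= delta + 0)
        printf "%s\t%s\t%.3f\t%d\n", $(col["Chr"]), $(col["Pos"]), ratio, cov
    }' "$file"
}

# Example (hypothetical file name):
#   imbalanced_sites snv_results.tsv 30 0.25
```

The fixed deviation cutoff is only a quick filter; a formal test of imbalance would need to account for coverage at each site.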
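To test PAC on a smaller dataset:
Load Java and Singularity (how to do this depends on your system), then clone the repository and run the pipeline on the bundled test data: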
git clone https://github.com/anna-saukkonen/PAC.git
path_to_nextflow/nextflow run PAC/main.nf --genome_version GRCh37 --reads "PAC/test/NA12890_merged_sample0.005{1,2}.fq.gz" --variants "PAC/test/NA12877_output.phased.downsampled.vcf.gz" --id NA12877 -profile singularity
See this folder for the output files you should get.
Just use PAC, man ;)