KolmogorovLab / Wakhan

Haplotype-specific somatic copy number aberrations/profiling from long reads sequencing data
MIT License
29 stars 0 forks source link

Wakhan

Note: This repository is under extensive updates.

A tool to analyze haplotype-specific chromosome-scale somatic copy number aberrations and aneuploidy using long reads (Oxford Nanopore, PacBio). Wakhan takes long-read alignment and phased heterozygous variants as input, and first uses extends the phased blocks, taking advantage of the CNA differences between the haplotypes. Wakhan then generates inetractive haplotype-specific coverage plots.

Breakpoints/SVs based segmentation and Copy numbers estimation:

plots_example

//: # ()

Installation (enabling through conda environment)

git clone https://github.com/KolmogorovLab/Wakhan.git
cd Wakhan/
conda env create -f environment.yml -n Wakhan
conda activate Wakhan
cd src/

Usage

Tumor-Normal Mode (requires normal phased VCF)

python main.py --threads <4> --reference <ref.fa>  --target-bam <data.tumor.bam>  --normal-phased-vcf <data.normal_phased.vcf.gz>  --genome-name <cellline/dataset name> --out-dir-plots <genome_abc_output> --breakpoints <severus-sv-VCF>

Tumor-only (requires tumor phased/haplotagged BAM and phased VCF)

python main.py --threads <4> --reference <ref.fa>  --target-bam <data.tumor_haplotagged.bam>  --tumor-vcf <data.tumor_phased.vcf.gz> --genome-name <cellline/dataset name> --out-dir-plots <genome_abc_output> --breakpoints <severus-sv-VCF>
Breakpoints/Structural variations or change point detection algo for copy number model

Wakhan accepts Severus or any other structural variant caller VCF as breakpoints with param --breakpoints inputs to detect copy number changes and this option is highly recommended. However, if --breakpoints option is not used, --change-point-detection-for-cna should be used instead to use change point detection algorithm ruptures alternatively.

Tumor-Normal mixture and purity/ploidy estimation

User can input both --ploidy-range [default: 1.5-5.5 -> [min-max]] and --purity-range [default: 0.5-1.0 -> [min-max] to inform copy number model about normal contamination in tumor to estimate copy number states correctly.

Quick-run if coverage/pileup data is already available

Wakhan produces reads coverage coverage.csv (bin-size based reads coverage) and phasesets reads coverage coverage_ps.csv data, phase-corrected coverage phase_corrected_coverage.csv (as well as tumor BAM pileup pileup_SNPs.csv in case Tumor/normal mode) and stores in directory coverage_data inside --out-dir-plots location. If this data has already been generated in a previous Wakhan run, user can rerun the Wakhan with additionally passing --quick-start and --quick-start-coverage-path <path to coverage_data directory -> e.g., /home/rezkuh/data/1437/coverage_data> in addition to required params in above example runs. This will save runtime significantly by not invoking coverage and pileup methods.

Examples

Few cell lines arbitrary phase-switch correction and copy number estimation output with coverage profile is included in the examples directory.

Required parameters

Optional parameters

Wakhan can also be used in case phasing is not good in input tumor or analysis is being performed without considering phasing:

Here is a sample copy number/breakpoints output plot without phasing.

plots_example

Output produced

Based on best confidence scores, tumor purity and ploidy values are calculated and copy number analysis is performed. Each subfolder in output directory represents best <ploidy><purity><confidence> values.

Following are coverage and SNPs/LOH plots and bed directories in output folder, independent of CNA analysis

Prerequisite

This tool requires haplotagged tumor BAM and phased VCF in case tumor-only mode and normal phased VCF in case tumor-normal mode. This can be done through any phasing tools like Margin, Whatshap and Longphase. Following commands could be helpful for phasing VCFs and haplotagging BAMs.

For normal/tumor pair:

# ClairS phase and haplotag both normal and tumor samples
singularity run clairs_latest.sif /opt/bin/run_clairs --threads 56 --phase_tumor True --use_whatshap_for_final_output_haplotagging --use_whatshap_for_final_output_phasing --tumor_bam_fn normal.bam --normal_bam_fn tumor.bam --ref ref.fasta --output_dir clairS --platform ont_r10

or

# Phase normal sample
pepper_margin_deepvariant call_variant -b normal.bam -f ref.fasta -o pepper/output -t 56 --ont_r9_guppy5_sup -p pepper --phased_output

# Haplotag tumor sample with normal phased VCF (phased.vcf.gz) output from previous step
whatshap haplotag --ignore-read-groups phased.vcf.gz tumor.bam  --reference ref.fasta -o tumor_whatshap_haplotagged.bam

For tumor only:

# Phase and haplotag tumor sample
singularity run clair3_latest.sif /opt/bin/run_clair3.sh --use_whatshap_for_final_output_haplotagging --use_whatshap_for_final_output_phasing --bam_fn=tumor.bam --ref_fn=ref.fasta --threads=56 --platform=ont --model_path=r941_prom_sup_g5014 --output=clair3 --enable_phasing

or

# Phase and haplotag tumor sample
pepper_margin_deepvariant call_variant -b tumor.bam -f ref.fasta -o pepper/output -t 56 --ont_r9_guppy5_sup -p pepper --phased_output