CompEpigen / ezASCAT

Conveniently perform ASCAT copy-number analysis from Tumor-Normal or Tumor only BAM files in R

ezASCAT

Lifecycle: experimental

The goal of ezASCAT is to simplify running ASCAT on tumor-normal pairs (or tumor-only samples) from WGS. ascatNgs already exists, but it requires installing Perl and C modules. ezASCAT avoids these requirements by running entirely within R, with the C code bundled in the package.

Installation

remotes::install_github(repo = "CompEpigen/ezASCAT")

Usage

Step-1: Get nucleotide counts at the marker loci with get_counts()

The command below generates two TSV files, tumor_nucleotide_counts.tsv and normal_nucleotide_counts.tsv, that can be used for downstream analysis. Note that the function processes ~900K SNPs from the Affymetrix Genome-Wide Human SNP 6.0 Array. The process can be sped up by increasing nthreads, which launches each chromosome on a separate thread. Currently hg19 and hg38 are supported.

library("ezASCAT")
#Matched normal BAM files are strongly recommended
counts = ezASCAT::get_counts(t_bam = "tumor.bam", n_bam = "normal.bam", build = "hg19")
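The same call can be written with the build and thread count set explicitly. This is only a sketch: it assumes nthreads is passed as an argument to get_counts(), as the description above suggests, and the BAM paths and the value of 4 threads are placeholders rather than recommendations.

```r
# Hypothetical input paths; replace with your own BAM files
t_bam <- "tumor.bam"
n_bam <- "normal.bam"
build <- "hg38"      # hg19 and hg38 are listed as supported
n_threads <- 4L      # assumed to map to the nthreads argument

# Fail early on an unsupported build
stopifnot(build %in% c("hg19", "hg38"))

# Only run when the package and both BAM files are available
can_run <- requireNamespace("ezASCAT", quietly = TRUE) &&
  all(file.exists(c(t_bam, n_bam)))

if (can_run) {
  counts <- ezASCAT::get_counts(
    t_bam = t_bam, n_bam = n_bam,
    build = build, nthreads = n_threads
  )
}
```

Checking the inputs first avoids starting a long counting run that would fail on a missing file.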

Step-2: Prepare input files for ASCAT with prep_ascat()

Tumor-Normal pair

The command below filters out SNPs with low coverage (default: <30), estimates BAF and logR, and generates the input files for ASCAT. In addition, it runs ASCAT::ascat.loadData() and ASCAT::ascat.plotRawData() for you and returns an ASCAT object that can be further processed with ASCAT functions.

ascat.bc = prep_ascat(t_counts = "tumor_nucleotide_counts.tsv", n_counts = "normal_nucleotide_counts.tsv", sample_name = "tumor")

# Markers: 901235
# Removed 3072 duplicated loci
# Markers > 30: 25246
# ------
# Counts file: normal_nucleotide_counts.tsv
# Markers: 901235
# Removed 3072 duplicated loci
# Markers > 30: 31387
# ------
# Final number SNPs: 23765
# Generated following files:
# tumor.tumour.BAF.txt
# tumor.tumour.logR.txt
# tumor.normal.BAF.txt
# tumor.normal.logR.txt
# ------
# Running ASCAT::ascat.loadData:
# [1] Reading Tumor LogR data...
# [1] Reading Tumor BAF data...
# [1] Reading Germline LogR data...
# [1] Reading Germline BAF data...
# [1] Registering SNP locations...
# [1] Splitting genome in distinct chunks...
# Running ASCAT::ascat.plotRawData:
# [1] Plotting tumor data
# [1] Plotting germline data
# Returned ASCAT object
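To make the filtering and BAF/logR steps more concrete, here is a minimal base R sketch on toy counts. It is not the code inside prep_ascat(): the column names, the toy values, and the logR definition (log2 of the tumor/normal depth ratio, centered on its median) are assumptions chosen for illustration.

```r
# Toy ref/alt read counts at five SNP loci (made-up values)
counts <- data.frame(
  locus = c("1:1000", "1:2000", "1:3000", "2:1500", "2:2500"),
  t_ref = c(40, 12, 35, 60, 20),
  t_alt = c(38, 10,  5, 20, 22),
  n_ref = c(30, 15, 25, 28,  9),
  n_alt = c(32, 14, 27, 30, 10),
  stringsAsFactors = FALSE
)
min_depth <- 30  # mirrors the default coverage cutoff mentioned above

# Total depth per locus in each sample
counts$t_depth <- counts$t_ref + counts$t_alt
counts$n_depth <- counts$n_ref + counts$n_alt

# Keep loci that reach the cutoff in both tumor and normal
kept <- counts[counts$t_depth >= min_depth & counts$n_depth >= min_depth, ]

# BAF: fraction of reads supporting the alternate allele
kept$t_baf <- kept$t_alt / kept$t_depth
kept$n_baf <- kept$n_alt / kept$n_depth

# logR (assumed definition): log2 depth ratio, median-centered
ratio <- kept$t_depth / kept$n_depth
kept$t_logR <- log2(ratio / median(ratio))

print(kept[, c("locus", "t_baf", "n_baf", "t_logR")])
```

Requiring the cutoff in both samples is one reason the final number of SNPs can be smaller than either per-sample count.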

The returned ASCAT object can be passed to downstream ASCAT functions:

ascat.bc = ASCAT::ascat.aspcf(ascat.bc)
ASCAT::ascat.plotSegmentedData(ascat.bc)
ascat.output = ASCAT::ascat.runAscat(ascat.bc) 

Tumor only

ascat.bc = ezASCAT::prep_ascat_t(t_counts = "tumor_nucleotide_counts.tsv", sample_name = "tumoronly")

# Library sizes:
# Tumor: 1239964831
# Counts file: tumor_nucleotide_counts.tsv
# Markers: 930104
# Removed 15 duplicated loci
# Markers > 30: 829579
# ------
# Median depth of coverage: 59
# Generated following files:
# tumoronly.tumour.BAF.txt
# tumoronly.tumour.logR.txt
# ------
# Running ASCAT::ascat.loadData:
# [1] Reading Tumor LogR data...
# [1] Reading Tumor BAF data...
# [1] Registering SNP locations...
# [1] Splitting genome in distinct chunks...
# Running ASCAT::ascat.plotRawData()
# [1] Plotting tumor data
# Returned ASCAT object!
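Without a matched normal there is no normal depth to divide by. One common workaround, shown below as a sketch, is to scale tumor depth by the median depth across markers. Whether prep_ascat_t() does exactly this is an assumption (the "Median depth of coverage" line in the log only hints at it), and the toy counts and column names are made up.

```r
# Toy ref/alt read counts for a tumor-only sample (made-up values)
tumor <- data.frame(
  locus = c("1:1000", "1:2000", "1:3000", "2:1500", "2:2500", "3:4000"),
  ref   = c(30, 25, 10, 45, 33, 50),
  alt   = c(29, 35,  8, 15, 27, 68),
  stringsAsFactors = FALSE
)
min_depth <- 30

# Depth per locus, then drop low-coverage loci
tumor$depth <- tumor$ref + tumor$alt
kept_t <- tumor[tumor$depth >= min_depth, ]

# BAF from the tumor reads alone
kept_t$baf <- kept_t$alt / kept_t$depth

# logR (assumed definition): depth relative to the median depth
med_depth <- median(kept_t$depth)
kept_t$logR <- log2(kept_t$depth / med_depth)

print(kept_t)
```

With this definition a locus at roughly twice the median depth gets a logR close to 1, which is why the median depth matters for interpreting the tumor-only output.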

The returned ASCAT object can be processed following ASCAT's protocol for samples without a matched normal:

ascat.gg = ASCAT::ascat.predictGermlineGenotypes(ascat.bc)
ascat.bc = ASCAT::ascat.aspcf(ascat.bc, ascat.gg = ascat.gg)
ASCAT::ascat.plotSegmentedData(ascat.bc)
ascat.output = ASCAT::ascat.runAscat(ascat.bc)

CBS segmentation

Alternatively, tumor logR files generated by prep_ascat() can be processed with the segment_logR() function, which performs circular binary segmentation using DNAcopy and plots the results.

ezASCAT::segment_logR(tumor_logR = "tumor.tumour.logR.txt", sample_name = "tumor")

# Analyzing: tumor 
#   current chromosome: 1 
#   current chromosome: 2 
#   current chromosome: 3 
#   current chromosome: 4 
#   current chromosome: 5 
#   current chromosome: 6 
#   current chromosome: 7 
#   current chromosome: 8 
#   current chromosome: 9 
#   current chromosome: 10 
#   current chromosome: 11 
#   current chromosome: 12 
#   current chromosome: 13 
#   current chromosome: 14 
#   current chromosome: 15 
#   current chromosome: 16 
#   current chromosome: 17 
#   current chromosome: 18 
#   current chromosome: 19 
#   current chromosome: 20 
#   current chromosome: 21 
#   current chromosome: 22 
#   current chromosome: MT 
#   current chromosome: X 
#   current chromosome: Y 
# Segments are written to: tumor_cbs.seg
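
For readers curious what circular binary segmentation does to a logR track, the sketch below calls DNAcopy directly on simulated data. It is not the body of segment_logR(); the simulated values, positions, and sample ID are invented, and the code only runs if DNAcopy is installed.

```r
# Simulated logR values: 100 neutral markers followed by 100 markers in a loss
set.seed(1)
n <- 200
logr  <- c(rnorm(100, mean = 0, sd = 0.1), rnorm(100, mean = -0.8, sd = 0.1))
chrom <- rep("1", n)
pos   <- seq(1e6, by = 1e4, length.out = n)

segs <- NULL
if (requireNamespace("DNAcopy", quietly = TRUE)) {
  # Wrap the values in a CNA object, smooth outliers, then segment
  cna <- DNAcopy::CNA(genomdat = logr, chrom = chrom, maploc = pos,
                      data.type = "logratio", sampleid = "tumor")
  cna <- DNAcopy::smooth.CNA(cna)
  seg <- DNAcopy::segment(cna, verbose = 0)

  # One row per segment: ID, chrom, loc.start, loc.end, num.mark, seg.mean
  segs <- seg$output
  print(segs)
}
```

The segment table has the same kind of per-segment start, end, and mean information that a .seg file carries.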