VanLoo-lab / ascat

ASCAT R package
https://www.mdanderson.org/research/departments-labs-institutes/labs/van-loo-laboratory/resources.html#ASCAT
162 stars 85 forks source link

LogR correction, 10% overlap check #180

Closed maartenvanelst closed 4 hours ago

maartenvanelst commented 2 months ago

Hello,

I am trying to use ASCAT for a project that requires quick ploidy estimation and later for mutational signature analysis of pediatric whole genome sequencing data. I use a github install of your package, and have a local build working with seemingly correct output.

However, there is a check (see below) for LogR correction makes the program fail, and I am curious as to why this check for a 10% overlap is in place. In my data this overlap is around 8.5%. When I comment this line out in a local build the output seems accurate, but I would like to know what this error means. You will find details below, but I am mostly interested in the purpose of this check.

Thank you in advance! :)

Best, Maarten

The LogR correction check in question, happening here

Error in ascat.correctLogR(ascat.bc, GCcontentfile = "G1000/GC_G1000_hg38.txt",  :
  length(ovl) > nrow(ASCATobj$Tumor_LogR)/10 is not TRUE

R script calling ASCAT

tumor_bam_path = args[1]
normal_bam_path = args[2]
tumor_label = args[3]
normal_label = args[4] 
sex = args[5] # 'XX' or 'XY'

# Chromosome names
if(sex == 'XX'){chrom_names = c(1:22, "X")}
if(sex == 'XY'){chrom_names = c(1:22, "X", "Y")}

# ASCAT loading
library(devtools)
# devtools::install("USER/software/ascat/ASCAT")
library(ASCAT)

# Intermediate files:
tumor_logR = paste("output/", tumor_label, "_LogR.txt", sep = "")
tumor_BAF = paste("output/", tumor_label, "_BAF.txt", sep = "")
normal_logR = paste("output/", normal_label, "_logR.txt", sep = "")
normal_BAF = paste("output/", normal_label, "_BAF.txt", sep = "")

# ASCAT running
genomeVersion = 'hg38'
ascat.prepareHTS(
  tumourseqfile = tumor_bam_path,
  normalseqfile = normal_bam_path,
  tumourname = tumor_label,
  normalname = normal_label,
  allelecounter_exe = "USER/software/miniconda3/envs/ascat/bin/alleleCounter",
  alleles.prefix = "USER/allelecounter/G1000/G1000_alleles_hg38_chr",
  loci.prefix = "USER/allelecounter/G1000/G1000_loci_hg38_chr",
  skip_allele_counting_normal = T,
  skip_allele_counting_tumour = T,
  gender = sex,
  genomeVersion = genomeVersion,
  nthreads = 10,
  tumourLogR_file = tumor_logR,
  tumourBAF_file = tumor_BAF,
  normalLogR_file = normal_logR,
  normalBAF_file = normal_BAF,
  chrom_names = chrom_names
  )
ascat.bc = ascat.loadData(
  Tumor_LogR_file = tumor_logR, 
  Tumor_BAF_file = tumor_BAF, 
  Germline_LogR_file = normal_logR, 
  Germline_BAF_file = normal_BAF, 
  gender = sex, 
  genomeVersion = genomeVersion
  )
ascat.plotRawData(ascat.bc, img.prefix = "Before_correction_")
ascat.bc = ascat.correctLogR(
  ascat.bc, 
  GCcontentfile = "USER/allelecounter/G1000/GC_G1000_hg38.txt", 
  replictimingfile = "USER/allelecounter/G1000/RT_G1000_hg38.txt"
  )
ascat.plotRawData(ascat.bc, img.prefix = "After_correction_")
ascat.bc = ascat.aspcf(ascat.bc)
ascat.plotSegmentedData(ascat.bc)
ascat.output = ascat.runAscat(ascat.bc, gamma=1, write_segments = T)
QC = ascat.metrics(ascat.bc,ascat.output)
save(ascat.bc, ascat.output, QC, file = 'ASCAT_objects.Rdata')

Before running ASCAT, I run alleleCounter separately for each chromosome {1..22, X} using the command below:

alleleCounter
    -b USER/data/BAM/{bam_file} \
    -l USER/allelecounter/G1000/G1000_loci_hg38_chr{chr_num}.txt \
    -o USER/allelecounter/output/{sample_id}/{sample_id}_alleleFrequencies_chr{chr_num}.txt \
    -m 20 \
    -q 35 \
    --dense-snps
zhangzhhcb commented 4 hours ago

The 10% indicate the overlap between the reference (GC file) and your own data, to make sure there are enough logR data points to be corrected. However, the cutoff may be changed depending on your own data quality.