VanLoo-lab / ascat

ASCAT R package
https://www.mdanderson.org/research/departments-labs-institutes/labs/van-loo-laboratory/resources.html#ASCAT
162 stars 85 forks source link

How to use ascat to call wes CNV #155

Closed lh12565 closed 1 year ago

lh12565 commented 1 year ago

Hi, I want to call CNV of WES using ASCAT. I wonder if I need to run the ascat.prepareTargetedSeq before ascat.prepareHTS as below:

ascat.prepareTargetedSeq(
  Worksheet = "myWorksheet.tsv", 
  Workdir="11",
  alleles.prefix = "G1000_allelesAll_hg38/G1000_alleles_hg38_chr",
  BED_file = "wes.bed",
  allelecounter_exe = "/home/dd/allelecount/bin/alleleCounter",
  genomeVersion = "hg38",
  nthreads = 20)

ascat.prepareHTS(
  tumourseqfile = "../11-C..bam",
  normalseqfile = "../11-B..bam",
  tumourname = "11-C",
  normalname = "11-B",
  allelecounter_exe = "/home/dd/allelecount/bin/alleleCounter",
  alleles.prefix = "G1000_allelesAll_hg38/G1000_alleles_hg38_chr",
  loci.prefix = "G1000_lociAll_hg38/G1000_loci_hg38_chr",
  gender = "XY",
  genomeVersion = "hg38",
  nthreads = 20,
  tumourLogR_file = "Tumor_LogR.txt",
  tumourBAF_file = "Tumor_BAF.txt",
  normalLogR_file = "Germline_LogR.txt",
  normalBAF_file = "Germline_BAF.txt")

or if it can run ascat.prepareHTS directly and specify the wes bed file?

ascat.prepareHTS(
  tumourseqfile = "../11-C..bam",
  normalseqfile = "../11-B..bam",
  tumourname = "11-C",
  normalname = "11-B",
  allelecounter_exe = "/home/dd/allelecount/bin/alleleCounter",
  alleles.prefix = "G1000_allelesAll_hg38/G1000_alleles_hg38_chr",
  loci.prefix = "G1000_lociAll_hg38/G1000_loci_hg38_chr",
  gender = "XY",
  genomeVersion = "hg38",
  nthreads = 20,
  BED_file = "TruSeq_exome_targeted_regions.hg38.bed",
  tumourLogR_file = "Tumor_LogR.txt",
  tumourBAF_file = "Tumor_BAF.txt",
  normalLogR_file = "Germline_LogR.txt",
  normalBAF_file = "Germline_BAF.txt")

Another question is when I run ascat.prepareTargetedSeq or ascat.prepareHTS with bed file, there is an error as below:

Major issue with BED file, please double-check its content

My bed file like this:

chr1    12098   12258   CEX-chr1-12099-12258
chr1    12553   12721   CEX-chr1-12554-12721
chr1    13331   13701   CEX-chr1-13332-13701
chr1    30334   30503   CEX-chr1-30335-30503
chr1    35045   35544   CEX-chr1-35046-35544
chr1    35618   35778   CEX-chr1-35619-35778
chr1    69077   70017   CEX-chr1-69078-70017

I don't know how to solve it. Thanks

tlesluyes commented 1 year ago

Hi @lh12565,

For WES data, only ascat.prepareHTS should be used. ascat.prepareTargetedSeq is a pre-processing function for targeted sequencing data.

The issue with the BED file is triggered when it doesn't fit the data (see here). It is likely that your TruSeq_exome_targeted_regions.hg38.bed file doesn't contain any region for some chromosomes. If that's the case, you need to set chrom_names accordingly (see ?ascat.prepareHTS).

Cheers,

Tom.

tlesluyes commented 1 year ago

Actually, missing chromosomes shouldn't mess things up with the BED and loci. This should be caused by an empty intersection between chrom_names (default: c(1:22,'X')) and the first column in the BED files (chr string gets removed beforehand). Your BED file looks okay but maybe there are some spaces in between chromosome names and start positions?

Please try running the following lines and report the output.

chrom_names=c(1:22,'X')
BED_file='TruSeq_exome_targeted_regions.hg38.bed'
stopifnot(file.exists(BED_file) && file.info(BED_file)$size>0)
BED=read.table(BED_file,sep='\t',header=F,stringsAsFactors=F)[,1:3]
colnames(BED)=c('chr','start','end')
BED$chr=gsub('^chr','',BED$chr)
BED$start=BED$start+1 # Start is 0-based in BED files
print(nrow(BED))
BED=BED[BED$chr %in% chrom_names,]
print(nrow(BED))
print(head(BED$chr))

Cheers,

Tom.