Hi @chengwenxuan1997,
ASCAT is unable to find a purity/ploidy fit because of the huge noise in the logR track (BAF looks okay though). Can you please provide some details on how logR and BAF were derived (platform, method)? Also, as indicated in the doc, we recommend correcting logR for covariates although I don't think it'll resolve such a level of noise.
Cheers,
Tom.
Thank you very much for responding so quickly. Our WES samples were processed with the ascat.prepareHTS function. The genome version is hg38, and the detailed code is as follows.
library(ASCAT)
library(glue)  # needed for glue() below
# locate the reference files for the chosen genome build
BED.file <- list.files(file.path(ref.path, genome), pattern = "bed$", full.names = TRUE)
probloci.file <- list.files(file.path(ref.path, genome), pattern = "^probloci.+txt.gz$", full.names = TRUE)
ref.fasta <- list.files(file.path(ref.path, genome), pattern = "fa$", full.names = TRUE)
alleles.prefix <- glue(file.path(ref.path, genome, "G1000_alleles*{genome}", "G1000_alleles_{genome}_chr"))
loci.prefix = glue(file.path(ref.path, genome, "G1000_loci*{genome}", "G1000_loci_{genome}_chr"))
# prepare logR from BAM
ascat.prepareHTS(tumourseqfile = file.path(data.path, SampleInfo$TumorBam)[i],
                 normalseqfile = file.path(data.path, SampleInfo$NormalBam)[i],
                 tumourname = paste0(SampleInfo$SampleName)[i],
                 normalname = paste0(SampleInfo$SampleName, "_N")[i],
                 allelecounter_exe = "alleleCounter",
                 gender = "XX", nthreads = 8,
                 genomeVersion = genome,
                 BED_file = BED.file, probloci_file = probloci.file, ref.fasta = ref.fasta,
                 alleles.prefix = alleles.prefix, loci.prefix = loci.prefix)
BEL_20C is the only one of the 22 samples for which purity estimation failed.
Can you provide the coverage values for both the tumour and the matched normal? It seems like the coverage for the tumour is good so the BAF does look good, but I'm wondering if the matched normal is a low-coverage sample so the logR is noisy because of the T/N normalisation. Can you dig into this?
Cheers,
Tom.
Also, do the logR tracks for the 21 other samples look this noisy as well? The fact that ASCAT provides CNA profiles doesn't necessarily mean that they are correct: this level of noise will generate a bunch of false-positives.
Do you mean the *.alleleFrequency.txt file? My pipeline deletes these files automatically to save storage, so I may need more time to recover them. By the way, why does the messy BAF signal look good? To my understanding, there should be only one signal under the ASCAT assumption.
Coverage can be inferred from the *.alleleFrequency.txt files but there are other tools to compute coverage from BAMs. Are we talking about a 10-20X sequencing or a 100X sequencing?
BAF looks messy, but this is just because of CNAs; it looks good because imbalances can be seen and are captured by ASCAT.
Not sure what you mean with "one signal under the ASCAT assumption". One signal per what?
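As a rough sketch of the first option mentioned above, depth at the SNP loci can be summarised from an alleleCounter-style table. This assumes the table has a Good_depth column, as in alleleCounter output; the helper name summarise_depth and the toy values are made up for illustration, and the result only reflects depth at the counted loci, not the whole target region.

```r
# Hypothetical helper: summarise per-locus depth from an alleleCounter-style table.
# Assumes a Good_depth column holding the usable read depth at each locus.
summarise_depth <- function(counts) {
  depth <- counts$Good_depth
  c(mean = mean(depth), median = median(depth), n_loci = length(depth))
}

# Toy table standing in for a real *.alleleFrequency.txt file (made-up values).
# For a real file, something like this should work (path is a placeholder):
#   counts <- read.table(path, header = TRUE, sep = "\t", comment.char = "")
toy <- data.frame(
  POS        = c(1000, 2000, 3000),
  Good_depth = c(22, 30, 26)
)

depth_summary <- summarise_depth(toy)
print(depth_summary)
```

Running the same summary on the tumour and the matched normal gives the two coverage values asked for here.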
No, the other samples all look very clear as follows.
The other bad one is BEL_1T, which has a messy logR plot and a clear BAF plot. However, the pipeline does succeed in producing an estimate for this sample.
Sorry for not making it clear just now. By "one signal under the ASCAT assumption" I mean that, within one sample, all B-allele frequencies should perhaps fluctuate around a single value.
So you mean the noisy logR is caused by the low coverage of the normal sample, and thus contributes to the failed estimation?
The coverages of BEL_20C and BEL_20N are 141.5x and 24.02x, respectively.
I can't see the tracks in the screenshot you sent a few messages ago, but is the coverage for BEL_20N and BEL_1N lower than for the other normal cases? My guess is that logR is noisy because the coverage in matched normal samples is quite low. Is there any sample where the matched normal is also ~25X but logR in the tumour is crystal-clear?
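The hypothesis above can be illustrated with a toy simulation. This is not how ASCAT derives logR: it assumes Poisson-distributed read counts at each locus, a flat copy-neutral genome, and a simple median-centred tumour/normal ratio. The function name simulate_logr and the 140X comparison depth are made up. It shows only that a shallow normal inflates the spread of a T/N ratio; it does not show that coverage explains the noise in these particular samples.

```r
set.seed(1)  # reproducible toy example

# Simulate logR at n loci for a flat (copy-neutral) genome, given mean depths.
# Read counts are drawn from a Poisson distribution (a simplifying assumption).
simulate_logr <- function(n, tumour_depth, normal_depth) {
  tumour <- rpois(n, tumour_depth)
  normal <- rpois(n, normal_depth)
  keep <- tumour > 0 & normal > 0      # avoid log2(0) and division by zero
  ratio <- tumour[keep] / normal[keep]
  log2(ratio / median(ratio))          # centre the track around 0
}

# Depths loosely based on the values quoted in this thread, plus a deep normal.
logr_low_normal  <- simulate_logr(5000, tumour_depth = 141.5, normal_depth = 24)
logr_deep_normal <- simulate_logr(5000, tumour_depth = 141.5, normal_depth = 140)

# Spread of the simulated logR values for each scenario.
sd_low  <- sd(logr_low_normal)
sd_deep <- sd(logr_deep_normal)
print(c(low_normal = sd_low, deep_normal = sd_deep))
```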
The samples with the lowest coverage are BEL_28N (13.4, tumor 24.68), BEL_20N (24.02, tumor 141.5), BEL_1N (35.51, tumor 76.5), BEL_19N (39.11, tumor 16.3). BEL_28T and BEL_19T were estimated successfully; their figures are as follows.
Hi @chengwenxuan1997,
Thanks for providing additional information. Overall, the tracks look a bit noisy (at least noisier compared to what we observed with WES data from TCGA) but this is related to coverage I suppose. In the two profiles above, I'd say that the segments are correctly identified. Still, I don't know what's happening for BEL_1T and BEL_20T but the noise in logR is crazy high. I would recommend excluding these two samples (even if ASCAT does provide a CNA profile for BEL_1T). With TCGA data, we use MAPD>0.4 (see ?ascat.metrics) to define noisy samples. It also seems like the resolution for these two samples is lower compared to other samples (n_het_SNP from ascat.metrics).
Alternatively, you could try deriving logR using other methods and feed it into ASCAT or try a different approach such as ASCAT.sc.
Cheers,
Tom.
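For intuition about the MAPD threshold mentioned above, here is a minimal sketch of what the number measures: the median absolute difference between consecutive logR values. ascat.metrics reports this for real data, so the sketch is only illustrative. The function name mapd, the NA handling, and the toy values are assumptions. The 0.4 cut-off is the one quoted for TCGA data, and other cohorts may need a different value.

```r
# MAPD: median absolute pairwise difference between consecutive logR values.
# NA values are dropped before taking differences (an assumption of this sketch).
mapd <- function(logr) {
  median(abs(diff(logr[!is.na(logr)])))
}

# Made-up logR values, assumed to be ordered by genomic position.
toy_logr <- c(0.0, 0.1, 0.3, NA, 0.2)
toy_mapd <- mapd(toy_logr)

# Flag the sample as noisy using the threshold quoted for TCGA data.
is_noisy <- toy_mapd > 0.4
print(toy_mapd)
print(is_noisy)
```

A track dominated by probe-to-probe jitter gives a high MAPD even when the segment means are well separated, which is why it works as a per-sample noise flag.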
Thanks very much. I will try to use ascat.metrics and ASCAT.sc in my pipeline. By the way, what could contribute to such a noisy logR? I may need to talk with our PI about focusing on or ignoring this sample. Is there any biological cause or just technical noise?
Hi @chengwenxuan1997,
It can be due to various things: the sample itself (DNA degradation), poor capture, library prep, sequencing problems, low coverage, unmatched T/N, etc.
Cheers,
Tom.
Alright, I got it. There are just too many uncertain factors. I really appreciate your response, it has been quite helpful to me.
I came across an error when trying to compute purity and ploidy on my sample. ASCAT could not find an optimal ploidy and purity value for sample BEL_20C. Based on the ASCAT plot, I suspect this is because there is no dominant tumor subclone, but I am not sure and would like to ask for your suggestions. Many thanks in advance! The code is as follows.
The output of ascat.prepareHTS is uploaded here. BEL_20C.zip
The ascat plot is as follows.