VanLoo-lab / ascat

ASCAT R package
https://www.mdanderson.org/research/departments-labs-institutes/labs/van-loo-laboratory/resources.html#ASCAT

Tumor only mode for WGS #122

Closed: byeongill closed this issue 1 year ago.

byeongill commented 1 year ago

Hi, my data are tumors ranging from very early stage to late stage (stage 4). I ran tumor-only mode for WGS as described in issue #73. Briefly, I obtained LogR by modifying the function ascat.custumed.getBAFsAndLogRs.

Specifically, I changed

    tumourLogR = totalTumour/totalNormal
    tumourLogR = log2(tumourLogR/mean(tumourLogR, na.rm = T))

to

    tumourLogR = log2(totalTumour/median(totalTumour))
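To make the difference between the two normalisations concrete, here is a minimal Python sketch (ASCAT itself is an R package; this is not its code). It assumes plain lists of per-SNP read depths with no missing values, and the function names are hypothetical. The paired version cancels locus-specific depth biases through the matched normal; the tumor-only version only centres the typical locus at 0, which is presumably why the GC/replication-timing correction in ascat.correctLogR matters afterwards.

```python
import math
import statistics


def logr_tumour_normal(total_tumour, total_normal):
    """Original paired LogR: tumour/normal depth ratio, log2 of ratio over its mean."""
    ratios = [t / n for t, n in zip(total_tumour, total_normal)]
    mean_ratio = statistics.mean(ratios)
    return [math.log2(r / mean_ratio) for r in ratios]


def logr_tumour_only(total_tumour):
    """Tumour-only LogR as in the modification above: log2 of depth over median depth."""
    median_depth = statistics.median(total_tumour)
    return [math.log2(t / median_depth) for t in total_tumour]


# Hypothetical depths: the median is 30, so a depth of 15 maps to LogR = -1.
print(logr_tumour_only([30, 30, 45, 15]))
```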

Then I ran ascat.correctLogR and set the parameters for ascat.predictGermlineGenotypes. When I calculated the hom/htz/open proportions, the values differed by stage (see the example images). So I set the parameters for each sample separately, and the results looked reasonably good:

[Figures: tumorSepPSH005, tumorSepPSH022]
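The "hom/htz/open proportion" of a sample is simply the fraction of SNPs assigned to each genotype class. A minimal Python sketch, assuming the calls are available as a list of hypothetical labels "hom", "het" and "open" (undetermined/noisy); this is an illustration, not ASCAT's internal representation. Samples with more CNAs, and in particular more LOH, tend to show a larger apparent homozygous fraction, which is one reason the proportions drift with stage.

```python
from collections import Counter


def genotype_proportions(calls):
    """Fraction of SNPs labelled hom, het or open (undetermined) in one sample."""
    if not calls:
        raise ValueError("no genotype calls given")
    counts = Counter(calls)
    total = len(calls)
    return {label: counts[label] / total for label in ("hom", "het", "open")}


# Hypothetical sample with 10 SNPs.
print(genotype_proportions(["hom"] * 6 + ["het"] * 3 + ["open"]))
```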

Next, I ran ascat.aspcf (penalty = 80) and ascat.runAscat (default settings).

[Figures, stage 1: PSH005 ASPCF and PSH005 ASCATprofile]

[Figures, stage 2: PSH022 ASPCF and PSH022 ASCATprofile]

I think the purity estimate for the stage 1 sample is not accurate because it has no CNVs. The stage 1 sample also shows a lot of noise, especially in losses. Why does this noise occur, and can I filter it out?

Your answer will be very helpful. Thanks.

tlesluyes commented 1 year ago

Hi @byeongill,

It looks like your custom setting for ascat.predictGermlineGenotypes works well: it nicely captures het/noisy/hom genotypes. However, you should define one single set of parameters and use it for all samples, rather than a sample-specific setup (if I'm interpreting your sentence "I set the parameters for each sample" correctly). This is because CNAs (as illustrated in your stage 4 sample) will bias the het/noisy/hom interpretation.

When you say "ascat.runAscat (default)", are you using the default gamma value (0.55), or are you setting gamma to 1 (it should be 1 for HTS data)? Also, is the logR corrected for GC content and replication timing (GC/RT)? That should clean the logR track, but the BAF would still be over-segmented. I would try higher penalty values such as 100 and 140. We've seen a few over-segmented WGS profiles, even with tumour/normal pairs, and increasing the penalty should get rid of the small segments.
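Why gamma matters can be seen from the ASCAT model as published with the original ASCAT method: for purity ρ, tumour ploidy ψ and allele copy numbers nA, nB, the expected LogR is γ·log2((2(1−ρ) + ρ(nA+nB)) / (2(1−ρ) + ρψ)) and the expected BAF of a germline-heterozygous SNP is (1 − ρ + ρ·nB) / (2 − 2ρ + ρ(nA+nB)). A minimal Python sketch of these two formulas (function names are hypothetical; this is not ASCAT's R code):

```python
import math


def expected_logr(n_a, n_b, purity, ploidy, gamma=1.0):
    """Expected LogR of a segment with allele copy numbers n_a, n_b."""
    # Total copies in the tumour/normal mixture at this segment...
    segment_copies = 2 * (1 - purity) + purity * (n_a + n_b)
    # ...relative to the genome-wide average copy number of the mixture.
    average_copies = 2 * (1 - purity) + purity * ploidy
    return gamma * math.log2(segment_copies / average_copies)


def expected_baf(n_a, n_b, purity):
    """Expected BAF of a germline-heterozygous SNP carrying n_b B-allele copies."""
    return (1 - purity + purity * n_b) / (2 - 2 * purity + purity * (n_a + n_b))


# Single-copy loss in a 50%-pure diploid tumour.
print(expected_logr(1, 0, 0.5, 2.0))              # about -0.415
print(expected_logr(1, 0, 0.5, 2.0, gamma=0.55))  # about -0.228
print(expected_baf(1, 0, 0.5))                    # about 0.333
```

With gamma = 0.55 (the array default) the model expects LogR shifts roughly half the size of those seen in sequencing data, so fitting HTS data with it would distort the purity/ploidy solution; hence gamma = 1 for HTS.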

Cheers,

Tom.

byeongill commented 1 year ago

Yes, I corrected for GC/RT before running ascat.predictGermlineGenotypes, and I used gamma = 1.

Now I use a single hom/htz/open proportion setting for all samples (the median across all samples: hom=0.63, htz=0.30, open=0.07).

[Figures: tumorSepPSH005, tumorSepPSH022]
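Deriving one shared setting as the per-class median across samples can be sketched in Python as below. The input layout (a dict of per-sample proportion dicts) and the function name are hypothetical, and renormalising the medians so they sum to 1 is a design choice of this sketch, since per-class medians need not add up exactly.

```python
import statistics


def shared_setting(per_sample):
    """Median hom/het/open proportion across samples, renormalised to sum to 1."""
    labels = ("hom", "het", "open")
    medians = {k: statistics.median(p[k] for p in per_sample.values()) for k in labels}
    total = sum(medians.values())
    return {k: v / total for k, v in medians.items()}


# Hypothetical per-sample proportions.
samples = {
    "A": {"hom": 0.60, "het": 0.30, "open": 0.10},
    "B": {"hom": 0.70, "het": 0.20, "open": 0.10},
    "C": {"hom": 0.62, "het": 0.31, "open": 0.07},
}
print(shared_setting(samples))
```

The median is only a starting point: as Tom points out in his reply, the resulting setting should still be checked visually against the BAF plots.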

I set the penalty to 140.

[Figures: PSH005 ASPCF, PSH022 ASPCF]

Do I need a penalty higher than 140?

I have another question. I will run phyloWGS (https://github.com/morrislab/phylowgs/tree/master/parser) using the allele-specific copy numbers from ASCAT. To run phyloWGS I need the cellular prevalence (the fraction of cells in the sample containing the CNV, not just the fraction of tumor cells containing it). Can I get the cellular prevalence?

Many thanks!

tlesluyes commented 1 year ago

Hi @byeongill,

It looks like hom=0.63 is too high: in your example above (sample 1444666), the blue bands range from 0 to 0.22/0.25 and from 0.75/0.78 to 1, whereas only BAF > ~0.95 and BAF < ~0.05 should be blue. So the blue/green ratio is quite unbalanced. In your first message, sample 1427334 looked much better and you should aim for that setting. Did you use hom=0.62 there instead of 0.63?
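Why a too-high hom proportion pushes the blue bands towards the middle can be shown with a toy model. It is an assumption for illustration only, not how ascat.predictGermlineGenotypes actually assigns genotypes: if a fixed fraction of SNPs must be called homozygous, the SNPs furthest from BAF 0.5 are taken first, so the implied cutoff moves towards 0.5 as that fraction grows. The function name and BAF values are hypothetical.

```python
def implied_hom_cutoff(bafs, hom_proportion):
    """Distance |BAF - 0.5| above which SNPs are called hom when exactly
    `hom_proportion` of SNPs must be homozygous (toy quantile model)."""
    if not 0 < hom_proportion <= 1:
        raise ValueError("hom_proportion must be in (0, 1]")
    # Most extreme BAFs (furthest from 0.5) come first.
    distances = sorted((abs(b - 0.5) for b in bafs), reverse=True)
    n_hom = max(1, round(hom_proportion * len(distances)))
    return distances[n_hom - 1]


bafs = [0.0, 0.01, 0.99, 1.0, 0.5, 0.48, 0.52, 0.45, 0.2, 0.8]
print(implied_hom_cutoff(bafs, 0.4))  # about 0.49: only BAF <= 0.01 or >= 0.99 is hom
print(implied_hom_cutoff(bafs, 0.6))  # about 0.3: BAF <= 0.2 or >= 0.8 is also called hom
```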

ASCAT does not call subclonal CNAs, only clonal CNAs (please use CNA rather than CNV; CNVs are germline changes). Therefore, any CNA reported by ASCAT is assumed to be present in 100% of cancer cells. Back to your question, I think that purity fits the 'prevalence' definition (the fraction of cells with a given CNA), because the fraction of tumour cells containing the CNA is 100%.
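The arithmetic behind this answer: the fraction of all cells in the sample carrying a CNA is the purity times the fraction of cancer cells carrying it. A minimal Python sketch with a hypothetical function name; because ASCAT calls only clonal CNAs, the cancer-cell fraction is 1 and the prevalence equals the purity. The second argument exists only to show the general relationship, since ASCAT does not report subclonal fractions.

```python
def cellular_prevalence(purity, cancer_cell_fraction=1.0):
    """Fraction of all cells in the sample (tumour + normal) that carry a CNA."""
    for name, value in (("purity", purity), ("cancer_cell_fraction", cancer_cell_fraction)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1")
    return purity * cancer_cell_fraction


# Clonal CNA in a sample with a hypothetical ASCAT purity of 0.72.
print(cellular_prevalence(0.72))  # 0.72
```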

Cheers,

Tom.

tlesluyes commented 1 year ago

I've just realised I missed your question about trying penalty > 140. It really depends on the strategy you want or need to adopt and on your downstream analyses. If missing very small events is acceptable, going for penalty=200 might help clean the profiles, but you'll have a few false negatives. If you need a conservative approach in which the logR/BAF signal is carefully interpreted, then keeping penalty=140 and accepting a few false positives might be best. It is a trade-off between sensitivity and specificity.
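The sensitivity/specificity trade-off of the penalty can be illustrated with a toy single-track segmentation: exact optimal partitioning that minimises the squared error plus a fixed cost per segment. This is a swapped-in simplification, not ASPCF itself (which segments LogR and BAF jointly and scales its penalty differently), so the penalty values below are not comparable to those passed to ascat.aspcf. The function name and data are hypothetical.

```python
def segment(values, penalty):
    """Optimal partitioning: minimise sum of squared errors + penalty per segment.
    Returns (start, end) index pairs, end exclusive."""
    n = len(values)
    # Prefix sums give each candidate segment's squared error in O(1).
    s1 = [0.0] * (n + 1)
    s2 = [0.0] * (n + 1)
    for i, v in enumerate(values):
        s1[i + 1] = s1[i] + v
        s2[i + 1] = s2[i] + v * v

    def cost(i, j):
        # Sum of squared deviations from the mean of values[i:j].
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    best = [0.0] + [float("inf")] * n  # best[j]: optimal score for values[:j]
    prev = [0] * (n + 1)               # prev[j]: start of the last segment
    for j in range(1, n + 1):
        for i in range(j):
            score = best[i] + cost(i, j) + penalty
            if score < best[j]:
                best[j], prev[j] = score, i
    # Walk back through the stored change points.
    segments, j = [], n
    while j > 0:
        segments.append((prev[j], j))
        j = prev[j]
    return segments[::-1]


# A single noisy point at index 10, followed by a real loss from index 20.
data = [0.0] * 10 + [0.3] + [0.0] * 9 + [-0.5] * 20
print(segment(data, 0.01))  # [(0, 10), (10, 11), (11, 20), (20, 40)]
print(segment(data, 0.1))   # [(0, 20), (20, 40)]
```

With the low penalty the single noisy point becomes its own segment (a potential false positive); with the higher penalty it is absorbed and only the real loss remains. A penalty high enough would eventually swallow genuine small events as well.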

That being said, it might be worth cleaning the logR and BAF tracks using common germline CNVs, for example those catalogued in the Database of Genomic Variants (DGV). You could use penalty=140 but remove all SNPs located in regions with common germline changes: this might help remove false positives while avoiding the under-segmentation that a higher penalty would cause.
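A minimal Python sketch of the filtering step, assuming the common-CNV regions have already been exported (for example from DGV) as hypothetical (chromosome, start, end) tuples with 1-based inclusive coordinates on the same genome build as the SNPs. Overlapping regions are merged first so that each SNP needs a single binary search. The retained SNP set could then be used to subset the LogR and BAF tracks before segmentation; the function and input layout are illustrative, not part of ASCAT.

```python
import bisect


def drop_snps_in_regions(snps, regions):
    """Remove SNPs (chrom, pos) that fall inside any (chrom, start, end) region."""
    by_chrom = {}
    for chrom, start, end in regions:
        by_chrom.setdefault(chrom, []).append((start, end))

    # Merge overlapping regions per chromosome into sorted, disjoint intervals.
    merged = {}
    for chrom, intervals in by_chrom.items():
        intervals.sort()
        out = [list(intervals[0])]
        for start, end in intervals[1:]:
            if start <= out[-1][1]:
                out[-1][1] = max(out[-1][1], end)
            else:
                out.append([start, end])
        merged[chrom] = out
    starts = {c: [s for s, _ in ivs] for c, ivs in merged.items()}

    kept = []
    for chrom, pos in snps:
        intervals = merged.get(chrom)
        if intervals:
            # Last interval starting at or before pos.
            k = bisect.bisect_right(starts[chrom], pos) - 1
            if k >= 0 and pos <= intervals[k][1]:
                continue  # inside a common CNV region: drop this SNP
        kept.append((chrom, pos))
    return kept


regions = [("1", 100, 200), ("1", 150, 300), ("2", 50, 60)]
snps = [("1", 99), ("1", 100), ("1", 250), ("1", 301), ("2", 55), ("3", 55)]
print(drop_snps_in_regions(snps, regions))  # [('1', 99), ('1', 301), ('3', 55)]
```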

Cheers,

Tom.