Illumina / canvas

Canvas - Copy number variant (CNV) calling from DNA sequencing data

Is Canvas weaker than CNVnator for somatic calling? #102

Open worker000000 opened 5 years ago

worker000000 commented 5 years ago

The paper compares Canvas with CNVnator for germline calling, but not for somatic calling. Does this imply that Canvas is weaker than CNVnator for somatic calling? I just want to know the truth; the authors of both tools are definitely respectable. Thanks.

worker000000 commented 5 years ago

https://academic.oup.com/bioinformatics/article/32/15/2375/1743834

eroller commented 5 years ago

I don't believe we evaluated CNVnator for somatic calling. Does it perform tumor purity/ploidy estimation?

worker000000 commented 5 years ago

Right, you did not. What I mean is that you compared Canvas and CNVnator for germline calling and showed that Canvas is better, but the somatic comparison does not include CNVnator. Does that imply that Canvas is not better than CNVnator for somatic calling? Thanks a lot.

eroller commented 5 years ago

We aren't implying that Canvas is better or worse than CNVnator for somatic calling; we simply did not evaluate CNVnator there. The reason is that CNVnator is mainly suited to germline analysis, as it doesn't appear to estimate tumor purity/ploidy. On low purity or high ploidy samples, without performing purity/ploidy correction the CN calls will be incorrect.

worker000000 commented 5 years ago

"On low purity or high ploidy samples, without performing purity/ploidy correction the CN calls will be incorrect." I do not understand this. Could you explain it in more detail? Thanks a lot.

eroller commented 5 years ago

Here is a simple example. Imagine the diploid coverage of the normal sample is 100 (arbitrary coverage units), i.e. 50 coverage units per copy. Given a somatic sample, if we see that the coverage in a region is 150, we might suspect the copy number is 3 if we assume the sample contains 100% tumor cells. However, if the somatic sample is only 50% pure (i.e. 50% tumor cells and 50% normal cells), then the coverage increase is due to only 50% of the cells in the sample (the tumor cells), so these cells must contain an even higher copy number for the region: 4 to be exact, since 0.5 × 2 + 0.5 × 4 = 3 copies per cell on average. This is how the purity of the sample can affect the copy number call.
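
The arithmetic in this example can be sketched in a few lines of Python. This only illustrates the simple two-population mixture described above and is not Canvas's actual implementation. The function name `tumor_copy_number` and its parameters are hypothetical, and the sketch assumes the normal cells are diploid in the region.

```python
def tumor_copy_number(observed_coverage, diploid_coverage, purity, normal_cn=2):
    """Solve observed = per_copy * (purity * tumor_cn + (1 - purity) * normal_cn)
    for tumor_cn. All names here are illustrative assumptions."""
    if not 0 < purity <= 1:
        raise ValueError("purity must be in (0, 1]")
    per_copy = diploid_coverage / 2               # coverage contributed by one copy
    mean_copies = observed_coverage / per_copy    # average copies per cell in the mixture
    # Remove the contribution of the normal cells, then rescale by the tumor fraction.
    return (mean_copies - (1 - purity) * normal_cn) / purity


print(tumor_copy_number(150, 100, purity=1.0))  # 3.0
print(tumor_copy_number(150, 100, purity=0.5))  # 4.0
```

The same observed coverage of 150 gives a different tumor copy number depending on the assumed purity, which is why the purity has to be estimated before the calls can be trusted.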

A similar situation occurs when the overall ploidy of a tumor sample is not the expected 2. For example, whole genome duplication in tumor cells results in tetraploidy. Without correcting for this change in the overall ploidy of the genome, the copy number calls will be incorrect when using a simple coverage model that assumes a diploid baseline.
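
As a rough sketch of the ploidy effect, assume a 100% pure sample whose baseline (typical) coverage corresponds to the overall ploidy of the genome. The function `copy_number_from_coverage` and its parameters are hypothetical names for illustration, not part of Canvas.

```python
def copy_number_from_coverage(coverage, baseline_coverage, baseline_ploidy=2):
    """Convert a region's coverage to a copy number, assuming the baseline
    coverage corresponds to `baseline_ploidy` copies (illustrative model only)."""
    if baseline_coverage <= 0:
        raise ValueError("baseline_coverage must be positive")
    return coverage / baseline_coverage * baseline_ploidy


# Same coverage signal, two different assumptions about the baseline:
print(copy_number_from_coverage(150, 100, baseline_ploidy=2))  # 3.0 (diploid assumption)
print(copy_number_from_coverage(150, 100, baseline_ploidy=4))  # 6.0 (tetraploid genome)
```

Under this simplified model, a region with 1.5 times the baseline coverage would be called as 3 copies with a diploid baseline but would actually carry 6 copies in a tetraploid tumor.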

worker000000 commented 5 years ago

Thanks for the clear examples. How do you think one can get a better baseline for CNV calling? Do you think a PoN (pool of normals) or a single normal is better?

eroller commented 5 years ago

For WGS, it is not clear to me how much better a PoN will perform compared to "self-normalization", which is what Canvas does. It will be interesting to see published results from the GATK CNV caller. The coverage biases with WGS tend to be minimal. For enrichment data, normalization is critical due to coverage biases, and more normal samples will give better baselines in that case. The only advantage of using a single matched normal rather than a PoN for somatic analysis that is apparent to me would be detecting events in a tumor sample that overlap existing events in the normal sample. However, these may be rare and of unknown clinical significance, so the advantage seems minimal.
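
To make the idea of a PoN baseline concrete, here is a minimal sketch that divides each bin of a sample by the median of that bin across a set of normals. It is a generic per-bin median normalization under assumed names (`normalize_to_pon`, `sample_bins`, `pon_bins`), and it is not how Canvas or the GATK CNV caller actually normalize coverage.

```python
from statistics import median


def normalize_to_pon(sample_bins, pon_bins):
    """Return per-bin coverage ratios of a sample against a pool of normals.

    sample_bins: coverage per bin for the sample.
    pon_bins: list of per-bin coverage lists, one per normal (same bin order).
    """
    def scaled(bins):
        # Divide by the sample's own median so sequencing depth cancels out.
        m = median(bins)
        return [b / m for b in bins]

    sample = scaled(sample_bins)
    normals = [scaled(n) for n in pon_bins]
    # Per-bin baseline: median across normals captures systematic bias per bin.
    baseline = [median(column) for column in zip(*normals)]
    return [s / b for s, b in zip(sample, baseline)]


# Bin 2 is systematically under-covered in every normal (e.g. capture bias).
pon = [[100, 50, 100, 100], [200, 100, 200, 200], [110, 55, 110, 110]]
print(normalize_to_pon([100, 50, 150, 100], pon))  # [1.0, 1.0, 1.5, 1.0]
```

In this toy input the low coverage in the second bin is shared by all normals, so it cancels out, while the 1.5-fold signal in the third bin remains.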

worker000000 commented 5 years ago

Thanks for your professional reply. Which do you think is right: that the germline variants include some somatic variants, or that the somatic variants do not contain some germline variants?

Sometimes the somatic genome just drops some regions, so I guess this may be a reason why the somatic variants do not include some germline variants.

I am very eager to know your opinion on this. Thanks a lot.

eroller commented 5 years ago

Yes, somatic variations occur on top of some germline genome sequence so there is always the potential for the somatic variation to overlap a variant present in the germline genome or completely remove a germline variant. The rearrangement of a tumor sample can be so extensive that the tumor genome looks nothing like the germline genome.

However, an important part of somatic CNV calling is estimating the tumor purity of the sample. Knowing the germline variants can potentially improve the estimate of the tumor purity. It wouldn't be a straightforward normalization of the coverage signal, but there is probably a more complex model that could be used to improve the estimate of the tumor purity given a normal baseline sample.