Roth-Lab / pyclone-vi

Fast method for inferring cancer clonal population structure from SNV data.
GNU General Public License v3.0

Extracting additional model parameters #20

Closed: wir963 closed this issue 2 years ago

wir963 commented 3 years ago

Hey @aroth85,

I'm interested in extracting the copy number of a specific mutation x. For simplicity, let's assume that mutation x is clonal. If the region containing mutation x has a total copy number of 5 (major CN = 3, minor CN = 2), then mutation x could theoretically be present on 1 to 5 copies per cell (one could also reasonably argue for 1 to 3 copies, because the major CN is 3, but that isn't important to my question). I would like to know PyClone-VI's estimate of how many copies of mutation x occur in each cell.

With the caveat that I have not gone through the math, I assume that PyClone-VI estimates this quantity for each mutation. Am I correct in this assumption? If so, how can I extract a point estimate for this value from the model?

Best, Welles

aroth85 commented 3 years ago

Hi Welles,

We actually marginalize over the genotype (i.e., sum over all possible genotypes) rather than estimating a fixed value. This allows the method to account for uncertainty in the genotype. The best explanation is in the supplemental material of the original PyClone paper.
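As a rough illustration of what "marginalizing the genotype" means, here is a binomial likelihood in the style of the original PyClone model. The notation is assumed here and not copied from the paper, so treat it as a sketch and check the supplement for the exact definitions. Let $b$ and $d$ be the variant and total read counts, $\phi$ the CCF, $t$ the tumour purity, and $\varepsilon$ the sequencing error rate. A genotype state $\psi$ fixes the copy numbers $c_N$ (normal cells), $c_R$ (cancer cells without the mutation) and $c_V$ (cancer cells with the mutation), plus the number $m$ of mutated copies in that last group; $\pi(\psi)$ is a prior over genotype states.

```latex
\mu_V(\psi) = \min\!\left(1 - \varepsilon,\ \frac{m}{c_V}\right)

\xi(\psi, \phi, t) =
  \frac{(1-t)\,c_N\,\varepsilon + t(1-\phi)\,c_R\,\varepsilon + t\phi\,c_V\,\mu_V(\psi)}
       {(1-t)\,c_N + t(1-\phi)\,c_R + t\phi\,c_V}

p(b \mid d, \phi, t) = \sum_{\psi} \pi(\psi)\,
  \operatorname{Binomial}\!\bigl(b \mid d,\ \xi(\psi, \phi, t)\bigr)
```

The sum over $\psi$ is the marginalization: no single value of $m$ is ever chosen. For a clonal mutation in a pure sample ($\phi = t = 1$) the expected variant allele fraction reduces to $\mu_V = m / c_V$, so with $c_V = 5$ the candidate genotypes predict VAFs of 0.2, 0.4, 0.6, 0.8 and $1 - \varepsilon$.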

You could, in theory, compute the probability of each genotype post hoc, conditioned on the cancer cell fraction (CCF) of the mutation. Basically, you fit the model and then post-process: cycle through all genotypes for a mutation, compute the probability of the observed read counts given the inferred CCF, and then normalize.
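A minimal sketch of that post-processing step, assuming a PyClone-style binomial likelihood in which normal cells and unmutated cancer cells contribute variant reads only through the error rate, and a uniform prior over the number of mutated copies `m`. The function names (`binomial_log_pmf`, `variant_allele_prob`, `genotype_posterior`) and the defaults (`c_n=2`, `eps=1e-3`, unmutated cancer cells sharing the mutated cells' copy number) are hypothetical illustrations, not part of pyclone-vi's API; the CCF and purity would come from your own fitted model and input.

```python
import math


def binomial_log_pmf(b, d, p):
    """Log of Binomial(b | d, p), using lgamma to avoid huge factorials."""
    log_choose = math.lgamma(d + 1) - math.lgamma(b + 1) - math.lgamma(d - b + 1)
    return log_choose + b * math.log(p) + (d - b) * math.log1p(-p)


def variant_allele_prob(m, c_v, ccf, purity, c_n=2, c_r=None, eps=1e-3):
    """Expected variant read fraction when mutated cells carry m of c_v copies.

    Hypothetical model: normal cells (c_n copies) and cancer cells without the
    mutation (c_r copies, defaulting to c_v) yield variant reads only through
    the sequencing error rate eps.
    """
    if c_r is None:
        c_r = c_v
    mu_v = min(1.0 - eps, m / c_v)
    num = ((1 - purity) * c_n * eps
           + purity * (1 - ccf) * c_r * eps
           + purity * ccf * c_v * mu_v)
    den = (1 - purity) * c_n + purity * (1 - ccf) * c_r + purity * ccf * c_v
    return num / den


def genotype_posterior(b, d, ccf, purity, major_cn, minor_cn,
                       max_copies=None, c_n=2, eps=1e-3):
    """Posterior over m (mutated copies per cell), conditioned on a fitted CCF.

    Uses a uniform prior over m = 1..max_copies. max_copies defaults to the
    total copy number; pass major_cn to allow only copies of the major allele.
    """
    c_v = major_cn + minor_cn
    if max_copies is None:
        max_copies = c_v
    log_lik = {
        m: binomial_log_pmf(
            b, d, variant_allele_prob(m, c_v, ccf, purity, c_n=c_n, eps=eps))
        for m in range(1, max_copies + 1)
    }
    # Normalize in log space: subtract the max before exponentiating.
    top = max(log_lik.values())
    weights = {m: math.exp(v - top) for m, v in log_lik.items()}
    total = sum(weights.values())
    return {m: w / total for m, w in weights.items()}


# Usage: clonal mutation (CCF = 1) in a pure sample, 60 of 100 reads variant,
# major CN = 3, minor CN = 2.
post = genotype_posterior(60, 100, ccf=1.0, purity=1.0, major_cn=3, minor_cn=2)
best_m = max(post, key=post.get)
print(best_m, round(post[best_m], 3))
```

In this example the most probable genotype is m = 3, because 3/5 matches the observed VAF of 0.6 exactly. Working with log-likelihoods and subtracting the maximum before exponentiating keeps the normalization stable at high read depth, where raw binomial probabilities would underflow. The argmax of the returned dictionary gives a point estimate, while the full dictionary preserves the uncertainty that the model itself marginalizes over.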

Cheers, Andy

wir963 commented 2 years ago

Okay, that makes sense! Thanks @aroth85! I may have some further questions about how to extract the model parameters, etc., but this is a good start.