Roth-Lab / pyclone-vi

Fast method for inferring cancer clonal population structure from SNV data.
GNU General Public License v3.0

Minimum number of mutations to use as input? #33

Closed THT-sleepy closed 1 year ago

THT-sleepy commented 1 year ago

Hello there:

I have some samples with only about a dozen mutations each, and I am wondering whether there is a minimum number below which pyclone-vi can't do a good job. In other words, should I exclude the results from these samples because the number of clusters may be underestimated?

Best, Huatao

aroth85 commented 1 year ago

That would be quite sparse data to resolve clones. It does depend on coverage though. In the past we have done targeted sequencing to 10,000x with dozens to hundreds of SNVs and gotten okay results. I would also consider whether this is single or multi sample data. If it is multi-sample you may be okay.

But for standard single sample WGS data, you probably won't identify more than one clone.
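To illustrate why coverage matters here, the following is a minimal sketch (not part of PyClone-VI) of the binomial standard error of an observed variant allele frequency (VAF) at two read depths. The function name `vaf_standard_error` and the example VAF of 0.25 are assumptions chosen for illustration only.

```python
import math


def vaf_standard_error(vaf: float, depth: int) -> float:
    """Binomial standard error of an observed VAF at a given read depth."""
    if not 0.0 <= vaf <= 1.0:
        raise ValueError("vaf must be between 0 and 1")
    if depth <= 0:
        raise ValueError("depth must be positive")
    return math.sqrt(vaf * (1.0 - vaf) / depth)


# Compare typical WGS depth with deep targeted sequencing (illustrative values).
for depth in (60, 10_000):
    se = vaf_standard_error(0.25, depth)
    print(f"depth={depth:>6}  SE(VAF)={se:.4f}")
```

Under these assumptions the per-SNV noise at 60x is more than ten times larger than at 10,000x, so two clones with similar cellular prevalence are much harder to separate with only a dozen SNVs.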

THT-sleepy commented 1 year ago

Thank you for your reply!

I have single-sample WGS data sequenced to 60X, so I suppose I should exclude samples with too few mutations.

By the way, these samples have so few mutations because we excluded mutations in non-diploid regions for simplicity, and they all have extensive CNVs across all chromosomes.

Thanks again!

aroth85 commented 1 year ago

In that case I would include the SNVs in CNV regions. One of the key features of the PyClone model is correcting for CNV effects.

THT-sleepy commented 1 year ago

Thank you,

But we only have total CN data. If I set major_cn to total_cn and minor_cn to 0, I don't think I can get accurate CCF estimates for SNVs in CNV regions, because pyclone-vi would assume the SNVs happened after the CNVs and would always assign a single mutated copy on the major allele in this case, as you said in a previous issue. Since multiplicity (the number of mutated copies) is affected by CNVs and CCF depends on multiplicity, I think pyclone-vi can't give the right CCF estimates for SNVs in CNV regions.

In addition, we're also testing another tool called Pairtree, whose author also excluded SNVs in CNV regions in his recent paper (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7122013/), and we are trying to keep the SNV preprocessing consistent between the two tools.

To avoid any misunderstanding: we compare the results of the two tools only to get more robust results; comparing mutation-clustering tools isn't our main work.
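The dependence of CCF on multiplicity can be sketched with the standard point-estimate relation VAF = purity * m * CCF / (purity * total_cn + (1 - purity) * normal_cn). This is only an illustration, not PyClone-VI's actual model or code; the function name `ccf_from_vaf` and the example numbers are assumptions.

```python
def ccf_from_vaf(vaf: float, purity: float, total_cn: int,
                 multiplicity: int, normal_cn: int = 2) -> float:
    """Point estimate of CCF for a given multiplicity (number of mutated copies)."""
    if not 1 <= multiplicity <= total_cn:
        raise ValueError("multiplicity must be between 1 and total_cn")
    if not 0.0 < purity <= 1.0:
        raise ValueError("purity must be in (0, 1]")
    # Average number of copies of the locus per cell in the mixed sample.
    mean_copies = purity * total_cn + (1.0 - purity) * normal_cn
    return vaf * mean_copies / (purity * multiplicity)


# Same observed VAF, different assumed multiplicities (illustrative values).
for m in range(1, 5):
    ccf = ccf_from_vaf(vaf=0.3, purity=0.8, total_cn=4, multiplicity=m)
    print(f"multiplicity={m}  CCF estimate={ccf:.3f}")
```

With these example values the same VAF maps to very different CCF estimates depending on the assumed multiplicity, and the estimate for a single mutated copy exceeds 1, which is why fixing the multiplicity at one copy would be a concern.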

Thanks again!

THT-sleepy commented 1 year ago

I'm sorry, I misunderstood how PyClone-VI calculates the multiplicity of a mutation when given total copy number data (major_cn = total_cn, minor_cn = 0). It corrects for CNV effects rather than always assigning one mutated copy to the major allele. I hope the previous comment didn't mislead anyone.
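For anyone in the same situation, here is a minimal sketch of preparing an input table with the total-copy-number encoding discussed in this thread (major_cn = total_cn, minor_cn = 0). The output column names are assumed to match the input format described in the PyClone-VI README; the input dictionary keys, the helper names, and the example values are hypothetical.

```python
import csv
import io

# Column names assumed from the PyClone-VI README input format.
COLUMNS = ["mutation_id", "sample_id", "ref_counts", "alt_counts",
           "major_cn", "minor_cn", "normal_cn"]


def to_pyclone_vi_rows(calls, normal_cn=2):
    """Encode total-CN-only calls as major_cn = total_cn, minor_cn = 0."""
    rows = []
    for call in calls:
        rows.append({
            "mutation_id": call["mutation_id"],
            "sample_id": call["sample_id"],
            "ref_counts": call["ref_counts"],
            "alt_counts": call["alt_counts"],
            "major_cn": call["total_cn"],
            "minor_cn": 0,
            "normal_cn": normal_cn,
        })
    return rows


def write_tsv(rows, handle):
    """Write the rows as a tab-separated table with a header line."""
    writer = csv.DictWriter(handle, fieldnames=COLUMNS, delimiter="\t",
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)


# Hypothetical example call; real values come from your variant and CN callers.
example_calls = [
    {"mutation_id": "chr1:12345", "sample_id": "S1",
     "ref_counts": 42, "alt_counts": 18, "total_cn": 4},
]
buffer = io.StringIO()
write_tsv(to_pyclone_vi_rows(example_calls), buffer)
print(buffer.getvalue())
```

Writing to an in-memory buffer keeps the sketch self-contained; in practice you would open a `.tsv` file instead and check the result against the README before running the tool.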