abyzovlab / CNVpytor

a python extension of CNVnator -- a tool for CNV analysis from depth-of-coverage by mapped reads
MIT License

obtaining integer CN estimates from non-diploid chromosomes #208

Open 7ehuang opened 7 months ago

7ehuang commented 7 months ago

Thank you for the active maintenance of this software! I read in issues #81 and #88 that I can obtain copy number estimates by multiplying the "CNV level" column by 2 and rounding to the nearest integer. Would this still hold when analyzing data from aneuploid cells (e.g., K562, which is predominantly triploid)? In such cases, the majority of the genome is not necessarily diploid.
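To make the arithmetic concrete, here is a minimal sketch of the "multiply by 2 and round" rule. The function name `integer_copy_number` and the `baseline_ploidy` parameter are hypothetical and not part of CNVpytor. The second call only illustrates the concern: if a normalized level of 1.0 corresponded to a triploid baseline (an assumption, not confirmed behavior), the same levels would map to different integers.

```python
def integer_copy_number(cnv_level: float, baseline_ploidy: int = 2) -> int:
    """Scale a normalized CNV level by an assumed baseline ploidy and round.

    Hypothetical helper for illustration; not a CNVpytor function.
    """
    if cnv_level < 0:
        raise ValueError("cnv_level must be non-negative")
    # int(x + 0.5) rounds halves up; the built-in round() rounds halves to even.
    return int(cnv_level * baseline_ploidy + 0.5)


# Example levels as they might appear in the "CNV level" column (made-up values).
levels = [0.0, 0.52, 1.0, 1.48, 2.03]

# Rule from issues #81 and #88: level 1.0 is treated as two copies.
diploid = [integer_copy_number(x) for x in levels]

# Assumption for comparison only: level 1.0 treated as three copies.
triploid = [integer_copy_number(x, baseline_ploidy=3) for x in levels]

print(diploid)
print(triploid)
```

The two lists differ for most entries, which is why the choice of baseline matters for a predominantly triploid line.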

I ask because I am comparing my CNVpytor calls with the results from this paper, and the CN calls are highly discordant.

arpanda commented 7 months ago

We have an example for K562 in the CNVpytor manuscript (supplementary figure S1). In that figure, the normalization is applied based on the bimodal hypothesis (a sum of two Gaussian distributions with the ratio of mean values equal to 2, corresponding to copy numbers 2 and 4).
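The bimodal hypothesis described above can be written down as a small model. This is a sketch of the idea only, not the code behind `fit_bimodal`; the function and parameter names are hypothetical. It assumes the lower peak sits at read depth `mean` (copy number 2) and the upper peak at `2 * mean` (copy number 4).

```python
import math


def gaussian(x: float, mean: float, sigma: float) -> float:
    """Normal probability density at x."""
    z = (x - mean) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def bimodal_density(x: float, mean: float, sigma1: float,
                    sigma2: float, weight: float) -> float:
    """Mixture of two Gaussians with means constrained to mean and 2 * mean.

    weight is the fraction of bins assigned to the lower (copy number 2) peak.
    """
    return (weight * gaussian(x, mean, sigma1)
            + (1.0 - weight) * gaussian(x, 2.0 * mean, sigma2))


def copy_number_from_rd(rd: float, mean: float) -> float:
    """Convert read depth to copy number, assuming the lower peak is CN 2."""
    return 2.0 * rd / mean


# Made-up parameters for illustration: lower peak at RD 50, upper peak at RD 100.
params = dict(mean=50.0, sigma1=5.0, sigma2=7.0, weight=0.6)
for rd in (50.0, 75.0, 100.0):
    print(rd, bimodal_density(rd, **params), copy_number_from_rd(rd, params["mean"]))
```

Under this model, a bin with read depth equal to the upper peak is assigned copy number 4, and read depths between the peaks fall near copy number 3.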

This behavior is currently hard-coded. We will let you know once we add a user option to enable it from the command line.

If you want to unlock the option by changing the code, you can remove "False and" from the following line in root.py, under the implementation of the caller function you are using:

bim = fit_bimodal(bins[:-1], hist)
if False and bim is not None:
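The effect of the hard-coded `False` can be illustrated with a small stand-in. The names `choose_normalization` and `enable_bimodal` are hypothetical and do not exist in CNVpytor. Because `and` short-circuits, a leading `False` means the bimodal branch never runs, whatever the fit returns.

```python
def choose_normalization(bim, enable_bimodal: bool = False) -> str:
    """Report which branch would run for a given fit result.

    bim stands in for the value returned by a bimodal fit (None on failure).
    enable_bimodal=False mimics the hard-coded `False and` in the condition.
    """
    if enable_bimodal and bim is not None:
        return "bimodal"
    return "default"


fit_result = (50.0, 5.0, 7.0)  # placeholder for a successful fit

# Mimics the current condition: the fit result is ignored.
print(choose_normalization(fit_result))

# Mimics the condition after removing `False and`.
print(choose_normalization(fit_result, enable_bimodal=True))

# A failed fit falls back to the default branch either way.
print(choose_normalization(None, enable_bimodal=True))
```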

Thank you, Arijit

7ehuang commented 7 months ago

Thank you for your reply! Does this mean that by default (i.e. without making the changes to the code that you suggested), the normalization is not based on the bimodal hypothesis?

Just to clarify, are the results from the manuscript based on the default mean-shift caller or the joint caller?

arpanda commented 7 months ago

Yes, the bimodal hypothesis is not used by default, as you can see from the False that is hard-coded in the condition.

The mean-shift caller has the limitation that it is based on read depth only. For supplementary figure S1, the normalization is based on the joint caller.

-Arijit