JiangYuLab / CNVcaller

GNU General Public License v3.0

About genotype&filtering #12

Open MaoSihong opened 3 years ago

MaoSihong commented 3 years ago

How should I interpret genotypes like 0/1, 1/1, 0/2, 2/3…, and what do A, B and C mean in the tsv result? How can I filter the sites by SILHOUETTESCORE, CALINSKIHARABAZESCORE and LOGLIKELIHOOD? Are there any recommended thresholds? And how do I define the heterozygosity of a CNV site for a single individual?

YiBenqiu commented 3 years ago

In my understanding, 0/0 means no copy number variant. If SVTYPE = DEL, a genotype of 0/1 means a heterozygous deletion and 1/1 means a homozygous deletion. If SVTYPE = DUP, a genotype of 0/1 means a heterozygous duplication and 1/1 means a homozygous duplication. But when CN>4, these are multi-allelic CNVs, so the genotype may look like 1/2, 0/2...
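To make that reading concrete, here is a minimal Python sketch that labels a genotype string according to the interpretation in this comment. It is only an illustration of the commenter's understanding, not CNVcaller's documented semantics; the function name `describe_genotype` and the returned labels are hypothetical.

```python
def describe_genotype(svtype: str, gt: str) -> str:
    """Label a VCF-style genotype following the reading given in this thread.

    Assumption: allele 0 is the reference, allele 1 is the single DEL/DUP
    allele, and any allele index above 1 indicates a multi-allelic CNV.
    """
    if "." in gt:
        return "missing"
    # Accept both unphased ("/") and phased ("|") separators.
    alleles = [int(a) for a in gt.replace("|", "/").split("/")]
    kind = {"DEL": "deletion", "DUP": "duplication"}.get(svtype, svtype)
    if all(a == 0 for a in alleles):
        return "reference (no CNV)"
    if max(alleles) > 1:
        return "multi-allelic CNV"
    if len(set(alleles)) > 1:
        return f"heterozygous {kind}"
    return f"homozygous {kind}"


for svtype, gt in [("DEL", "0/1"), ("DUP", "1/1"), ("DUP", "0/2")]:
    print(svtype, gt, "->", describe_genotype(svtype, gt))
```

Under this reading, the per-individual heterozygosity asked about above would simply be whether the two allele indices differ.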

In addition, I'm also confused about how to filter the sites by SILHOUETTESCORE, CALINSKIHARABAZESCORE and LOGLIKELIHOOD. In my results, some CALINSKIHARABAZESCORE or SILHOUETTESCORE values are nan. Does the same thing happen in your results?

MaoSihong commented 3 years ago

Yes, nan values also show up in my results. As I see it, SILHOUETTESCORE and CALINSKIHARABAZESCORE describe the quality of the genotype classification. Besides filtering out the nan sites, keeping the top x percent of sites ranked by these two values may be one approach, but it may discard some information and doesn't seem quite reasonable. LOGLIKELIHOOD looks like a transformed probability of the genotype at the site, I think.
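The filtering idea described here (drop nan sites, then keep the top fraction by each score) could be sketched roughly as below. This is only an illustration under assumptions: each site is taken to be a dict already parsed from the result file, `filter_sites` is a hypothetical helper, and `keep_fraction=0.5` is an arbitrary placeholder, not a recommended threshold.

```python
import math


def filter_sites(sites, keys=("SILHOUETTESCORE", "CALINSKIHARABAZESCORE"),
                 keep_fraction=0.5):
    """Drop sites with nan scores, then keep sites in the top fraction for every key.

    Assumption: `sites` is a list of dicts holding float scores under `keys`.
    """
    # Step 1: remove sites where any of the scores is nan.
    valid = [s for s in sites if not any(math.isnan(s[k]) for k in keys)]
    if not valid:
        return []
    kept = valid
    for k in keys:
        # Step 2: find the cutoff value at the requested fraction for this score.
        ranked = sorted((s[k] for s in valid), reverse=True)
        n_keep = max(1, math.ceil(len(ranked) * keep_fraction))
        cutoff = ranked[n_keep - 1]
        # Step 3: a site must pass the cutoff for every score to be kept.
        kept = [s for s in kept if s[k] >= cutoff]
    return kept


example = [
    {"id": "a", "SILHOUETTESCORE": 0.9, "CALINSKIHARABAZESCORE": 500.0},
    {"id": "b", "SILHOUETTESCORE": 0.2, "CALINSKIHARABAZESCORE": 50.0},
    {"id": "c", "SILHOUETTESCORE": float("nan"), "CALINSKIHARABAZESCORE": 300.0},
    {"id": "d", "SILHOUETTESCORE": 0.8, "CALINSKIHARABAZESCORE": 400.0},
    {"id": "e", "SILHOUETTESCORE": 0.5, "CALINSKIHARABAZESCORE": 100.0},
]
print([s["id"] for s in filter_sites(example)])
```

As noted above, a rank-based cut like this throws away sites regardless of their absolute quality, so the fraction would need to be chosen and checked per dataset.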