adimitromanolakis / sim1000G

Simulation of rare and common variants based on 1000 genomes data

Phasing genotype by readVCF #12

Open zhangbs92 opened 3 years ago

zhangbs92 commented 3 years ago

Hi all,

I checked the source code of the function readVCF and found that it generates gt1 and gt2 by simply taking the first allele of each genotype as the first haplotype and the second allele as the second haplotype.
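To make the behaviour I am describing concrete, here is a minimal Python sketch of that kind of split. It is not the sim1000G code (the package is written in R); the function name `split_genotypes` and the toy genotypes are made up for illustration.

```python
def split_genotypes(genotypes):
    """Split VCF GT strings such as '0/1' or '1|0' into two allele lists.

    Hypothetical helper for illustration only; not part of sim1000G.
    """
    hap1, hap2 = [], []
    for gt in genotypes:
        # Accept both the unphased '/' and the phased '|' separator.
        first, second = gt.replace("|", "/").split("/")
        hap1.append(int(first))   # first allele -> "haplotype 1"
        hap2.append(int(second))  # second allele -> "haplotype 2"
    return hap1, hap2


# Toy genotypes for one individual at four sites (made-up data).
gts = ["0/1", "1/1", "0/0", "0/1"]
hap1, hap2 = split_genotypes(gts)
print(hap1)
print(hap2)
```

With unphased input, every heterozygous site ends up with the 0 on the first haplotype and the 1 on the second, which is the ordering I am worried about.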


The problem is that a heterozygous genotype, whether it is really 0/1 or 1/0, is always written as 0/1 in the VCF file. So I assume this is not correct phasing, and there should be some problem when using this haplotype information to estimate LD, shouldn't there? What if I add some randomness to the phasing process, say, by randomly selecting some positions at which the second allele is used as the first haplotype?
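A rough Python sketch of the randomisation I have in mind is below. The function name `randomize_phase` and the parameters `swap_prob` and `seed` are hypothetical, and this is only an illustration, not something sim1000G provides.

```python
import random


def randomize_phase(hap1, hap2, swap_prob=0.5, seed=None):
    """Randomly swap the two alleles at heterozygous sites.

    Hypothetical helper: hap1 and hap2 are equal-length lists of 0/1 alleles.
    Homozygous sites are left untouched because swapping changes nothing.
    """
    rng = random.Random(seed)
    out1, out2 = [], []
    for a, b in zip(hap1, hap2):
        # Only heterozygous sites carry an (arbitrary) ordering to shuffle.
        if a != b and rng.random() < swap_prob:
            a, b = b, a
        out1.append(a)
        out2.append(b)
    return out1, out2


# Alleles as they would come out of an unphased VCF (made-up data).
hap1 = [0, 1, 0, 0, 0]
hap2 = [1, 1, 0, 1, 1]
new1, new2 = randomize_phase(hap1, hap2, swap_prob=0.5, seed=1)

# The per-site genotype (sum of the two alleles) is unchanged by the swap.
assert [a + b for a, b in zip(new1, new2)] == [a + b for a, b in zip(hap1, hap2)]
```

Note that this only removes the fixed 0-first ordering at heterozygous sites; it would not recover the true phase.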

Best, Bingsong