A comprehensive update to the PLINK association analysis toolset. Beta testing of the first new version (1.90), focused on speed and memory efficiency improvements, is finishing up. Development is now focused on building out support for multiallelic, phased, and dosage data in PLINK 2.0.
When I'm calculating genetic distance between samples using plink, I found the result is weird to me.
The VCF file I used is :
##fileformat=VCFv4.0
##fileDate=20090805
##source=myImputationProgramV3.1
##reference=1000GenomesPilot-NCBI36
##phasing=partial
##INFO=<ID=NS,Number=1,Type=Integer,Description="Number of Samples With Data">
##INFO=<ID=DP,Number=1,Type=Integer,Description="Total Depth">
##INFO=<ID=AF,Number=.,Type=Float,Description="Allele Frequency">
##INFO=<ID=AA,Number=1,Type=String,Description="Ancestral Allele">
##INFO=<ID=DB,Number=0,Type=Flag,Description="dbSNP membership, build 129">
##INFO=<ID=H2,Number=0,Type=Flag,Description="HapMap2 membership">
##FILTER=<ID=q10,Description="Quality below 10">
##FILTER=<ID=s50,Description="Less than 50% of samples have data">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Read Depth">
##FORMAT=<ID=HQ,Number=2,Type=Integer,Description="Haplotype Quality">
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT NA00001 NA00002 NA00003 NA0004
20 1 . A G 50 PASS NS=3;DP=9;AA=A GT:GQ:DP 0/0:35:4 0/0:17:2 0/0:40:3 0/0:40:3
20 2 . A G 50 PASS NS=3;DP=9;AA=A GT:GQ:DP 0/0:35:4 0/0:17:2 0/0:40:3 0/0:40:3
20 3 . A G,C 50 PASS NS=3;DP=9;AA=A GT:GQ:DP 0/0:35:4 0/0:17:2 1/1:40:3 2/2:40:3
20 4 . A G 50 PASS NS=3;DP=9;AA=A GT:GQ:DP 0/0:35:4 0/0:17:2 0/0:40:3 0/0:40:3
20 5 . A G 50 PASS NS=3;DP=9;AA=A GT:GQ:DP 0/0:35:4 0/0:17:2 0/0:40:3 0/0:40:3
20 6 . A G 50 PASS NS=3;DP=9;AA=A GT:GQ:DP 0/0:35:4 0/0:17:2 0/0:40:3 0/0:40:3
20 7 . A G 50 PASS NS=3;DP=9;AA=A GT:GQ:DP 1/1:35:4 0/0:17:2 0/0:40:3 0/0:40:3
The result was:
0 2 4 2.33333
2 0 2 0
4 2 0 0
2.33333 0 0 0
What makes me confused is that why the distance between NA0004 and NA00001 is 2.3333 rather than 4, and the distance between NA0004 and NA00003 is 0 instead of 2?
Hi,
When I'm calculating genetic distance between samples using plink, I found the result is weird to me.
The VCF file I used is :
The result was:
What makes me confused is that why the distance between NA0004 and NA00001 is 2.3333 rather than 4, and the distance between NA0004 and NA00003 is 0 instead of 2?
The command I used is:
And my plink version is
v1.90b6.10 64-bit (17 Jun 2019)
Could you please explain more details about the algorithm in calculating distance to me? Thanks so much!