rachitasrivastava closed this issue 1 month ago.
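1. VCF2Dis is software to calculate a p-distance matrix, and p-distance is different from genetic distance, I think.
2. If the genotype of either of the two samples is missing at a site, that site does not participate in the calculation.
3. L is the number of pairwise comparisons, i.e. the number of sites at which both samples have a genotype. For example, with 10 sites:

sample1: A A A A - - A A A A
sample2: A A A M A - A C A A
DiffA: 0 0 0 0.5 - - 0 1 0 0 (sum 1.5, so Diff(1_2) = 1.5)
VarL: 1 1 1 1 0 0 1 1 1 1 (sum 8, so L(1_2) = 8)

Site 4 compares A with M (the IUPAC code for an A/C heterozygote), which counts as a 0.5 difference; sites 5 and 6 are skipped because sample1 has no genotype there.

Finally, p_dis(1_2) = Diff(1_2) / L(1_2) = 1.5 / 8 = 0.1875.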
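The counting rule in the reply above can be sketched in Python. This is a minimal illustration under stated assumptions, not VCF2Dis's own code: each sample is a string with one character per site, homozygotes are written as a single base, heterozygotes as IUPAC two-base codes (M, R, W, S, Y, K), and `-` marks missing data. The reply's example only shows a homozygote vs heterozygote site (A vs M, scored 0.5); scoring the other combinations by the number of shared alleles is an assumption here, and the names `p_distance`, `site_diff` and `distance_matrix` are hypothetical.

```python
from collections import Counter
from itertools import combinations

MISSING = "-"  # assumed missing-genotype marker

# IUPAC codes for heterozygous diploid genotypes (assumption: homozygotes
# are written as a single base such as "A").
IUPAC_HET = {
    "M": ("A", "C"), "R": ("A", "G"), "W": ("A", "T"),
    "S": ("C", "G"), "Y": ("C", "T"), "K": ("G", "T"),
}


def genotype(code):
    """Expand one genotype character into a pair of alleles."""
    return IUPAC_HET.get(code, (code, code))


def site_diff(a, b):
    """0 if identical, 0.5 if one allele is shared, 1 if none are shared."""
    shared = sum((Counter(genotype(a)) & Counter(genotype(b))).values())
    return (2 - shared) / 2


def p_distance(seq1, seq2):
    """Return (Diff, L, p) for two equal-length genotype strings."""
    if len(seq1) != len(seq2):
        raise ValueError("samples must cover the same sites")
    diff = 0.0
    L = 0
    for a, b in zip(seq1, seq2):
        if a == MISSING or b == MISSING:
            continue  # skipped site: adds nothing to Diff or to L
        diff += site_diff(a, b)
        L += 1
    if L == 0:
        raise ValueError("no sites where both samples are genotyped")
    return diff, L, diff / L


def distance_matrix(samples):
    """Symmetric p-distance matrix for a dict of name -> genotype string."""
    names = list(samples)
    mat = {n: {m: 0.0 for m in names} for n in names}
    for x, y in combinations(names, 2):
        _, _, p = p_distance(samples[x], samples[y])
        mat[x][y] = mat[y][x] = p
    return mat


if __name__ == "__main__":
    s1 = "AAAA--AAAA"
    s2 = "AAAMA-ACAA"
    print(p_distance(s1, s2))  # (1.5, 8, 0.1875)
```

Because sites with missing data drop out of both the numerator and L, the denominator can differ from pair to pair. Dividing by the total number of sites instead (10 in the example) would give 1.5 / 10 = 0.15 rather than 0.1875.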
I ran the following command:
/VCF2Dis-master/bin/VCF2Dis -InPut input.vcf.gz -OutPut output.mat
Since I generated this matrix for a dataset for which pairwise genetic distance values were already available, I compared the VCF2Dis results to the available values. The results seem to differ by a factor of 10: when I divide the VCF2Dis values by 10, they match the existing dataset. Can you please explain:
1. What happens to sites where one of the two samples in a pair has missing data?
2. What is L in your formula? Is it the complete genome, or just the sites that are considered when calculating the genetic distance?