evotools / hapbin

Efficient program for calculating Extended Haplotype Homozygosity (EHH) and Integrated Haplotype Score (iHS)
GNU General Public License v3.0

Incorrectly documented output of ehhbin #61

Open rwaples opened 4 years ago

rwaples commented 4 years ago

Hello,

I have had some confusion interpreting the output of ehhbin, specifically which allele (0 vs 1) the EHH and iHH values apply to.

The README says: "ehhbin outputs five columns. The first three being the locus' ID and its genetic and physical positions. These are followed by two columns corresponding to the EHH for each of the alleles at this locus (allele coded as 0 then 1)."

No header is generated for the file, but it seems to me that the last two columns are actually switched relative to the documentation, so that the EHH for the haplotypes carrying the '1' allele is listed in the first of the two EHH columns.

I have attached small haplotype and map files as an example. In this example, the haplotypes carrying the '1' allele are homozygous for longer.

After removing the .txt extensions from the attached files, I call ehhbin with:

    ehhbin \
      --hap ./test.hapbin.haplotypes \
      --map ./test.map \
      --binom \
      --locus snp11

In the final three summary lines, ehhbin reports iHH_0 = 3 and iHH_1 = 13, both of which make sense.
However, the EHH values in the first EHH column (relating to allele '0' per the documentation) are larger and add up to 13, while the values in the second EHH column are small and add up to 3.

To me this suggests that the order of these two columns is switched relative to the description.
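For anyone who wants to repeat the comparison, here is a minimal Python sketch of the column-sum check. It assumes the output rows are whitespace-separated with five fields as the README describes, and that any line without five fields (such as the summary lines) can be skipped. The function name `sum_ehh_columns` and the example rows are made up for illustration; they are not the real output from the attached files. Since iHH is an integral of EHH over distance, a plain column sum is only a rough stand-in for it.

```python
def sum_ehh_columns(lines):
    """Sum the 4th and 5th whitespace-separated fields of each five-column row."""
    first_total = 0.0
    second_total = 0.0
    for line in lines:
        fields = line.split()
        if len(fields) != 5:
            continue  # skip summary lines or anything not in the five-column layout
        try:
            first = float(fields[3])
            second = float(fields[4])
        except ValueError:
            continue  # skip rows whose EHH fields are not numeric
        first_total += first
        second_total += second
    return first_total, second_total


# Hypothetical rows for illustration only (not output from the attached files).
example = [
    "snp10 0.10 100 1.0 0.5",
    "snp11 0.11 110 1.0 1.0",
    "snp12 0.12 120 0.75 0.25",
    "iHH_0: 3",
]
first_total, second_total = sum_ehh_columns(example)
print("first EHH column sum:", first_total)
print("second EHH column sum:", second_total)
```

Comparing the two totals against the reported iHH_0 and iHH_1 shows which column tracks which allele.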

I have the most recent version of hapbin installed from GitHub, running on Ubuntu 18.04.

test.hapbin.haplotypes.txt

test.map.txt

Best, Ryan