standage opened this issue 4 years ago
Refreshing my memory on Wikipedia, I found the following passage relevant:
The interpretation of FST can be difficult when the data analyzed are highly polymorphic. In this case, the probability of identity by descent is very low and FST can have an arbitrarily low upper bound, which might lead to misinterpretation of the data.
Microhaplotypes are certainly more polymorphic than SNPs. Most microhap markers are defined by 3-6 SNPs, but the markers with the highest Ae and In values are defined by dozens of SNPs. These most polymorphic markers (in green above) all have FST values near 0.
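The Wikipedia point about an "arbitrarily low upper bound" can be made concrete with a small worked example. The sketch below uses Nei's G_ST = (H_T − H_S) / H_T rather than the Weir/Cockerham estimator. This is a deliberate simplification, so the numbers are illustrative only and are not what any particular library reports. It builds two hypothetical populations that share no alleles, each carrying n equally common alleles. Even though the populations are completely differentiated, G_ST works out to 1/(2n − 1), so the ceiling drops as within-population diversity rises.

```python
from fractions import Fraction


def heterozygosity(freqs):
    """Expected heterozygosity 1 - sum(p^2) for one allele-frequency vector."""
    return 1 - sum(p * p for p in freqs)


def nei_gst(pops):
    """Nei's G_ST = (H_T - H_S) / H_T with populations weighted equally.

    `pops` is a list of allele-frequency vectors over the same allele order.
    """
    n_pops = len(pops)
    h_s = sum(heterozygosity(p) for p in pops) / n_pops  # mean within-population H
    pooled = [sum(col) / n_pops for col in zip(*pops)]    # pooled allele frequencies
    h_t = heterozygosity(pooled)                          # total H
    return (h_t - h_s) / h_t


def fully_differentiated(n_alleles):
    """Hypothetical example: two populations, n equally common alleles each, none shared."""
    f = Fraction(1, n_alleles)
    zeros = [Fraction(0)] * n_alleles
    return [[f] * n_alleles + zeros, zeros + [f] * n_alleles]


for n in (1, 2, 5, 20):
    print(n, nei_gst(fully_differentiated(n)))  # 1, 1/3, 1/9, 1/39
```

Exact fractions are used so that the 1/(2n − 1) pattern is visible without rounding noise.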
cc @rnmitchell
Hi @standage, sorry for the slow response. In my limited understanding, the various Fst estimators (W&C, Hudson) can produce negative values, but negative values have no meaning for Fst, so they are usually clipped to zero.
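Negative estimates arise because these estimators subtract a correction for sampling error from the observed between-population difference. When two samples happen to have nearly identical frequencies, the correction can exceed the difference. A minimal sketch of this, using the per-SNP Hudson estimator in the form given by Bhatia et al. (2013) for a biallelic site rather than the Weir/Cockerham estimator, and with hypothetical helper names, is:

```python
def hudson_fst(p1, p2, n1, n2):
    """Per-SNP Hudson F_ST (Bhatia et al. 2013 form) for a biallelic site.

    p1, p2: sample allele frequencies; n1, n2: number of sampled allele copies.
    """
    if n1 < 2 or n2 < 2:
        raise ValueError("need at least two sampled allele copies per population")
    # Squared frequency difference minus the expected sampling-error contribution.
    num = (p1 - p2) ** 2 - p1 * (1 - p1) / (n1 - 1) - p2 * (1 - p2) / (n2 - 1)
    # Probability that one allele from each population differs.
    den = p1 * (1 - p2) + p2 * (1 - p1)
    if den == 0:
        raise ValueError("site is monomorphic for the same allele in both samples")
    return num / den


def clip_to_zero(fst):
    """Report negative estimates as 0, the usual convention."""
    return max(0.0, fst)


# Identical sample frequencies: the bias correction pushes the estimate below 0.
raw = hudson_fst(0.5, 0.5, 100, 100)
print(raw, clip_to_zero(raw))  # roughly -0.0101 and 0.0
```

One design note: clipping each locus to zero before averaging across loci can only raise values, so a multi-locus average of clipped values is biased upward; clipping is usually applied to the final reported value.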
My intuition for Fst is that it measures variance in allele frequencies between two populations. So in theory it shouldn't matter how many alleles are present at a locus. However, I haven't investigated how the different estimators behave in practice.
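The variance intuition can be written down directly: sum the between-population variance of each allele's frequency and divide by the sum of pbar(1 − pbar) over alleles. The sketch below assumes equally weighted populations and population (not sample) variances. Under those assumptions the numerator equals H_T − H_S and the denominator equals H_T, so this is algebraically Nei's G_ST. The number of alleles does not enter the variance term explicitly, but it does affect the normalizing denominator, which is where the polymorphism dependence creeps in.

```python
def variance_fst(pops):
    """F_ST as normalized between-population variance of allele frequencies.

    pops: list of allele-frequency vectors (each summing to 1), same allele order.
    Returns sum_a Var_i(p_ia) / sum_a pbar_a * (1 - pbar_a).
    """
    n_pops = len(pops)
    num = 0.0
    den = 0.0
    for col in zip(*pops):  # one allele at a time across populations
        mean = sum(col) / n_pops
        num += sum((p - mean) ** 2 for p in col) / n_pops
        den += mean * (1 - mean)
    return num / den


# Biallelic example: frequencies 0.9 vs 0.1 give 0.32 / 0.5, i.e. about 0.64.
print(variance_fst([[0.9, 0.1], [0.1, 0.9]]))
```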
Hi, I am confused by negative In values for microhaplotype (MH) loci. Have you ever encountered negative values when calculating In for MH loci?
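For reference, Rosenberg's informativeness for assignment at one locus is I_n = Σ_j [ −p̄_j ln p̄_j + (1/K) Σ_i p_ij ln p_ij ], where p_ij is the frequency of allele j in population i, K is the number of populations, and p̄_j is the mean frequency. Because x ln x is convex, each allele's term is non-negative by Jensen's inequality, so this formula cannot give a negative result beyond floating-point round-off. A clearly negative value therefore suggests that the tool in use differs from this formula or that its inputs differ from what is expected. The sketch below assumes the natural log and equal population weights; other tools may use a different base or weighting.

```python
import math


def informativeness_for_assignment(pops):
    """Rosenberg et al. (2003) I_n for one locus (natural log, equal weights).

    pops: list of K allele-frequency vectors over the same allele order.
    """
    k = len(pops)
    total = 0.0
    for col in zip(*pops):  # one allele at a time
        mean = sum(col) / k
        if mean > 0:
            total -= mean * math.log(mean)
        # Zero frequencies contribute 0, following the convention 0 * ln 0 = 0.
        total += sum(p * math.log(p) for p in col if p > 0) / k
    return total


print(informativeness_for_assignment([[1.0, 0.0], [0.0, 1.0]]))  # ln 2, the maximum for K = 2
print(informativeness_for_assignment([[0.5, 0.5], [0.5, 0.5]]))  # 0.0, populations identical
```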
Hi, thanks for making this library available!
I used the Weir/Cockerham formulation to compute FST values for a set of 412 markers (microhaplotypes) across 26 human population samples (from the 1000 Genomes Project). I'm now comparing these values to other measures of allelic variation I've computed previously: effective number of alleles (Ae) and Rosenberg's informativeness for assignment (In).
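For comparison with FST, the effective number of alleles is usually defined as Ae = 1 / Σ p_i², which is equivalent to Ae = 1 / (1 − H) for expected heterozygosity H. High-Ae markers therefore have high within-population heterozygosity, which is exactly the situation in which the FST ceiling is low. A minimal sketch, assuming Ae is computed from a single frequency vector (how it was aggregated across the 26 populations here is not stated), is:

```python
def effective_number_of_alleles(freqs):
    """Ae = 1 / sum(p^2) for one allele-frequency vector."""
    return 1.0 / sum(p * p for p in freqs)


print(effective_number_of_alleles([0.25] * 4))   # 4.0: four equally common alleles
print(effective_number_of_alleles([0.9, 0.1]))  # about 1.22: one allele dominates
```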
I have a pretty good intuitive understanding of Ae and In, but less so for FST. In your documentation you note that it is possible for FST values to be negative. How should I interpret this? There are a handful of outliers with extremely low FST values: these are all on the X chromosome. Is including these in the calculation problematic?
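One thing that may be worth ruling out for the X-chromosome markers (an assumption, not a diagnosis) is how male genotypes were encoded. Outside the pseudoautosomal regions, males carry one X, so a male called as a diploid homozygote contributes one allele copy, not two. If males are counted as diploid, the sample sizes and heterozygosity terms that FST estimators use would be misstated. The hypothetical helper below counts allele copies with that ploidy rule; the input dictionaries are illustrative and are not any library's format.

```python
from collections import Counter


def x_allele_counts(genotypes, sexes):
    """Count allele copies at a non-PAR X-linked locus, treating males as hemizygous.

    genotypes: dict sample -> (allele_a, allele_b) as emitted by a diploid caller.
    sexes: dict sample -> "male" or "female".
    """
    counts = Counter()
    for sample, (a, b) in genotypes.items():
        if sexes[sample] == "male":
            if a != b:
                # Outside the PAR, a heterozygous male call points to a calling error.
                raise ValueError(f"{sample}: heterozygous X call in a male")
            counts[a] += 1  # one X, one allele copy
        else:
            counts[a] += 1
            counts[b] += 1
    return counts


genotypes = {"s1": ("A", "A"), "s2": ("A", "B"), "s3": ("B", "B")}
sexes = {"s1": "male", "s2": "female", "s3": "male"}
print(x_allele_counts(genotypes, sexes))  # 4 allele copies in total, not 6
```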
Some more background, if interested.