StoreyLab / popkin

R package to estimate kinship and FST from SNP data

Predominantly negative Fst values #4

Closed: pidita closed this issue 1 year ago

pidita commented 2 years ago

I am analysing my SNP dataset of a plant species following the tutorial. Nearly half of the pairwise Fst estimates are negative. I have read another post on this subject, but it involved only a few negative Fst values. I understand that individuals from the same population are probably close relatives, so negative values can be expected, but I also find negative values between individuals from different populations. I know this is not necessarily a problem, but I am wondering how I should interpret my results.


Best wishes.

alexviiia commented 2 years ago

This is very unusual indeed! The formula allows negative values, but they should be rare and small both under HWE and between differentiated populations. As I mentioned in the other issue, negative values are also expected between individuals that are very closely related (siblings, first cousins, etc.), but I agree that explanation doesn't fit here.
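To see why close relatives push the estimate below zero, here is a minimal Python sketch (not popkin's R code). It assumes a pairwise estimator of the form `((f_j + f_k)/2 - phi_jk) / (1 - phi_jk)`, where `f_j` and `f_k` are inbreeding coefficients and `phi_jk` is the kinship coefficient, all measured against the same ancestral population; popkin's `pwfst` documentation gives the exact definition it actually uses. Under this form the value is negative whenever the pair's kinship exceeds their mean inbreeding. The function name and the background inbreeding value are hypothetical.

```python
def pairwise_fst(f_j: float, f_k: float, phi_jk: float) -> float:
    """Pairwise FST between individuals j and k.

    ASSUMED form (a sketch, not copied from popkin):
        ((f_j + f_k) / 2 - phi_jk) / (1 - phi_jk)
    f_j, f_k: inbreeding coefficients; phi_jk: kinship coefficient,
    all relative to the same ancestral population.
    """
    if phi_jk >= 1:
        raise ValueError("phi_jk must be < 1")
    return ((f_j + f_k) / 2 - phi_jk) / (1 - phi_jk)


f = 0.1  # hypothetical background inbreeding of one population

# Unrelated members of the same population: kinship equals background inbreeding.
print(pairwise_fst(f, f, f))  # 0.0
# Full siblings from that population (unrelated parents): kinship = 1/4 + (3/4) * f.
print(pairwise_fst(f, f, 0.25 + 0.75 * f))  # about -0.333
# Individuals from two differentiated populations share little kinship.
print(pairwise_fst(0.1, 0.2, 0.02))  # positive
```

Under these assumptions full siblings land at -1/3 regardless of `f`, so strongly negative values between relatives are unsurprising, whereas unrelated individuals from different populations should stay near or above zero.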

My only idea is that it could be a locus ascertainment issue. By that I mean: I don't see this in simulations where no loci are removed, nor in real data that is only very lightly filtered, but I've seen weird things happen when, for example, stringent MAF filters are applied, or when only loci that are variable in a given reference population are used. This is tricky because sometimes a genotyping array has its own ascertainment bias that you're simply stuck with (for example, its SNPs come from another database enriched for variants from a population other than your study population). Sorry this is a bit hand-wavy; it's the best I can do without actually looking at your data.
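One simple way to probe the ascertainment explanation is to check how much of the dataset a MAF filter removes, then rerun the pairwise Fst estimates on the unfiltered (or lightly filtered) loci and compare. The Python sketch below only does the bookkeeping; the Fst estimates themselves would still come from popkin. Helper names and toy data are hypothetical, genotypes are assumed to be coded as 0/1/2 allele counts with `None` for missing calls, and keeping loci with MAF >= threshold is an assumption (tools differ on whether the boundary is inclusive).

```python
from typing import List, Optional, Sequence, Tuple

Genotype = Optional[int]  # 0, 1 or 2 copies of the counted allele; None = missing


def minor_allele_frequency(locus: Sequence[Genotype]) -> Optional[float]:
    """MAF of one biallelic locus, ignoring missing calls (None if all missing)."""
    observed = [g for g in locus if g is not None]
    if not observed:
        return None
    p = sum(observed) / (2 * len(observed))  # frequency of the counted allele
    return min(p, 1 - p)


def split_by_maf(loci: Sequence[Sequence[Genotype]],
                 threshold: float) -> Tuple[List[int], List[int]]:
    """Indices of loci kept (MAF >= threshold) and dropped by a MAF filter."""
    kept, dropped = [], []
    for i, locus in enumerate(loci):
        maf = minor_allele_frequency(locus)
        if maf is not None and maf >= threshold:
            kept.append(i)
        else:
            dropped.append(i)  # rare, monomorphic, or entirely missing
    return kept, dropped


# Toy data: 4 loci x 4 individuals (hypothetical).
loci = [[0, 1, 2, 1], [0, 0, 0, 1], [2, 2, 2, 2], [None, 1, 1, 0]]
kept, dropped = split_by_maf(loci, threshold=0.2)
print(kept, dropped)  # [0, 3] [1, 2]
```

If the fraction of negative estimates drops noticeably when the analysis is rerun without the stringent filter, that would support the ascertainment explanation.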