bioXiaoheng / BalLeRMix

Software package for BalLeRMix and scripts used in the study "Flexible mixture model approaches that accommodate footprint size variability for robust detection of balancing selection" (Cheng & DeGiorgio 2020)

LR score in two different populations #10

Open scwwxc opened 1 year ago

scwwxc commented 1 year ago

Hi Xiaoheng, I've run BalLeRMix on two divergent populations of the same species (PCA shows they are clearly distinct), and used each population's own minor allele frequency file to construct its help file separately. The help files for the two populations look substantially different. I then ran B2maf on each population, and the LR scores also look substantially different between them. For pop1, the positions with the highest LR scores have similar frequencies (0.5, 0.5) for the two alleles. In pop2, however, the positions with the highest LR scores have different frequencies (0.65, 0.35) for the two alleles. Is this a reasonable result for pop2? Should I instead combine the MAF files from the two populations to construct a single help file, and then use it to run B2maf on both populations? Thanks!

bioXiaoheng commented 1 year ago

Hi!

It is normal for LR distributions to differ between divergent populations. The values cannot be compared across populations, though, because each is computed under a different null; this is especially so given that their MAF spectra are significantly different. Also, because the two populations are divergent, it wouldn't be unreasonable for their expected equilibrium frequencies to differ. Is it important for them to have the same equilibrium frequency? If so, it might be interesting to examine the MAF spectra of the genomic region centered on your highest-scoring position; if they appear multi-modal, you could try running the analyses again assuming >2 balanced alleles (e.g. with the -m 3 flag).
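As a rough illustration of that check, here is a minimal sketch that bins the MAF spectrum in a window around a chosen position and counts peaks in the histogram. It is not part of BalLeRMix. The function names `maf_spectrum` and `count_modes`, the `(position, allele_count, sample_size)` tuple layout, the window size, and the number of bins are all assumptions made for illustration. The peak count is only a crude heuristic and is no substitute for looking at the spectrum by eye.

```python
def maf_spectrum(sites, center, half_window, n_bins=10):
    """Bin minor allele frequencies of sites within half_window of center.

    sites: iterable of (position, allele_count, sample_size) tuples
           (hypothetical layout; adapt it to your own input files).
    Returns n_bins counts covering MAF in (0, 0.5]; the last bin includes 0.5.
    """
    counts = [0] * n_bins
    for pos, x, n in sites:
        if n <= 0 or abs(pos - center) > half_window:
            continue
        minor = min(x, n - x)  # fold the count so it refers to the minor allele
        if minor <= 0:
            continue  # skip monomorphic sites
        # Integer arithmetic avoids floating-point trouble at bin edges.
        idx = min(minor * 2 * n_bins // n, n_bins - 1)
        counts[idx] += 1
    return counts


def count_modes(counts):
    """Count strict local maxima in a histogram (crude multi-modality check)."""
    # Collapse runs of equal neighbours so a flat-topped peak counts once.
    collapsed = [c for i, c in enumerate(counts) if i == 0 or c != counts[i - 1]]
    padded = [-1] + collapsed + [-1]
    return sum(
        1
        for i in range(1, len(padded) - 1)
        if padded[i] > 0 and padded[i] > padded[i - 1] and padded[i] > padded[i + 1]
    )


# Made-up example data: positions, allele counts, and a sample size of 100.
example_sites = [
    (600, 11, 100), (700, 12, 100), (800, 35, 100), (900, 50, 100),
    (1000, 48, 100), (1100, 33, 100), (1200, 65, 100), (1300, 13, 100),
    (5000, 50, 100),  # outside the window, ignored
]
spectrum = maf_spectrum(example_sites, center=1000, half_window=500, n_bins=5)
print(spectrum, count_modes(spectrum))
```

Real data contain many more sites than this toy example. The bin count and window size strongly affect how many peaks show up, so it is worth trying a few values before deciding whether to rerun with more than two balanced alleles.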

Hope that helps!