bioXiaoheng / BalLeRMix

Software package for BalLeRMix and scripts used in the study "Flexible mixture model approaches that accommodate footprint size variability for robust detection of balancing selection" (Cheng & DeGiorgio 2020)

calculate a_hat, dispersion parameter #13

Open GeorgeGkafas opened 2 months ago

GeorgeGkafas commented 2 months ago

Dear Xiao,

Thank you very much for developing BalLeRMix. I ran the B2 analysis separately for each contig in my data. I first parsed my VCF files using the default rec_rate and then, following the steps in the manual, created the site frequency spectrum file. The final command was the following:

python BalLeRMix+_v1.py -i chr{i}_ballermix.txt --spect b2_spect -o chr{i}_B2scores.txt
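The per-contig run above could be scripted along the following lines. This is only a sketch: the helper name `build_b2_command` and the contig labels are hypothetical, and the file-name patterns simply mirror the command in this post. It prints the commands rather than running them.

```python
import shlex

def build_b2_command(contig, script="BalLeRMix+_v1.py", spect="b2_spect"):
    """Return the argument list for one per-contig B2 scan (hypothetical helper)."""
    return [
        "python", script,
        "-i", f"chr{contig}_ballermix.txt",
        "--spect", spect,
        "-o", f"chr{contig}_B2scores.txt",
    ]

# Hypothetical contig labels; replace with the real ones.
contigs = [1, 2, 3]
commands = [build_b2_command(c) for c in contigs]

for cmd in commands:
    # Print each command for inspection.
    # To execute instead: import subprocess; subprocess.run(cmd, check=True)
    print(shlex.join(cmd))
```

Passing the arguments as a list keeps `--spect b2_spect` and `-o` as separate tokens, so a missing space cannot fuse them.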

I was wondering how I can calculate the a_hat dispersion parameter, as it's not present in the output file, and what I can conclude from my results. Here is a subset of my output file:

genPos                 CLR                     x_hat  s_hat         A_hat      nSites
0.0416375              3.4479299733187645      0.5    1000000000.0  10000      46
0.45881125000000006    6.3762486099960825e-06  0.4    1000000000.0  1000000.0  1
0.03548375             0.0                     0.0    0.0           0.0        0.0
0.021341250000000003   0.002888251337004988    0.05   1000000.0     1000000.0  3
0.31978875             0.0                     0.0    0.0           0.0        0.0
0.13466250000000002    3.356456799193211       0.5    1000000000.0  6000       71
0.37633000000000005    1.7426574919419409      0.5    100           4000       102
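For reference, a minimal sketch of loading such an output file and ranking windows by CLR. It assumes the file is whitespace-delimited with the six-column header shown in this post; `read_scores` is a hypothetical helper, not part of BalLeRMix.

```python
import io

COLUMNS = ["genPos", "CLR", "x_hat", "s_hat", "A_hat", "nSites"]

def read_scores(handle):
    """Parse whitespace-delimited score rows into dicts of floats."""
    rows = []
    for line in handle:
        fields = line.split()
        # Skip blank lines and the header row.
        if not fields or fields[0] == "genPos":
            continue
        rows.append(dict(zip(COLUMNS, map(float, fields))))
    return rows

# The subset quoted in this post, used as in-memory sample input.
sample = io.StringIO("""genPos CLR x_hat s_hat A_hat nSites
0.0416375 3.4479299733187645 0.5 1000000000.0 10000 46
0.45881125000000006 6.3762486099960825e-06 0.4 1000000000.0 1000000.0 1
0.03548375 0.0 0.0 0.0 0.0 0.0
0.021341250000000003 0.002888251337004988 0.05 1000000.0 1000000.0 3
0.31978875 0.0 0.0 0.0 0.0 0.0
0.13466250000000002 3.356456799193211 0.5 1000000000.0 6000 71
0.37633000000000005 1.7426574919419409 0.5 100 4000 102
""")

rows = read_scores(sample)

# Rank windows from highest to lowest CLR and show the top three.
top = sorted(rows, key=lambda r: r["CLR"], reverse=True)[:3]
for r in top:
    print(r["genPos"], r["CLR"], r["x_hat"], r["A_hat"])
```

For a real file, `read_scores(open("chr1_B2scores.txt"))` would replace the in-memory sample (the file name is illustrative).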

I understand that positive CLR values point towards balancing selection, but how can I tell whether a value is significant? Do you think a simple t-test for values significantly different from zero would answer my question?
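For comparison with the t-test idea, genome scans are often summarised with an empirical outlier cutoff, that is, by flagging the highest-scoring fraction of windows. The sketch below illustrates only that ranking step. It is not a formal significance test and is not claimed to be the procedure recommended for BalLeRMix. The function name `empirical_cutoff` is hypothetical, and the 30% fraction is chosen only because the example uses the seven CLR values quoted above; a real scan would pool scores across all contigs and use a much stricter fraction.

```python
import math

def empirical_cutoff(values, top_fraction=0.01):
    """Return the smallest score that still falls in the top `top_fraction` of values."""
    if not values:
        raise ValueError("no scores given")
    ordered = sorted(values, reverse=True)
    # Number of windows to flag; always at least one.
    k = max(1, math.ceil(top_fraction * len(ordered)))
    return ordered[k - 1]

# CLR values from the subset in this post (assumption: a real analysis
# would use genome-wide scores, not seven windows).
clr = [
    3.4479299733187645,
    6.3762486099960825e-06,
    0.0,
    0.002888251337004988,
    0.0,
    3.356456799193211,
    1.7426574919419409,
]

cutoff = empirical_cutoff(clr, top_fraction=0.3)
outliers = [v for v in clr if v >= cutoff]
print(cutoff, outliers)
```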

Also, how can I interpret the selection coefficient's values?

Thanks. Best, George