Open deboraycb opened 1 year ago
(sorry, forgot to add something to my previous message)
I also found it quite surprising that the site with LR=0 (which I thought meant LR < -100
) is only 102 bp away from a site with LR=3.98, so I don't think I really understand what those LR=0 lines in the output mean. Any help is much appreciated! Thanks!
physPos genPos LR xhat Ahat numSites
10000033 33.39164 0.000000000 0.95,0.05 1.00000e+16 0
10000135 33.39247 3.981366165 0.85,0.15 3.16228e+04 22
Hi Debora,
Nice that you noticed this behavior! In fact, it's entirely expected that the lowest value of (maximized) LR be 0, because it's the (log) ratio of two (composite) likelihoods (from alternative & null hypothesis, respectively) that are both included in the model. In other words, when we search the parameter space to maximize the LR, we know for sure there at least exists one fixed value of LR=0 when the alternative model takes the same parameter as the null. So rest assured that it's nothing abnormal =). You may have noticed negative LRs from a previous version because its parameter space wasn't optimized.
Hope that helps!
Best, Xiaoheng
Hi Xiaoheng,
I am coming back to this after a long break. Thank you for your reply!
So, from your response and from looking at the calcBaller function, I understood that those lines with LR=0 and numSites=0 are cases where there was no LR>0 with any of the parameter combinations tested. Therefore, I should just interpret those as reliable sites having no evidence of balancing selection. Is that correct?
However, I was still puzzled by
1) the fact that I have lots of line like this (LR=0,numsites=0) in my output (about 50%), and 2) some cases where a site with LR=0,numsites=0 was surrounded by very close sites with LR>0, such as this (note these sites are separated by only 12bp):
physPos genPos LR xhat Ahat numSites
9921419 33.27157287885831 17.978688147283947 0.75,0.25 31622.8 142
9921429 33.27158812465831 0 0.95,0.05 1e+16 0
9921431 33.27159117381831 8.315634381664495 0.75,0.25 31622.8 143
It is very possible that I am misunderstanding something about how Ballermix works, but I thought it was weird that LR>0 was found for the first and last sites, but not for the middle one, given that they are so close. Does that surprise you too?
Thanks for all your help! Best, Debora
Hi Xiaoheng,
I don't know if you had time to look at my previous question, but I have explored that issue a bit more in the mean time, and I'll include my findings here - although I'm not completely sure how to interpret them.
I have printed out the optimized windows for the sites I mentioned above:
physPos genPos LR xhat Ahat numSites winStart winEnd
9921419 33.27157287885831 17.978688147283947 0.75,0.25 31622.8 142 9921036 9921802
9921429 33.27158812465831 0 0.95,0.05 1e+16 0 - -
9921431 33.27159117381831 8.315634381664495 0.75,0.25 31622.8 143 9921048 9921819
and I also looked at the input lines for those sites:
physPos genPos x n
9921419 33.27157287885831 8 196
9921429 33.27158812465831 144 196
9921431 33.27159117381831 1 196
and I plotted the SFS for the optimized windows for sites ending in 19 and 31:
I am still puzzled by the fact that the first site (position ending in 19) and third site (position ending in 31) have very much overlapping windows, which also overlap with the middle site (ending in 29), and that Ballermix finds a LR of 17 and 8 for both sites, but does not find any LR>0 for the middle site. Is that behaviour expected? As I said before, I get lots of sites in my results with such LR=0 lines (around 50%), and I just want to make sure these results are valid.
Thanks again! Best, Débora
I am attaching here the ballermix input and output files for this example, as well as the command I ran:
python3.10 BalLeRMix/software/BalLeRMix_v2.5.py -i chr2L_10kbaround9921kb.balmixder.minrec.txt -o chr2L_10kbaround9921kb_w1000.B2.txt --spect Zambia_concat.B2spect_DAF.txt -w 1000
Zambia_concat.B2spect_DAF.txt chr2L_10kbaround9921kb.balmixder.minrec.txt chr2L_10kbaround9921kb_w1000.B2.txt
Hi Xiaoheng,
I have quite a few lines on my output that look like this:
I thought this was because of the LR threshold of -100, but I changed it to -1000 and still get the same results. How should I interpret those? Should results from these SNPs be excluded because they indicate some problem in the data or should they count as evidence of no balancing selection at those sites?
Thank you! Debora