bioXiaoheng / BallerMixPlus

This repository hosts the software package for BalLeRMix+, an extension of BalLeRMix that can jointly detect recent positive selection and long-term balancing selection.
MIT License

ballermix-ready-input-file produced by parsing script contains only one 'x' value. #4

Open jsokol94 opened 2 years ago

jsokol94 commented 2 years ago

Hi!

When I run the parsing script on an AXT file, the resulting ballermix-ready input contains only a single 'x' value. I looked at the example 'ballermix-ready-inputs' in the repository that were created from an AXT file, and they also have only a single 'x' value, which may be because you ran the analysis on only a subset of variants. I am running mine on all called variants and still get only a single 'x' value. The resulting spect file then contains only one line, since that single value makes up the entire probability distribution.
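A quick way to confirm this symptom is to tally the distinct values in the 'x' column of the parsed input. The sketch below is not part of the BalLeRMix+ package; it assumes the input is a tab-delimited file whose header row names a column `x` (adjust the `column` argument if your file differs), and the helper names `count_x_values` and `has_single_x_value` are hypothetical.

```python
import csv
from collections import Counter
from typing import Iterable


def count_x_values(lines: Iterable[str], column: str = "x") -> Counter:
    """Tally how often each value occurs in one column of a tab-delimited table.

    Assumption: the first line is a header that names the columns,
    and the column of interest is called "x".
    """
    reader = csv.DictReader(lines, delimiter="\t")
    if reader.fieldnames is None or column not in reader.fieldnames:
        raise ValueError(f"no column named {column!r} in header {reader.fieldnames}")
    # Values are kept as strings; only their distinctness matters here.
    return Counter(row[column] for row in reader)


def has_single_x_value(counts: Counter) -> bool:
    """Return True when every row shares the same value (the symptom above)."""
    return len(counts) == 1
```

To use it on a file, open the file and pass the handle directly, for example `count_x_values(open(path))` with `path` pointing at your parsed input. If `has_single_x_value` returns True, every site carries the same 'x', which would explain a one-line spect file.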

Do you know what may be causing this?

Best, Jan

GiuliaFerraretti commented 1 year ago

Hi! I had exactly the same problem after running the parse script and then trying to generate the spectra file. I also used the same AXT file as in your example (PAN alignment). Do you know how I can fix it? Thank you for your help!

Best, Giulia

bioXiaoheng commented 1 year ago

Hi @GiuliaFerraretti @jsokol94, I apologize for my much-delayed follow-up! I made some fixes to the code. Can you try it out and see whether the issue persists?

GiuliaFerraretti commented 1 year ago

Hi @bioXiaoheng,

Thank you very much, really! Sorry for only replying now, but I wanted to make sure that all the remaining steps of the analysis had completed before contacting you again (chr22 is still running, but the output looks fine!). The parsing script now works perfectly, and the problem seems solved.

I would also like to take this opportunity to ask about the interpretation of the results, because I want to make sure I have understood correctly. Should genomic regions consistent with a long-term balancing selection model be characterized by: 1) elevated values (peaks) of the CLR statistic; 2) values of x equal or nearly equal to 0.5; 3) positive values of log10(a) (where a is denoted as s_hat in the output); and 4) non-elevated values of A? I find this last parameter especially difficult to interpret. For positive selection signatures, is log10(a) the only one of these that should differ, being negative to indicate a depletion of sites at intermediate frequencies?

Thanks so much again! Best, Giulia

jsokol94 commented 4 months ago

@bioXiaoheng your changes fixed the issue. Sorry for the delayed reply, but I wanted to let you know!