bioXiaoheng / BalLeRMix

Software package for BalLeRMix and scripts used in the study "Flexible mixture model approaches that accommodate footprint size variability for robust detection of balancing selection" (Cheng & DeGiorgio 2020)
4 stars 1 forks source link

No outfile produced #1

Closed mhapp95 closed 4 years ago

mhapp95 commented 4 years ago

I've been trying to work with BalLeRMix the last couple days and have been running into issues with getting the B0 statistics. I am working with v2 of the script and python 2.7. I am receiving the following output.

python BalLeRMix/BalLeRMix/software/BalLeRMix_v2.py -i Chr1.input.txt -o Chr1.outfile.txt --nosub --MAF --spect Spect_nosub.txt

Optimizing over x= (0.5,0.5); (0.55,0.45); (0.6,0.4); (0.65,0.35); (0.7,0.3); (0.75,0.25); (0.8,0.2); (0.85,0.15); (0.9,0.1); (0.95,0.05).

Optimizing over A= 100, 200, 300, 400, 500, 600, 700, 800, 900, 1000, 1100, 1200, 1400, 1600, 1800, 2000, 2200, 2400, 2500, 3000, 3500, 4000, 4500, 5000, 6000, 7000, 8000, 9000, 10000, 1000000.0, 1000000000.0.

2020-02-05 10:50:05.275592. Initializing...
Total prob is 1.000000 in the spectrum file.
Sample size: 426
Using minor allele frequencies...
...without substitutions
(k,N): 1,426, alpha:0 , fx: 0, hx: 4.91656e-126, gx:0 

No output file is being produced from this. Any ideas?

I am using python BalLeRMix/BalLeRMix/software/BalLeRMix_v1.2/BalLeRMix.py -i Concat.input.txt --getSpect --spect Spect_nosub.txt --nosub --MAF to generate the helper file.

mhapp95 commented 4 years ago

Looking into this a little further myself, it appears to be a Value Error in python, stemming from attempting to perform log(fx) where the value of fx is 0, though I haven't been able to track down through the calculations what about the data would be causing the value of fx to be 0.

bioXiaoheng commented 4 years ago

Hi mhapp95, Thanks for reaching out! Do you mind sharing your input files? I'll try to replicate the error and troubleshoot.

Best, Xiaoheng

mhapp95 commented 4 years ago

No problem, I just sent them to your Penn State university email!

Mary

jcgrenier commented 4 years ago

Hello @bioXiaoheng,

I'm running into the same kind of issues. Using v1.1, I was having a problem with a log calculation : biFx[x][a][k] = log(bifx)

The command lines used were :

bin/BalLeRMix_v1.1/BalLeRMix.py -i ../chr19/chr19_YRI_0.05.input --getSpect --spect chr19_YRI_0.05.b2.sfs
bin/BalLeRMix_v1.1/BalLeRMix.py -i CYP4F_YRI_0.05.input --spect chr19_YRI_0.05.b2.sfs -o test.out

I tried it using v1.2, gives the same thing. Here's the log :

Optimizing over x= 0.05, 0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.4, 0.45, 0.5
Optimizing over A= 100, 200, 300, 400, 500, 600, 700, 800, 900, 1000, 1100, 1200, 1400, 1600, 1800, 2000, 2200, 2400, 2500, 3000, 3500, 4000, 4500, 5000, 6000, 7000, 8000, 9000, 10000, 1000000.0, 100000000.0

2020-02-05 15:48:49.641057. Initializing...
Total prob is 1.000000 in the spectrum file.
Sample size: 216
Traceback (most recent call last):
  File "/project/6005588/shared/bin/BalLeRMix_v2/BalLeRMix/software/BalLeRMix_v1.2/BalLeRMix.py", line 585, in <module>
    main()
  File "/project/6005588/shared/bin/BalLeRMix_v2/BalLeRMix/software/BalLeRMix_v1.2/BalLeRMix.py", line 569, in main
    (Spec, logSpec, biFx,N) = initialize(opt.spectfile,xGrid,aGrid,opt.MAF,opt.nofreq,opt.nosub)#initialize(spectfile,xGrid,aGrid,MAF,nofreq,nosub)
  File "/project/6005588/shared/bin/BalLeRMix_v2/BalLeRMix/software/BalLeRMix_v1.2/BalLeRMix.py", line 309, in initialize
    biFx[x][a][k] = log(bifx)
ValueError: math domain error

Now, with the newest version, I'm not getting any error with the newest version, but no output at all for the second command line. Here's the log for v2 :

$ python /project/6005588/shared/bin/BalLeRMix_v2/BalLeRMix/software/BalLeRMix_v2.py -i ../chr19/chr19_YRI_0.05.input --spect chr19_YRI_0.05.b2.v2.sfs -o test.>

Optimizing over x= (0.5,0.5); (0.55,0.45); (0.6,0.4); (0.65,0.35); (0.7,0.3); (0.75,0.25); (0.8,0.2); (0.85,0.15); (0.9,0.1); (0.95,0.05).

Optimizing over A= 100, 200, 300, 400, 500, 600, 700, 800, 900, 1000, 1100, 1200, 1400, 1600, 1800, 2000, 2200, 2400, 2500, 3000, 3500, 4000, 4500, 5000, 6000, 7000, 8000, 9000, 10000, 1000000.0, 1000000000.0.

2020-02-05 15:51:33.065192. Initializing...
Total prob is 1.000000 in the spectrum file.
Sample size: 216
Using derived allele frequencies...
(k,N): 1,216, alpha:0 , fx: 0, hx: 2.05104e-63, gx:0

Will be happy to provide you some examples as well. Thanks for your help.

JC

bioXiaoheng commented 4 years ago

Hi Jean, Mary,

Thanks for letting me know about this issue! I got the input files from Mary and managed to replicate the error.

It turns out the current script assumes that all possible allelic count ("k") have non-zero representation in the data and runs into log(0) error when any of them is missing. In Mary's data, all singletons seem to have been removed, and @jcgrenier I'm assuming your data have similar zero-representation allele frequency too.

I'll patch up the bug and update the script soon. Thank you so much for reporting this issue!

Best, Xiaoheng

bioXiaoheng commented 4 years ago

@mhapp95 @jcgrenier I fixed the bug. Please try again to see if things run smoothly. Thanks again for reporting it!

jcgrenier commented 4 years ago

Hi @bioXiaoheng , it works well now! Fantastic!

Thanks a lot for resolving this so quickly! JC

bioXiaoheng commented 4 years ago

@jcgrenier No problem at all! It's an easy fix, and the "sys.exit()" was something I added during the initial debugging and forgot to remove. Thank you and @mhapp95 for catching this bug!