bioXiaoheng / BalLeRMix

Software package for BalLeRMix and scripts used in the study "Flexible mixture model approaches that accommodate footprint size variability for robust detection of balancing selection" (Cheng & DeGiorgio 2020)
4 stars 1 forks source link

Error while outputting the outfile #2

Closed arsthilaire closed 4 years ago

arsthilaire commented 4 years ago

Hi, I've run the B2 statistic on a subset of chromosome 19 (~430kb) and it ran perfectly. Then, I've been trying to run the B2 statistic on the entire chromosome 19, but I'm getting an error. I'm running BalLeRMix with the v2 of the script and python 2.7.

The command line used was:

python bin/BalLeRMix_v2/BalLeRMix/software/BalLeRMix_v2.py -i input/chr19_YRI_0.05.input --spect chr19_YRI_0.05.b2.sfs -o chr19_YRI_0.05.out

And here's the log:

Optimizing over x= (0.5,0.5); (0.55,0.45); (0.6,0.4); (0.65,0.35); (0.7,0.3); (0.75,0.25); (0.8,0.2); (0.85,0.15); (0.9,0.1); (0.95,0.05).

Optimizing over A= 100, 200, 300, 400, 500, 600, 700, 800, 900, 1000, 1100, 1200, 1400, 1600, 1800, 2000, 2200, 2400, 2500, 3000, 3500, 4000, 4500, 5000, 6000, 7000, 8000, 9000, 10000, 1000000.0, 1000000000.0.

2020-02-20 09:52:55.326318. Initializing...
Total prob is 1.000000 in the spectrum file.
Sample size: 216
Using derived allele frequencies...
Skipping (k=1,N=216) in full model for X=(0.5,0.5), alpha:0 , fx: 0, hx: 2.05104e-63, gx:0
Skipping (k=1,N=216) in full model for X=(0.55,0.45), alpha:0 , fx: 0, hx: 7.32174e-55, gx:0
Skipping (k=1,N=216) in full model for X=(0.6,0.4), alpha:0 , fx: 0, hx: 8.66967e-47, gx:0
Skipping (k=1,N=216) in full model for X=(0.65,0.35), alpha:0 , fx: 0, hx: 2.25873e-39, gx:0
Skipping (k=1,N=216) in full model for X=(0.7,0.3), alpha:0 , fx: 0, hx: 1.60925e-32, gx:0
Skipping (k=1,N=216) in full model for X=(0.75,0.25), alpha:0 , fx: 0, hx: 3.71138e-26, gx:0
Skipping (k=1,N=216) in full model for X=(0.8,0.2), alpha:0 , fx: 0, hx: 3.15356e-20, gx:0
Skipping (k=1,N=216) in full model for X=(0.85,0.15), alpha:0 , fx: 0, hx: 1.08289e-14, gx:0
Skipping (k=1,N=216) in full model for X=(0.9,0.1), alpha:0 , fx: 0, hx: 1.56878e-09, gx:0
Skipping (k=1,N=216) in full model for X=(0.95,0.05), alpha:0 , fx: 0, hx: 8.76945e-05, gx:0

2020-02-20 09:53:08.020534. Reading input file: input/chr19_YRI_0.05.input

2020-02-20 09:53:08.359589. Start computing likelihood ratios...
Computing LR on every 1 site/s, using all the data with alpha >= 1e-8.
writing output to chr19_YRI_0.05.out
Traceback (most recent call last):
  File "/project/6005588/shared/bin/BalLeRMix_v2/BalLeRMix/software/BalLeRMix_v2.py", line 785, in <module>
    main()
  File "/project/6005588/shared/bin/BalLeRMix_v2/BalLeRMix/software/BalLeRMix_v2.py", line 777, in main
    scan(xGrid,AGrid,AdGrid,aGrid,opt.outfile,phys_pos,pos_list,Ks,Ns,numSites,Spec, logSpec, Fx,N, opt.size,opt.r,opt.step,opt.phys,opt.nofreq,opt.MAF,opt.noCenter,opt.Rrate)#, ProbsStar
  File "/project/6005588/shared/bin/BalLeRMix_v2/BalLeRMix/software/BalLeRMix_v2.py", line 649, in scan
    scan_alpha(xGrid,AGrid,AdGrid,aGrid,outfile,phys_pos,pos_list,Ks,Ns,numSites,Spec,logSpec,Fx,N,s,MAF)
  File "/project/6005588/shared/bin/BalLeRMix_v2/BalLeRMix/software/BalLeRMix_v2.py", line 616, in scan_alpha
    Tmax,winSites = calcBaller(pos_list, Ks, Ns, testSite, int(i), logSpec, Fx, xGrid, AGrid, AdGrid, aGrid, MAF)
  File "/project/6005588/shared/bin/BalLeRMix_v2/BalLeRMix/software/BalLeRMix_v2.py", line 527, in calcBaller
    return(Tmax,winSize)
UnboundLocalError: local variable 'winSize' referenced before assignment

I'm running it without the option w. I've also tried with the option but I'm still getting the error. Any ideas on why I'm getting this error ?

Also, just to confirm, is the LR column in the output the B2 statistic?

Thanks for your help, Alex

bioXiaoheng commented 4 years ago

Hi Alex,

Thanks for reporting the error. It looks like on that particular test site the program didn't find any parameter combo to make the LR>-100, which is weird because it should at least go to zero. Do you mind sending me your input files?

And yes, the LR column is the value of the B statistic of your choice.

Thanks, Xiaoheng

bioXiaoheng commented 4 years ago

Hi Alex,

Thanks for sending me the files. It looks like you have 210 consecutive sites sharing the same genetic position; is that an artifact?

I've added the necessary variable initialization for this part in my code. Let me know whether this issue is resolved after reparsing your input.

Best, Xiaoheng

arsthilaire commented 4 years ago

Hi @bioXiaoheng ,

I've corrected my input files and it's working now.

Thank you for resolving it so quickly! Alex

bioXiaoheng commented 4 years ago

Great! Thanks again for reporting the issue.