alachins / raisd

RAiSD: software to detect positive selection based on multiple signatures of a selective sweep and SNP vectors

Different reports after adding the "-R" flag #8

Closed sergiopalmavera closed 4 years ago

sergiopalmavera commented 5 years ago

Hello! I am just starting to learn RAiSD, and I see strange behaviour when I add the -R flag and rerun the tool on the same VCF file.

Here you can see a sample of the reports:

RAiSD -n test_run1 -I vcf

// 1
5527316 5.212e+01
5677179 5.830e+01
5697510 5.799e+01
5755092 5.747e+01
5771810 5.491e+01
5894364 6.105e+01
6038580 7.030e+01
6116586 8.723e+01
6191527 8.352e+01

RAiSD -n test_run2 -I vcf -R

// 1
5527316 3211884 7842748 3.882e+01       1.135e+00       1.778e+00       7.835e+01
5677179 3504401 7849957 3.643e+01       1.135e+00       1.485e+00       6.141e+01
5697510 3525782 7869239 3.641e+01       1.135e+00       1.488e+00       6.151e+01
5755092 3637645 7872540 3.550e+01       1.135e+00       1.811e+00       7.299e+01
5771810 3668628 7874991 3.526e+01       1.064e+00       1.591e+00       5.972e+01
5894364 3894955 7893772 3.352e+01       9.934e-01       2.262e+00       7.534e+01
6038580 4139154 7938005 3.185e+01       9.934e-01       2.672e+00       8.451e+01
6116586 4208634 8024539 3.199e+01       9.934e-01       3.239e+00       1.029e+02
6191527 4331928 8051126 3.118e+01       9.224e-01       3.251e+00       9.351e+01

According to the documentation, the last column of each report format holds the same value (the μ statistic), so the last columns of the two reports should be identical. In the VCF runs above they are not.

I corroborated this assumption using the test file provided in the documentation, where the last columns do match:

RAiSD -n test_run1 -I d1/msselection1.out -L 100000

// 0
430     2.792e+00
445     2.607e+00
450     2.198e+00
455     2.482e+00
460     2.415e+00
480     2.502e+00
490     1.897e+00
500     1.278e+00
515     1.108e+00

RAiSD -n test_run2 -I d1/msselection1.out -L 100000 -R

// 0
430     20      840     1.033e+00       9.934e-01       2.720e+00       2.792e+00
445     20      870     1.071e+00       9.224e-01       2.639e+00       2.607e+00
450     20      880     1.084e+00       9.224e-01       2.199e+00       2.198e+00
455     20      890     1.096e+00       9.934e-01       2.280e+00       2.482e+00
460     20      900     1.109e+00       9.934e-01       2.192e+00       2.415e+00
480     30      930     1.134e+00       9.934e-01       2.221e+00       2.502e+00
490     30      950     1.159e+00       9.224e-01       1.774e+00       1.897e+00
500     40      960     1.159e+00       9.224e-01       1.195e+00       1.278e+00
515     50      980     1.172e+00       9.934e-01       9.519e-01       1.108e+00

What could cause such a discrepancy between the two report formats on the VCF input?

Thanks!

alachins commented 5 years ago

Hello Sergio,

This is most probably because you did not provide a seed for the random number generator. Although the seed is optional, it is recommended when using VCF files so that results can be reproduced. Please let me know if the observed discrepancy remains after providing a seed, e.g., -a 123.

Regards,
Nikos
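For readers unfamiliar with seeding, the general principle can be sketched in Python. This is only an illustration of how a fixed seed makes a pseudo-random sequence repeatable; it says nothing about how RAiSD uses its generator internally, and the helper name `draws` is hypothetical.

```python
import random


def draws(seed=None, n=5):
    """Return n pseudo-random floats from a generator created with `seed`.

    With seed=None, Python seeds the generator from an OS entropy source,
    so separate calls are not expected to repeat.
    """
    rng = random.Random(seed)
    return [rng.random() for _ in range(n)]


# Two generators built from the same seed produce the same sequence.
first = draws(seed=123)
second = draws(seed=123)
print("same seed, same values:", first == second)

# Without a seed, each call starts from a different internal state.
print("unseeded run A:", draws())
print("unseeded run B:", draws())
```

By analogy, two RAiSD runs on the same VCF file would be expected to agree only if both are given the same value for -a.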

sergiopalmavera commented 5 years ago

Hello Nikos, yes! Setting the same seed for both runs produced the same μ statistics.

Thanks!
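A quick way to run this kind of check is to compare the last column of the two report formats position by position. The sketch below is a minimal Python helper, assuming whitespace-separated rows with the position in the first field and the μ statistic in the last, as in the excerpts quoted in this thread. The function names `last_column` and `mismatches` are hypothetical, and the sample strings are the first two rows of each report copied from above.

```python
import math


def last_column(report_text):
    """Map position (first field) to the value in the last field of each row."""
    values = {}
    for line in report_text.splitlines():
        fields = line.split()
        # Skip blank lines and set headers such as "// 1".
        if len(fields) < 2 or fields[0].startswith("//"):
            continue
        values[int(fields[0])] = float(fields[-1])
    return values


def mismatches(default_text, detailed_text, rel_tol=1e-9):
    """Return sorted positions present in both reports whose last columns differ."""
    a = last_column(default_text)
    b = last_column(detailed_text)
    shared = a.keys() & b.keys()
    return sorted(p for p in shared if not math.isclose(a[p], b[p], rel_tol=rel_tol))


# Rows copied from the thread: simulated input, where the columns agree.
MS_DEFAULT = "// 0\n430     2.792e+00\n445     2.607e+00\n"
MS_DETAILED = (
    "// 0\n"
    "430 20 840 1.033e+00 9.934e-01 2.720e+00 2.792e+00\n"
    "445 20 870 1.071e+00 9.224e-01 2.639e+00 2.607e+00\n"
)

# Rows copied from the thread: unseeded VCF runs, where they do not.
VCF_DEFAULT = "// 1\n5527316 5.212e+01\n5677179 5.830e+01\n"
VCF_DETAILED = (
    "// 1\n"
    "5527316 3211884 7842748 3.882e+01 1.135e+00 1.778e+00 7.835e+01\n"
    "5677179 3504401 7849957 3.643e+01 1.135e+00 1.485e+00 6.141e+01\n"
)

print("simulated input mismatches:", mismatches(MS_DEFAULT, MS_DETAILED))
print("VCF input mismatches:", mismatches(VCF_DEFAULT, VCF_DETAILED))
```

An empty list means the two formats agree at every shared position. To use it on real output, read each report file into a string and pass the two strings to `mismatches`.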