PoonLab / OpenRDP

An open-source re-implementation of the RDP4 recombination detection program
GNU General Public License v3.0
45 stars 9 forks source link

RDP gives P-value greater than 1 #93

Open WilliamZekaiWang opened 3 weeks ago

WilliamZekaiWang commented 3 weeks ago

Using the dev branch where there is no P-value filter for breakpoints, as well as the test_cfg.ini gives the following results.

> openrdp tests/long.fasta -c tests/test_cfg.ini

Method          Start   End     Recombinant     Parent1 Parent2 Pvalue
------------------------------------------------------------------------
Geneconv        1       204     Test2           Test3   -       2.00E-05
Geneconv        151     195     Test1           Test3   -       2.10E-03
Geneconv        203     507     Test1           Test2   -       8.29E-03
Geneconv        539     759     Test1           Test2   -       1.54E-01
Geneconv        151     193     Test4           -       -       2.20E-02
Geneconv        56      170     Test1           -       -       2.73E-02
Bootscan        760     765     Test2           Test1   Test3   6.51E-02
MaxChi          475     518     Test1           Test2   Test3   4.04E-02
MaxChi          475     518     Test1           Test2   Test4   4.04E-02
MaxChi          439     482     Test1           Test3   Test4   4.04E-02
Siscan          2       45      Test1           Test2   Test3   7.49E-01
Siscan          2       45      Test1           Test2   Test4   7.62E-01
Siscan          2       45      Test1           Test3   Test4   7.64E-01
Siscan          2       45      Test2           Test3   Test4   7.64E-01
Chimaera        198     241     Test1           Test3   Test4   2.05E-02
Chimaera        170     213     Test2           Test1   Test4   1.81E-03
Chimaera        176     219     Test2           Test3   Test4   4.70E-03
3Seq            181     787     Test2           Test3   Test4   5.29E-06
3Seq            202     787     Test3           Test1   Test2   5.98E-10
RDP             6       15      Test1           Test2   Test3   3.12E+01
RDP             6       504     Test1           Test3   Test4   6.87E-07
RDP             36      481     Test2           Test3   Test4   1.67E-05

I tinkered with this further, and this only occurs when we use this specific config file. The default one doesn't cause this, nor does providing a reference file

This occurs regardless if I use MPI or not

ArtPoon commented 3 weeks ago

This seems to be where p-values are being calculated for the RDP method: https://github.com/PoonLab/OpenRDP/blob/bb0200092d6245cfd11784a1e7fd081a59507de1/openrdp/rdp.py#L165-L175

ArtPoon commented 3 weeks ago

On line 172, we should be taking the power out of $\log(p^n) = n\log(p)$ when calculating the binomial probability Also need to look at how G is being calculated as a possible issue