PoonLab / OpenRDP

An open-source re-implementation of the RDP4 recombination detection program
GNU General Public License v3.0
45 stars 9 forks source link

Potential duplication of code for merge_breakpoints #77

Open ArtPoon opened 3 months ago

ArtPoon commented 3 months ago

These functions seem to be identical: https://github.com/PoonLab/OpenRDP/blob/1d1c29969c5382f3c249ad221c8bee9f9e0bcab8/openrdp/threeseq.py#L83 https://github.com/PoonLab/OpenRDP/blob/1d1c29969c5382f3c249ad221c8bee9f9e0bcab8/openrdp/chimaera.py#L222

merge_breakpoints also appears in siscan.py, rdp.py, maxchi.py and bootscan.py. These can probably be moved into common.py. Also, it looks like we've done some modifications to merge_breakpoints in bootscan.py - if this was an optimization and the outputs are the same then we should migrate those changes to the common function.

WilliamZekaiWang commented 4 weeks ago

The merge_breakpoints function seems to be slightly modified in all of the scripts in terms of changing the data structures and sorting the breakpoints. I've merged those with no issues into one common function into common.py

However what is different is that a few of the functions had a filter for p-values (siscan and bootscan), the others didn't.

I've made it so that the function in common.py has a p-value filter that has a base value set to a big number so that it won't cut anything out.

In terms of Bootscan, it seems like using its version of merge_breakpoints on other methods gave different outputs, so I'm assuming it's doing more than optimization. I've kept Bootscan's verison of this function and had all other methods use the one in common

ArtPoon commented 4 weeks ago

We need to analyze these different versions and identify where they are different and what the effects of those changes are - and more importantly whether those differences are relevant to a specific method. If not, then we should harmonize this functionality across methods.