Open ArtPoon opened 2 months ago
Bootscan
, so we can use this as a templateTripletGenerator
at an upper level and iterating through different methods in an inner loopI think this is the right entry point for parallel computing: https://github.com/PoonLab/OpenRDP/blob/bb0200092d6245cfd11784a1e7fd081a59507de1/openrdp/__init__.py#L270-L278
After parallel processing, use gather
to collect results for one process; for example:
https://github.com/PoonLab/covizu/blob/12a9d08b70d754a30b72d84636affb28d3934765/covizu/clustering.py#L325-L332
I think I've done it in the new iss85 branch:
https://github.com/PoonLab/OpenRDP/blob/4645e4764d2d47fde957cccc1d944ac7ca36db9c/openrdp/__init__.py#L276-L300
In regards to gather
, I'm not too sure how to apply it in this scenario? The tmethod.execute(triplet)
doesn't return anything as explicitly as the bootstrap example in the covizu.
I tested openrdp on long.fasta
under the test directory on both my changes and master branch, and they both gave the same output.
I think I have now correctly implemented the gather
, although I'm not sure my implementation is the most efficient. I made it so it does merge_breakpoints()
in each process, then gathers them afterwards in the root.
I have ran the code with 1-16 cores and got the same output in all scenarios (the output was sometimes in different orders)
I made a test fasta file with 10 sequences, each with 1000 nucleotides and ran it on 40 cores vs 1 core. The outputs were the same, but listed in different orders. Timing below
40:
real 0m20.553s
user 14m16.015s
sys 0m43.630s
1:
real 1m33.892s
user 1m40.652s
sys 0m0.380s
40 seems to be quite a bit faster. The only thing is that it vomits a bunch of text
3Seq and GENECONV are running on the entire alignment as input files, so those should not be called from MPI for now. We would need to modify the source code to pass Python objects to C for this to work for these binaries, or port these methods into Python.
Replace multiprocessing
in bootscan with MPI for consistency
I am currently in the process but I've run into an issue:
I've removed all parts of multiprocessing and changed it to MPI4PY in the iss85 branch. The outputs are different.
master branch:
Method Start End Recombinant Parent1 Parent2 Pvalue
------------------------------------------------------------------------
Geneconv 1 204 Test2 Test3 - 2.00E-05
Geneconv 151 195 Test1 Test3 - 2.10E-03
Geneconv 203 507 Test1 Test2 - 8.29E-03
Geneconv 539 759 Test1 Test2 - 1.54E-01
Geneconv 151 193 Test4 - - 2.20E-02
Geneconv 56 170 Test1 - - 2.73E-02
Bootscan 100 160 Test2 Test3 Test4 2.29E-02
3Seq 202 787 Test3 Test1 Test2 5.98E-10
3Seq 181 787 Test2 Test3 Test4 5.29E-06
RDP 6 433 Test1 Test3 Test4 9.28E-17
RDP 6 412 Test2 Test3 Test4 1.04E-15
mpi4py
Method Start End Recombinant Parent1 Parent2 Pvalue
------------------------------------------------------------------------
Geneconv 1 204 Test2 Test3 - 2.00E-05
Geneconv 151 195 Test1 Test3 - 2.10E-03
Geneconv 203 507 Test1 Test2 - 8.29E-03
Geneconv 539 759 Test1 Test2 - 1.54E-01
Geneconv 151 193 Test4 - - 2.20E-02
Geneconv 56 170 Test1 - - 2.73E-02
Bootscan 180 860 Test1 Test2 Test3 1.04E-13
Bootscan 180 860 Test1 Test2 Test4 6.06E-13
Bootscan 100 160 Test2 Test3 Test4 1.35E-02
3Seq 202 787 Test3 Test1 Test2 5.98E-10
3Seq 181 787 Test2 Test3 Test4 5.29E-06
RDP 6 433 Test1 Test3 Test4 9.28E-17
RDP 6 412 Test2 Test3 Test4 1.04E-15
looking deeper into the issue, it seems that the dt_matrix_file
in both cases are the exact same. The only difference is the BootScan.raw_results
master
[('Test1 ', ('Test2', 'Test3'), 180, 860, 0.22208900176843166), ('Test2', ('Test1 ', 'Test3'), 100, 160, 0.5090064337856215), ('Test2', ('Test1 ', 'Test3'), 180, 860, 0.8755198548946761), ('Test3', ('Test1 ', 'Test2'), 100, 160, 0.058882580307752286), ('Test1 ', ('Test2', 'Test4'), 180, 860, 0.3969463145457672), ('Test2', ('Test1 ', 'Test4'), 180, 860, 0.6727008886719116), ('Test2', ('Test3', 'Test4'), 100, 160, 0.022905243454938964), ('Test3', ('Test2', 'Test4'), 100, 160, 0.4025393897717188)]
mpi4py
[('Test1 ', ('Test2', 'Test3'), 180, 860, 1.042518309852707e-13), ('Test2', ('Test1 ', 'Test3'), 100, 160, 0.5543662225935952), ('Test2', ('Test1 ', 'Test3'), 180, 860, 0.8063583710006824), ('Test3', ('Test1 ', 'Test2'), 100, 160, 0.058882580307752286), ('Test1 ', ('Test2', 'Test4'), 180, 860, 6.061565956710262e-13), ('Test2', ('Test1 ', 'Test4'), 180, 860, 0.587671445705755), ('Test2', ('Test3', 'Test4'), 100, 160, 0.013548600500603775), ('Test3', ('Test2', 'Test4'), 100, 160, 0.2693843515674098)]
I have found the issue, I believe it could've been incorrectly selecting a config file for bootscan. I have looked at the outputs and everything seems to line up without using multiprocessing
For testing in an MPI environment, I think we should have the root node communicate the pooled results to all other processes, so that the unit tests pass for everyone
added a new unittest to test openrdp overall for all methods in an mpi environment.
successfully passes it when compared the output in the master branch on the tests/long.fasta
file for all cores
I'd also like to add here that for threeseq, on my local device, one of the outputs differs with the same output from when I run it on paphlagon by 1 value in the last decimal:
local: "('Test3', ('Test1', 'Test2'), '202', '787', '5.982095e-10')"
pap: "('Test3', ('Test1', 'Test2'), '202', '787', '5.982096e-10')"
Let's use an AlmostEqual
test to that level of precision, that's acceptably close to the expected answer
Since many of the recombination detection methods operate on combinations of three sequences (triplets), it should be relatively straight-forward to process these in parallel across multiple threads. I personally prefer implementing this via message passing interface (MPI) but that would add a dependency and it would be more conventional to use Python's built-in
multiprocessing
module.