califano-lab / ARACNe-AP

Network Reverse Engineering through AP inference of Mutual Information
Other
81 stars 24 forks source link

calculateMI() returns incorrect results when using multiple threads #12

Closed mcamagna closed 3 years ago

mcamagna commented 3 years ago

MIThread and MIThreadNA calculate the mutual information using the static computeMI() in the MI class, which is not synchronized in any way. This leads to a race condition wherein the threads interfere with each others calculation. Using this program with threads>1 therefore leads to an incorrect calculation of the mutual information in an irreproducible way.

I strongly suggest making computeMI() part of the MIThread/MIThreadNA classes, to avoid such a critical race condition!

grosenberger commented 3 years ago

Thank you for reporting this issue. Could you please help us to reproduce it?

When running the following commands on the test data contained in the repository, we obtain the same results for 1 or 4 threads.

java -Xmx5G -jar dist/aracne.jar -e test/matrix.txt  -o outputFolder --tfs test/tfs.txt --pvalue 1E-8 --seed 1 --calculateThreshold

java -Xmx5G -jar dist/aracne.jar -e test/matrix.txt  -o outputFolder --tfs test/tfs.txt --pvalue 1E-8 --seed 1
java -Xmx5G -jar dist/aracne.jar -e test/matrix.txt  -o outputFolder --tfs test/tfs.txt --pvalue 1E-8 --seed 1 --threads 4
mcamagna commented 3 years ago

I ran the program in loops with different varying thread numbers and could not induce a race condition. After a more careful inspection of the code, I realized that the static computeMI() functions are actually only accessing variables on the stack and therefore don't collide with each other. Even the setMI() method, which is synchronized incorrectly and accesses a HashMap in an unsafe way, was not causing a collision in 100+ runs. So I'd like to apologize for crying wolf, since despite that piece of code looking like a bad idea, it is in fact not causing problems. So I'd like to revise my statement: The program does not return incorrect results when using multiple threads.