egeulgen / pathfindR

pathfindR: Enrichment Analysis Utilizing Active Subnetworks
https://egeulgen.github.io/pathfindR/

Reproducible results #108

Closed lardenoije closed 1 year ago

lardenoije commented 2 years ago

I have been playing around with this package and so far it works great. However, I noticed that the results are not reproducible: when I run run_pathfindR twice with the same input I get different results, sometimes with quite a bit of variation (e.g. 60 enriched pathways on one run and 80 on the next).

If I want to use the results in a publication, I would like my code to be reproducible. Usually in R this is quite easy to do with set.seed(); however, with this package most computations are performed externally in Java. I tried to make the Java code reproducible by adding a seed value, e.g. for simulated annealing (SA) I changed line 81 of SimulatedAnnealing.java to

Random rand = new Random(42);

and in ScoreCalculations.java I tried to make the shuffling reproducible:

// Create reproducible source of randomness for shuffling
Random rand = new Random(42);

for (int trial = 0; trial < numberOfTrials; trial++) {
    // long start = System.nanoTime();

    Collections.shuffle(nodeListForSampling, rand);

Unfortunately, this did not make the results reproducible. I then thought it might be due to the parallel processing, but using run_pathfindR with n_processes = 1 also did not help.
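For reference, here is a minimal, self-contained sketch of what a fixed seed does and does not guarantee for Collections.shuffle. It is not taken from the pathfindR sources; the class name SeededShuffleDemo, the helper shuffledCopy and the node labels are made up for illustration. It only shows that the same seed gives the same result when the list starts in the same order, so any other source of variation (for example a different starting order, or a second unseeded Random elsewhere) would still change the outcome.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

public class SeededShuffleDemo {

    // Hypothetical helper: returns a shuffled copy of `nodes`,
    // using a fresh Random built from `seed`.
    static List<String> shuffledCopy(List<String> nodes, long seed) {
        List<String> copy = new ArrayList<>(nodes);
        Collections.shuffle(copy, new Random(seed));
        return copy;
    }

    public static void main(String[] args) {
        List<String> nodes = Arrays.asList("A", "B", "C", "D", "E");
        List<String> reordered = Arrays.asList("E", "D", "C", "B", "A");

        // Same seed, same starting order: the two shuffles are identical.
        boolean sameInput = shuffledCopy(nodes, 42L).equals(shuffledCopy(nodes, 42L));

        // Same seed, different starting order: the same sequence of swaps is
        // applied to a different list, so the results differ.
        boolean differentInput = shuffledCopy(nodes, 42L).equals(shuffledCopy(reordered, 42L));

        System.out.println("same starting order equal:      " + sameInput);
        System.out.println("different starting order equal: " + differentInput);
    }
}
```

So seeding the Random passed to Collections.shuffle is necessary but not sufficient: every other source of randomness and the order of the input list have to be fixed as well.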

To be honest, I am not that familiar with Java, so perhaps I missed something. Do you have any suggestions for making the results reproducible?

egeulgen commented 2 years ago

Thank you for raising this issue. We've been planning to support setting a seed for some time, and will soon implement this as an argument of run_pathfindR(). I'll keep you updated and let you know once the change is made.

egeulgen commented 1 year ago

We made the necessary changes (as of commit above) to ensure results are reproducible (by setting a seed for each iteration, which is now the default behavior of run_pathfindR).
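The thread does not show how the seed is applied internally, so the following is only a rough sketch of the general idea of one seed per iteration, not the actual pathfindR implementation. The class PerIterationSeedDemo, the helpers seedFor and runIteration, and the rule "base seed plus iteration index" are all assumptions made for illustration. The point is that each iteration gets its own Random, so its result does not depend on what other iterations consumed or on the order in which they ran.

```java
import java.util.Random;

public class PerIterationSeedDemo {

    // Hypothetical rule: derive a distinct seed for each iteration
    // from a single user-supplied base seed.
    static long seedFor(long baseSeed, int iteration) {
        return baseSeed + iteration;
    }

    // Stand-in for one stochastic iteration: draws a single value
    // from a Random that belongs to this iteration only.
    static double runIteration(long seed) {
        Random rand = new Random(seed);
        return rand.nextDouble();
    }

    public static void main(String[] args) {
        long baseSeed = 123L;

        // Forward order.
        for (int i = 1; i <= 3; i++) {
            System.out.println("iteration " + i + ": " + runIteration(seedFor(baseSeed, i)));
        }

        // Reverse order gives the same value for each iteration, because
        // no Random state is shared between iterations.
        for (int i = 3; i >= 1; i--) {
            System.out.println("iteration " + i + ": " + runIteration(seedFor(baseSeed, i)));
        }
    }
}
```

With a single shared Random, by contrast, the value an iteration sees depends on how many draws came before it, which is what makes results sensitive to execution order.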