emmanuelparadis / pegas

Population and Evolutionary Genetics Analysis System
GNU General Public License v2.0
28 stars 10 forks source link

default number of iterations in rmst() might not be enough #86

Closed emmanuelparadis closed 8 months ago

emmanuelparadis commented 8 months ago

Taking the Hamming distances from the woodmouse data:

library(pegas)
data(woodmouse)
D <- dist.dna(woodmouse, model = "N")

I run rmst() on D twice and compare both returned networks.

for (i in 1:100) {
    nt1 <- rmst(D)
    nt2 <- rmst(D)
    stopifnot(all.equal(nt1, nt2))
}

The loop stops after, typically, between 1 and 3 iterations of the for loop. If the number of iterations of rmst() is fixed to 20:

B <- 20
for (i in 1:100) {
    nt1 <- rmst(D, B = B, quiet = TRUE)
    nt2 <- rmst(D, B = B, quiet = TRUE)
    stopifnot(all.equal(nt1, nt2))
}

The loop stops after, typically, between 30 and 80 repeats. If B <- 40, then this works fine.

Conclusion: the default stop criterion in rmst is too mild, at least for this dataset (see ?rmst):

R> ceiling(sqrt(nrow(woodmouse)))
[1] 4
emmanuelparadis commented 8 months ago

This should be solved. The new version (1.3-0.1) has a new method to find all possible MSTs by direct enumeration. This is the new default. The random method is still available with random = TRUE:

R> library(pegas)
R> data(woodmouse)
R> d <- dist.dna(woodmouse, model = "N")
R> newrnt <- rmst(d)
R> oldrnt <- rmst(d, random = TRUE)
Iteration: 9   Number of links: 22
R> all.equal(newrnt, oldrnt)
## sometimes TRUE but not most of the times

As expected from my previous post, increasing the number of iterations gives the correct RMST:

R> oldrnt100 <- rmst(d, random = TRUE, B = 100)
Iteration: 100
R> all.equal(newrnt, oldrnt100)
[1] TRUE

And there's a (very) substantial gain in performance:

R> system.time(rmst(d))
utilisateur     système      écoulé 
      0.001       0.000       0.001 
R> system.time(rmst(d, random = TRUE, B = 100, quiet = TRUE))
utilisateur     système      écoulé 
      0.052       0.000       0.052