ANTsX / ANTsRCore

Rcpp bindings for the C++ ANTs library used by the ANTsR package
9 stars 9 forks source link

Reproducible kmeans segmentation #51

Closed muschellij2 closed 6 years ago

muschellij2 commented 6 years ago

After the discussion we had about registration randomness (https://github.com/ANTsX/ANTsR/issues/226#issuecomment-404512924), which includes https://github.com/ANTsX/ANTsR/issues/210 and https://github.com/ANTsX/ANTs/issues/444, I wanted to look at other randomness and reproducibility weaknesses.

I understand that Atropos relies on a k-means segmentation and like all other k-means segs, it requires random initializations. I think we should have the ability to have this implementation of Atropos to have reproducible output. As you can see below, it is not reproducible, even when ANTS_RANDOM_SEED is set.

library(ANTsRCore)
set.seed(2)
Sys.setenv(ITK_GLOBAL_DEFAULT_NUMBER_OF_THREADS = 1,
           ANTS_RANDOM_SEED = 20180716)
Sys.getenv("ANTS_RANDOM_SEED")
[1] "20180716"
Sys.getenv("ITK_GLOBAL_DEFAULT_NUMBER_OF_THREADS")
[1] "1"
orig <-antsImageRead( getANTsRData("r16") ,2)
seg<-kmeansSegmentation( orig, 3 )
seg2<-kmeansSegmentation( orig, 3 )
arr1 = as.array(seg$segmentation)
arr2 = as.array(seg2$segmentation)
tab = table(arr1, arr2)
tab
    arr2
arr1     0     1     2     3
   0 47118     0     0     0
   1     0  2383     2     0
   2     0     0  7352     3
   3     0     0     6  8672
identical(arr1, arr2)
[1] FALSE

Atropos documentation

In the documentation for Atropos, there is the option for using a random seed:

     -r, --use-random-seed 0/(1)
          Initialize internal random number generator with a random seed. Otherwise, 
          initialize with a constant seed number. 

But that is not clear in the atropos documentation or a standard option. I would like to add in random_seed to be a boolean that we pass into the r argument, but looking at this line https://github.com/ANTsX/ANTs/blob/6a428f6fabda42aaa2af595dbc97a9d74668a702/Examples/Atropos.cxx#L762 it seems as though it will use some arbitrary fixed random seed (that may not be right). Is it possible to set this seed using Sys.setenv or is -r 0 the best we can do at the moment?

muschellij2 commented 6 years ago

We should likely use the recommending RNGState variables: https://cran.r-project.org/doc/manuals/r-release/R-exts.html#Random-number-generation, but Rcpp could offer something a bit more elegant: http://gallery.rcpp.org/articles/random-number-generation/

stnava commented 6 years ago

really appreciate your attention to this.

happy to go with whatever you like as long as it follows some precedence similar to what's defined in ANTs.

brian

On Mon, Jul 16, 2018 at 7:01 PM, John Muschelli notifications@github.com wrote:

We should likely use the recommending RNGState variables: https://cran.r-project.org/doc/manuals/r-release/R-exts. html#Random-number-generation, but Rcpp could offer something a bit more elegant: http://gallery.rcpp.org/articles/random-number-generation/

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/ANTsX/ANTsRCore/issues/51#issuecomment-405407100, or mute the thread https://github.com/notifications/unsubscribe-auth/AATyfoMqv0Apm_rtlL27bNCh02BriKKMks5uHRs2gaJpZM4VR5qR .

muschellij2 commented 6 years ago

There is different default behavior (aka be reproducible by default). I mean different from Atropos in ANTs default. If that's not OK, please set the default in https://github.com/ANTsX/ANTsRCore/blob/master/R/atropos.R#L57 to TRUE.