Pacific-salmon-assess / samSim

Beta distribution random draws #2

Closed · CamFreshwater closed this issue 5 years ago

CamFreshwater commented 5 years ago

I'm sure there's an easy answer to this, but I just can't seem to figure it out. I noticed yesterday that the rbeta() call we're using to generate outcome uncertainty seems to knock subsequent random number draws out of sync if you feed it two different vectors of equal length.

f <- function(mu, sigma = 0.1, location = NULL, shape = NULL, normal = FALSE) {
  # method-of-moments conversion from (mu, sigma) to beta shape parameters,
  # clamped at 0.001 so rbeta() never sees a non-positive shape
  if (is.null(location)) {
    location <- pmax(0.001,
                     mu^2 * (((1 - mu) / sigma^2) - (1 / mu)))
  }
  if (is.null(shape)) {
    shape <- pmax(0.001,
                  location * (1 / mu - 1))
  }
  if (normal == TRUE) {
    h <- rnorm(n = length(mu), mu, sigma)
  } else {
    h <- rbeta(n = length(mu), shape1 = location, shape2 = shape)
  }
  return(h)
}

x1 <- runif(10, 0, 0.99)
x2 <- runif(10, 0.5, 0.99)

# default behavior: same seed, but the follow-up runif(1) values differ
set.seed(123)
f(x1)
runif(1)
set.seed(123)
f(x2)
runif(1)

# see, it works with normal! the follow-up runif(1) values now match
set.seed(123)
f(x1, normal = TRUE)
runif(1)
set.seed(123)
f(x2, normal = TRUE)
runif(1)

Anyone run into this before? Please tell me I'm not losing my mind...

sjpeacock commented 5 years ago

Hmm. I don't have a quick answer, but I did replicate your issue with my beta function too. If I ran it a few times with just one value in x1 and x2, I sometimes got the same runif(1) afterwards, but not always. Looking into this a bit further, it seems like rbeta() uses a different number of random draws depending on whether or not the mean is close to 0 or 1. Not sure what the underlying reason is. Check it out:

doMatch <- numeric(1000)

# x1 and x2 above only hold 10 values; regenerate 1000 for this test
x1 <- runif(1000, 0, 0.99)
x2 <- runif(1000, 0.5, 0.99)

for (i in 1:1000) {

    set.seed(123)
    f(x1[i])
    y1 <- sample.int(.Machine$integer.max, 1)  # probes the RNG state after f()

    set.seed(123)
    f(x2[i])
    y2 <- sample.int(.Machine$integer.max, 1)

    doMatch[i] <- (y1 - y2) == 0
}

plot(x1, x2, col = doMatch + 1)
text(0.5, 0.5, "MATCH", col = 2, font = 2)
text(0.5, 0.99, "DON'T MATCH", col = 1, font = 2)
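
This looks consistent with how rbeta() is implemented: R uses Cheng's (1978) rejection algorithms (BB when both shapes exceed 1, BC otherwise), and rejection samplers consume a variable number of uniforms per draw, so the generator can end up in a different state depending on the parameters. A quick probe of that (a sketch; it assumes rbeta() draws only from the uniform stream):

# how many runif() draws leave the generator in the same state as a single
# rbeta() call? the answer varies with the shape parameters
nUnifsUsed <- function(shape1, shape2, seed = 123, maxTry = 50) {
  set.seed(seed)
  rbeta(1, shape1, shape2)
  target <- .Random.seed
  for (k in 0:maxTry) {
    set.seed(seed)
    if (k > 0) runif(k)
    if (identical(.Random.seed, target)) return(k)
  }
  NA_integer_
}

nUnifsUsed(5, 5)      # both shapes > 1: Cheng's BB sampler
nUnifsUsed(0.5, 0.5)  # shapes <= 1: the BC branch, different consumption
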
carrieholt commented 5 years ago

Yes, this is part of the reason why, in the original code, I input a vector of random numbers up front, to make sure the sequence of random numbers was the same among sensitivity analyses after a call to rbeta (although I removed rbeta in the end). Does this matter for your code and results? If the number of MC trials is sufficient, perhaps not?
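
One way that could be set up (a sketch only; the qbeta() inversion step is a suggestion for how it might work, not necessarily what the original code did): pre-draw a fixed-size block of uniforms, then push them through the inverse CDF, so every beta deviate costs exactly one stored number and the main stream is never disturbed.

nYears <- 50    # placeholder dimensions, for illustration only
nTrials <- 100
set.seed(123)
ouUnifs <- matrix(runif(nYears * nTrials), nrow = nYears)  # fixed-size draw

# later, inside the simulation loop (shape values are placeholders too):
h <- qbeta(ouUnifs[1, 1], shape1 = 2, shape2 = 5)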

CamFreshwater commented 5 years ago

Glad you could replicate it, Steph, and interesting that it seems to depend on where mu sits.

Intuitively it seems as though you'd be right, Carrie: if the number of MC trials is sufficiently high it should come out in the wash. I'd have to think a bit about how to easily determine whether that number would be stable across different scenarios, but worst case we could just ramp up the number we select based on single-scenario runs, to be safe.

At this point it's mostly an annoyance when comparing diagnostic trials and troubleshooting during development, but it's not the end of the world I suppose.

seananderson commented 5 years ago

Pseudo-random numbers are weird. I've never had faith that I'd end up at the same point in the stream after running any random number generator. Even loading ggplot2 changes the seed!

You could either generate these (or other) random samples beforehand, as Carrie suggests, or reset the seed after calling rbeta():

seed <- 3829
seed_multiplier <- 2

set.seed(seed)
f(x1)
set.seed(seed * seed_multiplier)  # jump to a fixed, unrelated point in the stream
runif(1)

set.seed(seed)
f(x2)
set.seed(seed * seed_multiplier)
runif(1)  # now matches the runif(1) above

Yeah, as long as you have sufficient trials the seed shouldn't matter and is mostly just there for reproducibility. But it can be nice to perfectly match the random draws where possible for comparison.
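
Another base-R option (a sketch of mine; ?Random documents that .Random.seed can be saved and restored): snapshot the generator state before the variable-consumption call and put it back afterwards.

set.seed(123)
saved <- .Random.seed  # snapshot before the variable-consumption call
f(x1)
.Random.seed <- saved  # restore: downstream draws no longer depend on f()
runif(1)

set.seed(123)
saved <- .Random.seed
f(x2)
.Random.seed <- saved
runif(1)  # matches the runif(1) above

One caveat: the restored stream re-uses the same uniforms rbeta() just consumed, so the downstream draws are correlated with the beta deviates; the seed-multiplier trick avoids that.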

CamFreshwater commented 5 years ago

Haha, yeah, I ended up just using a multiplier based on year and simulation number. Seems to work, but I'm still a little irked that this is a thing, considering the time it took me to debug... Oh well! All part of the character building I suppose.
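
For the record, roughly what that looks like (a sketch with invented names; the actual samSim code may differ): build a unique integer from the year and simulation indices and re-seed from it, so the stream resets deterministically no matter how many numbers rbeta() consumed.

baseSeed <- 123
nYears <- 50  # assumed dimensions, for illustration only
for (sim in 1:3) {
  for (yr in 1:nYears) {
    set.seed(baseSeed * (sim * nYears + yr))  # unique multiplier per cell
    h <- f(runif(1, 0.5, 0.99))
  }
}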