DistanceDevelopment / dsims

New simulation R library
1 stars 2 forks source link

Cluster sizes generated using zero-truncated Poisson have a mean that is too large #53

Closed LHMarshall closed 2 years ago

LHMarshall commented 2 years ago

I noticed that the estimates of abundance of individuals for a simulation with clusters sizes coming from a zero-truncated Poisson distribution were more biased than the estimates of cluster size. I then confirmed positive bias in the estimates of expected cluster size.

On investigation it seems that the data are being generated with a mean that is higher than that which is requested. Looks like it might account for a ~6.5 to 7% bias.

mean(rtpois(10000000, mean = 7.5))
[1] 7.998969
LHMarshall commented 2 years ago

The function was only designed to take integer input hence the 7.5 value was causing issues... there needs to be a check added and a warning displayed! Or ideally look into expanding this function to take non integer values.

> mean(rtpois(1000000, mean = 7))
[1] 6.998865
> mean(rtpois(1000000, mean = 8))
[1] 7.999102
lenthomas commented 2 years ago

I don't know if this is useful or not, but the mean of a truncated poisson is given by lambda / (1 - exp(-lambda)). Given this you should be able to use optimize to find the lambda needed to achieve a given mean. You seem to have a lookup table in your code which isn't necessary.

Also you start by rounding the input mean, which will cause problems.

For the last part of the code, where you generate a value from the specified truncated distribution, your code seems right although I didn't understand the need to add +1e-10. I would think you could adapt the code from the truncdist package rtrunc, which in turn calls qtrunc - I think it's the same idea as you're using and doesn't seem to require that +1e-10 bit.

lenthomas commented 2 years ago

I had a stab at it, below (renamed to rztpois where the z is for zero (so people don't confuse with a more general truncated Poisson), and renamed argument N to n as is more standard for random distribution functions in R). Probably could be more robust, etc. See what you think.

rztpois <- function(n, mean = NA){

  if(mean <= 1) {
    return(rep(NA, n))
  } else {

    #Find lambda
    obj.func <- function(lambda, mean){
      Ex <- lambda / (1 - exp(-lambda))
      return((mean - Ex)^2)
    }
    lambda <- optimize(obj.func, lower = mean - 1, upper = mean, mean = mean)$minimum

    #Generate deviates
    pmin <- ppois(0, lambda)
    u <- runif(n, min = pmin, max = 1)
    ret <- qpois(u, lambda)
    return(ret)
  }  

}
lenthomas commented 2 years ago

Small change to make it more compatible with what rnorm does when you ask for a mean of NA:

rztpois <- function(n, mean = NA){

  if(is.na(mean) | (mean <= 1)) {
    warning("NAs produced")
    return(rep(NaN, n))
  } else {

    #Find lambda
    obj.func <- function(lambda, mean){
      Ex <- lambda / (1 - exp(-lambda))
      return((mean - Ex)^2)
    }
    lambda <- optimize(obj.func, lower = mean - 1, upper = mean, mean = mean)$minimum

    #Generate deviates
    pmin <- ppois(0, lambda)
    u <- runif(n, min = pmin, max = 1)
    ret <- qpois(u, lambda)
    return(ret)
  }  

}
LHMarshall commented 2 years ago

@lenthomas thanks!