ConnorDonegan / geostan

Bayesian spatial analysis
https://connordonegan.github.io/geostan
Other
56 stars 5 forks source link

Error from validate_positive_parameter() #13

Closed andtheWings closed 1 year ago

andtheWings commented 1 year ago

I have the following data:

> select(suid, suid_count, under_5_count)
Simple feature collection with 1315 features and 2 fields
Geometry type: POLYGON
Dimension:     XY
Bounding box:  xmin: -88.26364 ymin: 41.4697 xmax: -87.52366 ymax: 42.15429
Geodetic CRS:  WGS 84
# A tibble: 1,315 × 3
   suid_count under_5_count                                                                     geometry
        <dbl>         <dbl>                                                                <POLYGON [°]>
 1          0           372 ((-87.6772 42.02294, -87.67628 42.02296, -87.67597 42.02297, -87.67265 42.0…
 2          0           516 ((-87.68465 42.01948, -87.68432 42.01948, -87.6839 42.01948, -87.68331 42.0…
 3          0           282 ((-87.67683 42.01941, -87.67674 42.01941, -87.67647 42.01941, -87.67642 42.…
 4          0           434 ((-87.67133 42.01937, -87.67121 42.01937, -87.67086 42.01937, -87.67074 42.…
 5          0            55 ((-87.66345 42.01283, -87.66321 42.01283, -87.66249 42.01285, -87.66226 42.…
 6          0           249 ((-87.66592 42.01279, -87.66543 42.01279, -87.66506 42.0128, -87.66423 42.0…
 7          0           165 ((-87.66573 42.00546, -87.66572 42.00546, -87.66567 42.00546, -87.66564 42.…
 8          0            70 ((-87.66564 42.00242, -87.66522 42.00242, -87.66495 42.00243, -87.66396 42.…
 9          1           272 ((-87.67059 42.00537, -87.67046 42.00538, -87.67008 42.00538, -87.66995 42.…
10          0           269 ((-87.68309 42.01232, -87.68268 42.0125, -87.68234 42.0125, -87.6822 42.012…
# … with 1,305 more rows
# ℹ Use `print(n = ...)` to see more rows

I tried to fit a poisson car model on it and got the following error:

> fit <- 
+     stan_car(
+         suid_count ~ offset(log(under_5_count)),
+         data = suid,
+         car_parts = cars,
+         family = poisson(),
+         cores = 4, # for multi-core processing
+         refresh = 0
+     )
Error in if (any(x <= 0)) stop(nm, " should be positive", call. = FALSE) : 
  missing value where TRUE/FALSE needed

And the traceback:

4.
validate_positive_parameter(scale)
3.
normal(location = alpha_mean, scale = alpha_scale)
2.
make_priors(user_priors = prior, y = y, x = x_full, link = family$link, 
offset = offset)
1.
stan_car(suid_count ~ offset(log(under_5_count)), data = suid, 
car_parts = cars, family = poisson(), cores = 4, refresh = 0)

Any help on troubleshooting appreciated!

ConnorDonegan commented 1 year ago

Something is being assigned a prior distribution with a negative value for the scale parameter (probably the intercept). That seems like a bug, because it' the default prior; but maybe its possible that there's something wrong with the data. Does the error happen before the prior distributions are printed to the console? Can you see that min(suid$suid_count, min(suid$under_5_count) are >=0?

andtheWings commented 1 year ago

Yes, I'm not seeing any prior distributions printing to the console before the error. Both variables have zero as their minimums. Here are their distros:

image

image

andtheWings commented 1 year ago

It seems like the error message is indicating it doesn't want there to be any values equal to zero?

if (any(x <= 0))

ConnorDonegan commented 1 year ago

Ah yeah, the population at risk has to be greater than zero. If there's no population at risk, then there can't be any deaths (or cases). So you can probably just drop those tracts from the analysis.

Mathematically, you'll be adding log(0)=-Inf to the linear predictor. Or you can think of it as having zero as the denominator in the rate: Expectation[y] = exp(log(pop) + eta) = pop * exp(eta) can be manipulated to show Expectation[y/pop] = exp(eta), which is also defined as y/pop=Inf.

Make sense? I think it would help if I added a more informative error message there

andtheWings commented 1 year ago

Ok yes, that makes sense, there was a single tract where the estimated population at risk was zero so I filtered out and the model was able to proceed with fitting/sampling. It looks like the chains did not converge, but that's a separate issue so I'll close out this issue. Thanks for the help!