STAN moonshot (GitHub issue)

DanOvando commented 6 years ago

Give the top-to-bottom STAN model one last shot.

DanOvando commented 6 years ago

The broad issue: the "joint" model, which estimates all the abundances and passes them to the DiD, would take a lifetime to converge. That's odd, since each species on its own converges in a reasonable amount of time. I would expect fitting all of them together to take roughly the sum of the individual times. So there are two possibilities. Either the data are so messy and your model so poorly specified that it just won't work, or something about fitting them together causes trouble for STAN, e.g. it's trying to estimate a covariance matrix across a whole bunch of parameters à la VAST. You don't want to sink too much time into this, so give it about one day and see if you can make substantial progress. At the moment, two species take about 6 minutes for just the delta-GLM part. If you can get that down substantially, it's worth investing more effort.

[x] Try a species-specific sigma on density. Maybe part of the problem is just that each species should have its own sigma around density, which you can chuck in around line 268 of fit-ahnold-abundance.stan.
[x] Try centering and scaling by species. Maybe part of the problem is that the variables are on such different scales across species that the sampler is having a hard time.
[x] Check more carefully for highly collinear covariates. Maybe the problem is just bad model specification.
[x] Look at how you're passing the betas. At the moment you are (cleverly, you thought) passing all the betas as one long vector and then breaking that vector up. Maybe that creates a problem in the way STAN sets up NUTS? This one seems tougher; try the others first.
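
The second and third checklist items (centering and scaling by species, checking for collinear covariates) are data-prep steps that can be done before anything goes to STAN. Below is a minimal sketch in Python, assuming the data are rows tagged with a species label. The function names, column names (`temp`, `depth`, `temp_f`), the toy data, and the 0.8 threshold are all hypothetical, not taken from the zissou code. Note that a pairwise correlation check only catches pairwise collinearity; something like variance inflation factors would be needed to catch a covariate that is a combination of several others.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt


def center_scale_by_species(rows, covariates):
    """Center and scale each covariate within its own species.

    `rows` is a list of dicts with a "species" key plus numeric covariates
    (a hypothetical layout). Returns new dicts in the original row order.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[row["species"]].append(row)

    # Per-species mean and (population) standard deviation for each covariate.
    stats = {}
    for species, group in groups.items():
        for cov in covariates:
            values = [r[cov] for r in group]
            mu = sum(values) / len(values)
            sd = sqrt(sum((v - mu) ** 2 for v in values) / len(values))
            # A covariate that is constant within a species is only centered.
            stats[(species, cov)] = (mu, sd if sd > 0 else 1.0)

    scaled_rows = []
    for row in rows:
        scaled = dict(row)
        for cov in covariates:
            mu, sd = stats[(row["species"], cov)]
            scaled[cov] = (row[cov] - mu) / sd
        scaled_rows.append(scaled)
    return scaled_rows


def pearson(x, y):
    """Pearson correlation; NaN if either input has zero variance."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / sqrt(sxx * syy)


def collinear_pairs(rows, covariates, threshold=0.8):
    """Flag covariate pairs whose absolute pairwise correlation is >= threshold."""
    flagged = []
    for a, b in combinations(covariates, 2):
        r = pearson([row[a] for row in rows], [row[b] for row in rows])
        if abs(r) >= threshold:  # NaN compares False, so it is never flagged
            flagged.append((a, b, r))
    return flagged


# Made-up toy data: two species on very different covariate scales, plus a
# temperature column in Fahrenheit that is perfectly collinear with `temp`.
rows = [
    {"species": "a", "temp": 10.0, "depth": 5.0},
    {"species": "a", "temp": 12.0, "depth": 3.0},
    {"species": "a", "temp": 14.0, "depth": 9.0},
    {"species": "b", "temp": 20.0, "depth": 40.0},
    {"species": "b", "temp": 22.0, "depth": 10.0},
    {"species": "b", "temp": 24.0, "depth": 25.0},
]
for row in rows:
    row["temp_f"] = row["temp"] * 1.8 + 32

covariates = ["temp", "depth", "temp_f"]
scaled = center_scale_by_species(rows, covariates)
pairs = collinear_pairs(rows, covariates)
print(pairs)
```

Here only the `temp`/`temp_f` pair gets flagged. The correlation check runs on the pooled raw data; running it within each species instead is an equally reasonable choice.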

DanOvando commented 6 years ago

Whoops, you were doing the clustering of the standard errors wrong. You were clustering each of the environmental betas by species, when instead you just need to say that each group of covariates comes from a central distribution. For example, all the temperature terms come from N(0, sigma_temperature), all the intercepts come from N(0, sigma_intercept), and so on, instead of clustering sigmas by species. You can give each species its own sigma to handle heteroskedasticity across species if you want, but that's a separate problem. And the only reason to go with random effects here is if you want species-specific intercepts; otherwise you can just add them in as fixed effects.
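To make the difference between the two groupings concrete, here is a minimal sketch in Python that just evaluates the prior log density under each one. In Stan, the by-covariate version would roughly correspond to one vectorized statement per covariate group, such as `beta_temp ~ normal(0, sigma_temp);`, where `beta_temp` is a hypothetical vector with one element per species. The coefficient values, species labels, and sigmas below are made up. The sigmas are held fixed for illustration, whereas a full hierarchical model would also give each one a hyperprior.

```python
import math


def normal_logpdf(x, mu, sigma):
    """Log density of Normal(mu, sigma) evaluated at x."""
    return -0.5 * math.log(2 * math.pi) - math.log(sigma) - 0.5 * ((x - mu) / sigma) ** 2


def log_prior_by_covariate(betas, sigma_by_covariate):
    """Grouping by covariate: every species' coefficient for a given covariate
    shares that covariate's sigma, beta[cov][sp] ~ N(0, sigma[cov])."""
    return sum(
        normal_logpdf(value, 0.0, sigma_by_covariate[cov])
        for cov, per_species in betas.items()
        for value in per_species.values()
    )


def log_prior_by_species(betas, sigma_by_species):
    """The mistaken grouping: one sigma per species shared by all of that
    species' covariates, beta[cov][sp] ~ N(0, sigma[sp])."""
    return sum(
        normal_logpdf(value, 0.0, sigma_by_species[sp])
        for per_species in betas.values()
        for sp, value in per_species.items()
    )


# Hypothetical coefficients, keyed as betas[covariate][species]. The species
# intercepts are just one more covariate group with their own sigma.
betas = {
    "intercept": {"a": 1.2, "b": -0.4, "c": 0.7},
    "temp": {"a": 0.3, "b": 0.5, "c": 0.1},
}
sigma_by_covariate = {"intercept": 2.0, "temp": 0.5}
sigma_by_species = {"a": 1.0, "b": 1.0, "c": 1.0}

print(log_prior_by_covariate(betas, sigma_by_covariate))
print(log_prior_by_species(betas, sigma_by_species))
```

The by-covariate version lets the intercepts spread widely while shrinking the temperature effects toward each other. The by-species version forces each species' intercept and temperature slope to share one scale, even though those parameters are on completely different scales.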

DanOvando/zissou, issue #28: STAN moonshot