Preliminary results: nesting phenology

ebuhle commented 3 years ago

Hey @hoffmannsarahlouise, here are some results to look at, although so far there's not much to see. After our discussion in #1, I ended up going with fairly simple Bayesian hierarchical models fitted in rstanarm. The base model has a random intercept for each turtle and random annual anomalies around the mean. A first-order model then adds a linear time trend. There's no explicit autocorrelation in either the annual anomalies around the trend or the within-turtle observations, given how few real longitudinal time series we have.

On top of that basic structure I added the weather data. I used daily average humidity, wind speed, pressure and dew point and daily precip and SST. I then took annual averages over the full window of encounter dates in the data, applied them to all encounters in the respective years, and centered and standardized them.

First up is encounter DOY. The models shown below are fitted to all encounters (if you include only the first encounter in a given year the results are essentially the same, just more uncertain). I don't know enough meteorology or turtle ecology to have a strong hypothesis about how the terrestrial measurements might be associated with nesting phenology, either directly or as proxies, but I can imagine reasons for an SST "effect". Alas, none of that appears to be the case.

Model Info:
 function:     stan_lmer
 family:       gaussian [identity]
 formula:      doy_encounter ~ year0 + humid_std + ws_std + p_std + dp_std + ppt_std + sst_std + (1 | name) + (1 | fyear)
 algorithm:    sampling
 sample:       3000 (posterior sample size)
 priors:       see help('prior_summary')
 observations: 1813
 groups:       name (567), fyear (14)

Estimates:
                                       mean   sd    2.5%   50%   97.5%
(Intercept)                          131.4    4.0 123.1  131.6 139.7  
year0                                 -0.3    0.6  -1.5   -0.3   1.0  
humid_std                              2.4    3.6  -4.9    2.4  10.2  
ws_std                                 0.6    2.9  -4.7    0.5   6.9  
p_std                                  0.5    2.3  -3.9    0.5   5.3  
dp_std                                -0.8    3.3  -6.7   -1.0   6.6  
ppt_std                                0.0    2.3  -5.0    0.1   4.5  
sst_std                                1.2    1.8  -2.5    1.2   4.6

Sad trombone. Not one of the regression coefficients in the full model is even close to being meaningfully different from zero. We could do formal model selection to compare the full model to the two simpler ones, but there wouldn't be much point. The full model isn't great in an absolute sense, either. This posterior predictive density plot compares the marginal distribution of the observed DOY (the black density) to the marginal densities of 100 hypothetical replicate datasets generated from the posterior predictive distribution (PPD; analogous to the prediction interval in classical regression). If the model is good, the marginal PPD (at least!) should be close to the data.

As you can see, the average nesting phenology is quite different from the mixture-normal PPD (actually pretty close to normal, period, since all the effects are so small compared to the noise, i.e. the model doesn't think phenology has changed much over time). Nesting season starts off more gradually and ends more abruptly than a normal curve. Here's a clearer illustration:

The annual phenologies are pretty wonky, but there's no real indication that they've shifted over time. It might be possible to find a better probability model for these data than the Gaussian -- skew-normal or something -- but I see no reason to think that would give us different answers to the questions of interest.

What is interesting is the amount of variation among turtles and among encounters (most of them with turtles seen only once or twice). Here's how the point estimates of the variance components break down.

Error terms:
 Groups   Name        Std.Dev.
 name     (Intercept)  7.4    
 fyear    (Intercept)  4.8    
 Residual             21.5

Residual error dominates, but there is a fair bit of among-turtle variation, meaning some females tend to be later or earlier than others. You'd never know it from looking at the female-specific trajectories of annual average encounter date, though:

I've got some more data-viz ideas up my sleeve that should give a better sense of individual-level variation. Regardless, variation seems to be the main story -- whatever it is that explains nesting phenology, the action is mostly at the individual and within-year level rather than interannual trends in response to environmental factors or otherwise.

Would love to hear your thoughts as an actual turtle ecologist! Next up I'll tackle body size, then the non-Gaussian responses.

ebuhle commented 3 years ago

I've got some more data-viz ideas up my sleeve that should give a better sense of individual-level variation.

This shows the posterior distribution of the average encounter DOY in an average year (i.e., setting all other fixed and random effects to zero) for just those females that have been seen in at least 4 years (so their random effects are relatively well identified). The lines are the posterior means. They range from late April to late May, so not a ton of variation, but individual females do seem to have distinct seasonal modes. BUNNY would be very unlikely to run into MZAZI at the beach, for example.

(You know the real reason I like this plot, though. :turtle:)

ebuhle commented 3 years ago

As I've thought more about this, it seems like what's missing from the phenology models shown above is a way to capture the intra-annual daily variation around the average timing distribution. The linear models for encounter DOY just try to predict the central location of the annual curve (assumed to be Gaussian) based on annual anomalies around the average climatology. With only 14 years that's a lot to ask of the data, and most of the variation ends up in the unexplained residual noise term.

You can't simply add daily (or day-varying, e.g. running average) weather measurements to the DOY models, because that would just tell you that conditions obviously vary over the nesting season. Instead, it seems to me that what we really want to predict is the number of females encountered on a given date. Such a model would include some sort of nonlinear function of DOY to describe the average phenology. This could be a skew-normal kernel or maybe a spline, although a GAMM would get ugly fast given all the covariates and random terms. The location (and maybe scale and/or shape) of this curve would vary interannually, as in our current DOY models. But more important, daily weather conditions could adjust the curve up and down, turning the smooth function into a lumpy one as (hopefully) seen in the data. The observed number of encounters would then be modeled as Poisson or negative binomial with expectation given by the curve plus covariate effects.

I'm embarrassed I didn't think of this sooner, since I've worked on exactly this sort of model for salmon spawning and outmigration timing. It's definitely a bit more complex than the simple DOY models, and it couldn't be fitted in rstanarm; brms could handle it, though. I really like this idea and plan to pursue it, though probably not until I get the first pass through all the response variables done and written up as a vignette. Would love to hear your thoughts and suggestions!

ebuhle / LeatherbackNesting

Preliminary results: nesting phenology #3