Analyzing data at the plot-level without re-summarizing

dobsonk2 commented 2 years ago

Hi Phoebe,

I have a question about running mixed models and being able to specify what level you're interested in comparing. For example, for the biomass data, each row represents a species within a plot. However, we're interested in comparing differences between plots (so plot level) and not necessarily between species. I remember we chatted before about structuring a mixed model in a format that gets at that plot-level comparison without having to re-summarize the data into plot-level totals, but I can't seem to figure it out. I was also trying to do this for the greenup data (also unsuccessfully) but the biomass data is a lot easier to work with, so I'll use that as an example!

Here's the analysis script for biomass: https://github.com/SpaCE-Lab-MSU/warmXtrophic/blob/master/R/L2/biomass_analyses_L2.R On line 167, I create several models with different structures, some including species and some not. I also include some non-mixed effects models on line 221. However, I also re-summarized the data into a new dataframe (line 49) to look at plot biomass sums without species and ran a model with that (line 179), and it shows a potential difference between warmed and ambient plots, but none of the mixed models I ran reflected this.

I think I'm just structuring the mixed models incorrectly & could get at this plot-level comparison if they were structured differently - this will also help with the greenup analyses where all plot comparisons were done on a re-summarized dataframe. Thanks for the help!

plzmsu commented 2 years ago

When you summarize to plot biomass sums, that's not the same as mean plot-level biomass (which is what a model will estimate if you have each row = unique biomass measure). So I think this is more about "sum is not the same as mean". Try the mean and it should be pretty similar for model results, but that's maybe not what you want as a variable.

I think it's probably safest to use a data frame that reflects the resolution of the analysis. Some R packages will enable you to code a model in such a way that it understands you're not treating every row as a unique observation, but others may not. So I would proceed with modeling the correct resolution data frame (e.g., plot-level would be each row is a plot). If you want to quantify effects at different levels within the same model (species within plot and plot-level) then you need a data frame at the finest resolution (each row = unique species-plot combo).

In your model formulations: (1|plot) means that you're acknowledging that each plot can have its own variance because we repeated measures (we have random effects) at the plot level. Plots are sampled repeatedly over time, and you don't want to understand the plot influence, you just want to account for its variation.

(1|plot/species) means that species is nested inside plots, but you don't want to really understand the influence of plot or species on the response variable, just account for their repeated measures while acknowledging their nesting. However, I don't think our design is nested, so this model doesn't make much sense (see linked discussion below).

(1|plot) + (1|species) means that you have one random intercept for plot (its variance is estimated separately), and another for species (its variance is estimated separately). This reflects the crossed design of our experiment, even though we have incomplete crossing of species in plots.

Here is a good discussion on stack exchange about crossed vs. nested in mixed effects models: https://stats.stackexchange.com/questions/228800/crossed-vs-nested-random-effects-how-do-they-differ-and-how-are-they-specified - we have a crossed design that is incomplete because not all species occur in all plots.

Does that help? Sometimes plotting these models helps see how they are structured to visualize which variables have different intercepts, etc.

dobsonk2 commented 2 years ago

Yes this is helpful, thanks Phoebe!

SpaCE-Lab-MSU / warmXtrophic

Analyzing data at the plot-level without re-summarizing #26