florianhartig / DHARMa

Diagnostics for HierArchical Regression Models
http://florianhartig.github.io/DHARMa/

Beta or binomial distribution with discrete proportional data? #350

Closed GeorgF92 closed 1 year ago

GeorgF92 commented 1 year ago

Hi Florian,

since DHARMa was recommended to me, I have been an enthusiastic user and appreciate its versatility. I'm currently working on a dataset of estimated cover scores of invasive plant species in many different 1 x 1 m squares, recorded in 2021 (before plant removal) and in 2022. The data are bounded between 0% and 100%, but the cover score of each plot is one of a fixed set of ordinal categories (0-0.1%, 0.1-1%, 1-5%, 5-10%, 10-25%, 25-50%, 50-75%, 75-100%). In the final data, I always assigned the midpoint of the bin to each data point, so my data never truly reach 100% (e.g. a data point in the 8th cover category of my focal invasive species gets a cover value of 87.5%). However, I have some zeros, because in some locations where I removed plants in 2021 nothing was left in 2022 (this was noted as zero, meaning that the species was absent).
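The midpoint assignment described above can be sketched as follows. This is only an illustration of the coding scheme: the class bounds are taken from the post, while the function name `class_midpoint` and the convention that class `0` means "species absent" are assumptions made for the example.

```python
# Cover classes as (lower, upper) bounds in percent, as listed in the post.
COVER_CLASSES = [
    (0.0, 0.1), (0.1, 1.0), (1.0, 5.0), (5.0, 10.0),
    (10.0, 25.0), (25.0, 50.0), (50.0, 75.0), (75.0, 100.0),
]


def class_midpoint(cover_class):
    """Return the midpoint (in percent) of a 1-based cover class.

    Class 0 is a hypothetical code for "species absent" and maps to 0.0.
    """
    if cover_class == 0:
        return 0.0
    lower, upper = COVER_CLASSES[cover_class - 1]
    return (lower + upper) / 2


# Midpoints for "absent" plus the eight cover classes.
midpoints = [class_midpoint(k) for k in range(0, 9)]
print(midpoints)
```

With this coding the largest possible value is the midpoint of the 75-100% class, so the response can contain exact zeros but never reaches 100%.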

I'm testing whether the cover of invasive species can be reduced significantly by management intervention and whether the efficiency of plant removal changes along an elevational gradient. I have a nested data structure (with plots nested within study sites), so I have to use some sort of mixed-effects model that allows me to include random effects.

The model specification is the following: cover_focal_species ~ poly(elevation, 2) + year + elevation:year + (1|plot_ID) + (1|site_ID)

From what I have read in the literature, I should probably consider a beta regression because of the doubly bounded nature of my response variable. Yet in the vignette of the DHARMa package you argue that discrete proportions (such as in my case) should be modelled with a binomial distribution.

Could you flesh out why and how? If you're interested, I could provide you the dataset.

Best regards, Georg Flückiger

[Attached image: DHARMa_vignette_screenshot]

florianhartig commented 1 year ago

Hello Georg,

in my comment, I referred to k/n data, where you have a discrete number of trials n and count k successes. When you observe cover, it's a case of real (continuous) proportional data, and you should use the beta, at least in principle.
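To make the k/n distinction concrete, here is a small sketch (an illustration, not part of DHARMa; the function names are made up for this example). With k/n data the number of trials n enters the binomial likelihood, so the same observed proportion carries more information when n is larger. A cover estimate has no such n, which is why the binomial does not apply to it directly.

```python
import math


def binomial_logpmf(k, n, p):
    """Log probability of k successes in n trials with success probability p."""
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_choose + k * math.log(p) + (n - k) * math.log(1.0 - p)


def support_for(k, n, p_a, p_b):
    """Log-likelihood difference between two candidate probabilities."""
    return binomial_logpmf(k, n, p_a) - binomial_logpmf(k, n, p_b)


# Same observed proportion (0.3), different number of trials.
small = support_for(3, 10, 0.3, 0.5)
large = support_for(30, 100, 0.3, 0.5)
print(f"n = 10:  log-likelihood difference = {small:.3f}")
print(f"n = 100: log-likelihood difference = {large:.3f}")
```

The evidence for p = 0.3 over p = 0.5 grows with n although k/n is identical in both cases.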

I say in principle, because my practical experience with cover data is that in the field the categories (1%, 5% etc.) are often interpreted as "very little", "a bit" and so on, rather than as real numeric values. For example, I often find that if I add up the cover of all species, I get values far > 100%, which shouldn't be possible. If the cover classes don't really correspond to numerical values, an ordinal regression may in principle be more appropriate, although this will create all kinds of problems in the interpretation (so you probably really want to avoid this).

I know it's against the tradition of vegetation ecology, but from the statistical viewpoint, it would be better to estimate and write down cover as a numeric value, e.g. 12.4%. Yes, then people always say: but I'm not sure it's exactly this value. But if you have uncertainty, you NEVER improve things by rounding. You are actually increasing the error, because you get a class error on top, which in this case is additionally asymmetric and can create some weird patterns in the beta. Anyway, I would start with the beta, or some extension of it (e.g. zero-inflated) in your case.
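The point about rounding can be checked with a small simulation. This is a sketch under invented assumptions: true cover is drawn uniformly between 0 and 100%, the observer's numeric estimate has a Gaussian error with a standard deviation of 3 percentage points, and values on a class boundary go to the lower class. None of these numbers come from a real dataset.

```python
import random

# Upper bounds (in percent) of the cover classes used in this thread.
UPPER_BOUNDS = [0.1, 1.0, 5.0, 10.0, 25.0, 50.0, 75.0, 100.0]


def to_midpoint(cover):
    """Replace a cover value in [0, 100] by the midpoint of its class."""
    lower = 0.0
    for upper in UPPER_BOUNDS:
        if cover <= upper:
            return (lower + upper) / 2
        lower = upper
    raise ValueError("cover must lie in [0, 100]")


def rmse(estimates, truths):
    """Root mean squared error of the estimates against the true values."""
    return (sum((e - t) ** 2 for e, t in zip(estimates, truths)) / len(truths)) ** 0.5


rng = random.Random(42)  # fixed seed so the run is reproducible
true_cover = [rng.uniform(0.0, 100.0) for _ in range(10_000)]
# Noisy numeric field estimate, clipped to the valid range.
field_estimate = [min(100.0, max(0.0, t + rng.gauss(0.0, 3.0))) for t in true_cover]
# The same estimate after recording only the class and using its midpoint.
binned_estimate = [to_midpoint(x) for x in field_estimate]

rmse_numeric = rmse(field_estimate, true_cover)
rmse_binned = rmse(binned_estimate, true_cover)
print(f"RMSE of numeric estimate: {rmse_numeric:.2f}")
print(f"RMSE of binned estimate:  {rmse_binned:.2f}")
```

In this setup the binned estimate inherits the observer error and adds the class error on top, so its RMSE is larger than that of the numeric estimate.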

GeorgF92 commented 1 year ago

Hello Florian,

thanks a lot for your answer and your advice. In that case I think I misunderstood what you meant by k/n data. I will fit cumulative logit mixed models or a beta regression with glmmTMB instead.

Concerning the cover classes I used: in my case I'm mainly interested in the cover of the focal invasive species, which never reached 100%. In addition to these species, I also estimated the cover of other functional groups such as forbs, graminoids, etc. Since I recorded the cover and not the top-cover, the summed values can often exceed 100%, which makes sense with that kind of estimation.

I'm aware of the issue with fixed classes, but I'm afraid that for this project it's too late to change the scale used. Nevertheless, I will keep your advice in mind for future projects!

florianhartig commented 1 year ago

OK, great! Note you can add zero-inflation in the beta as well! Cheers!
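As a rough illustration of why zeros call for zero-inflation: the beta density is only defined for values strictly between 0 and 1, so an observed cover of exactly 0 has to come from a separate component. The sketch below writes out a zero-inflated beta log-likelihood by hand, assuming a mean/precision parameterization (shape parameters `mu * phi` and `(1 - mu) * phi`). The function names and the example values are made up, and this is not how any particular package implements it.

```python
import math


def beta_logpdf(y, mu, phi):
    """Log density of a beta distribution with mean mu and precision phi, 0 < y < 1."""
    a, b = mu * phi, (1.0 - mu) * phi
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return log_norm + (a - 1.0) * math.log(y) + (b - 1.0) * math.log(1.0 - y)


def zi_beta_loglik(ys, mu, phi, p_zero):
    """Log-likelihood of a zero-inflated beta: P(y = 0) = p_zero, else beta."""
    total = 0.0
    for y in ys:
        if y == 0.0:
            total += math.log(p_zero)
        else:
            total += math.log(1.0 - p_zero) + beta_logpdf(y, mu, phi)
    return total


# Hypothetical cover proportions (midpoints divided by 100), including zeros.
cover = [0.0, 0.0, 0.0005, 0.03, 0.175, 0.375, 0.875]
print(zi_beta_loglik(cover, mu=0.2, phi=2.0, p_zero=0.3))

# A plain beta cannot evaluate an exact zero: math.log(0.0) raises ValueError.
try:
    beta_logpdf(0.0, mu=0.2, phi=2.0)
except ValueError:
    print("plain beta density is undefined at y = 0")
```

In a fitted model, `mu` and `p_zero` would each depend on predictors through a link function; here they are fixed numbers to keep the example short.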