biometryhub / biometryassist

A package to aid in teaching experimental design and analysis through easy access and documentation of helper functions. Renamed from the previous BiometryTraining package.
https://biometryhub.github.io/biometryassist

Tukey test letters when backtransforming predictions #66

Open igorkf opened 1 year ago

igorkf commented 1 year ago

Hello, thanks for the package! I'm using asreml to fit mixed models and I found something curious.

This is my model:

library(asreml)
mod <- asreml(
  sqrt(Y) ~ Treatment * Year + Cultivar + Cultivar:Treatment + Cultivar:Field_Rep,
  data = data_rep,
  na.action = na.method(x = 'omit', y = 'include')
)

And this is a summary of my response variable:

Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
  0.000   1.000   2.100   2.903   4.200  11.000       1

I don't think I need an offset, given that my response has no negative values.

When plotting the Tukey test results for this model, which uses sqrt(Y) as the response variable, I saw non-overlapping error bars for some treatment levels, yet every level received the same letter ("a"), i.e., a single group. This is the code:

library(biometryassist)
pred <- multiple_comparisons(mod, classify = 'Treatment', trans = 'sqrt')
autoplot(pred)

[Image: output of autoplot(pred), not reproduced]

As you can see, some error bars do not overlap, but there is still only one group, "a".
Is that expected, or am I missing something?

I wonder whether this happens because the letters are computed before the back-transformation: https://github.com/biometryhub/biometryassist/blob/c1bbb8702db3ad7f032f84df61b6bcf4c8384cdc/R/mct.R#L314C5-L314C5
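One way to probe this hypothesis is to check whether back-transforming can change whether two intervals overlap. The sketch below uses made-up means, standard errors and residual df (not the data from this issue), and it assumes the back-transformed bars are obtained by squaring interval endpoints computed on the sqrt scale; the thread does not say exactly how the package does this. Because squaring is strictly increasing for non-negative values, it preserves the ordering of the endpoints, so overlap status on the sqrt scale carries over to the original scale.

```r
# Hypothetical predictions on the sqrt scale (illustrative numbers only)
pred_sqrt <- c(A = 1.2, B = 1.9)
se_sqrt   <- c(A = 0.15, B = 0.15)
resid_df  <- 30  # assumed residual degrees of freedom

# 95% t-based intervals on the transformed (sqrt) scale
half_width <- qt(0.975, df = resid_df) * se_sqrt
lower <- pred_sqrt - half_width
upper <- pred_sqrt + half_width

# Assumed back-transformation: square the endpoints.
# Squaring is only monotone here because all lower limits are >= 0.
lower_bt <- lower^2
upper_bt <- upper^2

# B lies above A, so the intervals overlap iff B's lower end is below A's upper end
overlap_sqrt <- unname(lower["B"] < upper["A"])
overlap_bt   <- unname(lower_bt["B"] < upper_bt["A"])

print(c(sqrt_scale = overlap_sqrt, back_transformed = overlap_bt))
```

If a lower limit on the sqrt scale were negative, squaring would no longer be monotone over that interval, so this argument only covers intervals that stay non-negative.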

Note: sorry that this is not a reproducible example.

rogerssam commented 1 year ago

Hi, thanks for the question, and great to hear you're using the package.

I think the reason you're seeing letters in common across all groups despite non-overlapping intervals is that the letters are calculated using Tukey's studentized range distribution (see e.g. here: https://github.com/biometryhub/biometryassist/blob/c1bbb8702db3ad7f032f84df61b6bcf4c8384cdc/R/mct.R#L218), while the default confidence intervals are calculated with a t distribution. Because Tukey's test is conservative (it controls the error rate across all pairwise comparisons at once), its critical value is larger than the t critical value, so the t-based confidence intervals are narrower than intervals based on the Tukey distribution would be. Hence you can get letters in common even when the CIs don't overlap.
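The size of this effect can be checked by comparing the two critical values directly. This is a sketch under stated assumptions: the number of treatment levels `k` and the residual df are made-up values, and every mean is assumed to have the same standard error (SE), so the SE of a difference is sqrt(2) * SE. Under those assumptions, two 95% t intervals fail to overlap once |difference| > 2 * qt(0.975, df) * SE, while Tukey's HSD declares a difference only once |difference| > qtukey(0.95, k, df) * SE.

```r
# Assumed illustrative values, not taken from the issue
k        <- 8   # number of treatment levels
resid_df <- 30  # residual degrees of freedom

# Two-sided 95% t multiplier for an individual confidence interval
t_mult <- qt(0.975, df = resid_df)

# Tukey HSD multiplier applied to the SE of a difference (SED):
# a pair differs when |diff| > qtukey(0.95, k, df) / sqrt(2) * SED
tukey_mult <- qtukey(0.95, nmeans = k, df = resid_df) / sqrt(2)

# Express both thresholds in units of a single mean's SE (SED = sqrt(2) * SE)
overlap_threshold <- 2 * t_mult           # t intervals stop overlapping beyond this
tukey_threshold   <- sqrt(2) * tukey_mult # Tukey flags a difference beyond this

print(c(non_overlap = overlap_threshold, tukey = tukey_threshold))
```

Here the Tukey threshold is the larger one, so a difference can fall between the two: the intervals no longer overlap, but the letters stay shared. With only 3 levels and the same df, the inequality reverses, so whether this mismatch can appear depends on the number of treatment levels, the residual df, and how unequal the standard errors are.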

This is a common point of confusion, so we should probably do something about it. My current thoughts are one (or both) of:

Open to suggestions though 🙂

igorkf commented 1 year ago

Hi. To be honest, I don't know which approach is best. Maybe a Tukey-based interval, to match how the letters are calculated?
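One construction that makes the bars agree with the letters is a "comparison interval" whose half-width is half of Tukey's HSD, so that two bars overlap exactly when Tukey finds no difference. This technique is not taken from biometryassist, and the names and numbers below are hypothetical. The sketch assumes a single common SE of a difference (SED) for all pairs. Mixed models often give unequal SEDs, and in that case exact agreement is not guaranteed.

```r
# Hypothetical inputs on the transformed (e.g. sqrt) scale
means    <- c(T1 = 2.1, T2 = 2.6, T3 = 3.4)  # predicted means
sed      <- 0.35                              # assumed common SE of a difference
resid_df <- 30                                # assumed residual df
k        <- length(means)

# Tukey's honestly significant difference
hsd <- qtukey(0.95, nmeans = k, df = resid_df) / sqrt(2) * sed

# Comparison intervals: half-width HSD / 2, so two intervals
# overlap exactly when |difference| <= HSD
lower <- means - hsd / 2
upper <- means + hsd / 2

# For every pair, check that "bars overlap" matches "Tukey finds no difference"
pairs <- combn(names(means), 2)
agree <- apply(pairs, 2, function(p) {
  overlaps    <- lower[p[2]] <= upper[p[1]] && lower[p[1]] <= upper[p[2]]
  significant <- abs(means[p[1]] - means[p[2]]) > hsd
  unname(overlaps == !significant)
})

print(agree)
```

If the response was transformed, the interval endpoints would be back-transformed after this step (for example, squared for a sqrt transform), which leaves overlap unchanged as long as the endpoints are non-negative.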