easystats / bayestestR

:ghost: Utilities for analyzing Bayesian models and posterior distributions
https://easystats.github.io/bayestestR/
GNU General Public License v3.0

2 simple questions ref the chickwts example #413

Closed sfd99 closed 3 years ago

sfd99 commented 3 years ago

Thanks a million for such clear, step-by-step examples in https://easystats.github.io/bayestestR/articles/example1.html, and with a good sense of humor!

Two simple questions, if I may.

Q1. Under the red density plot of feedsunflower, the article says:

"This represents the posterior distribution of the difference between meatmeal and sunflowers. It seems that the difference is positive (since the values are concentrated on the right side of 0). "

Where does this "difference between meatmeal and sunflowers" appear?

As a newbie, I thought that the posteriors data frame would have 3 columns: weight, sunflower and meatmeal.

Yet...

head(posteriors)
  (Intercept) feedsunflower
1    285.4417      47.85501
2    274.8670      79.83329

Where are these "difference" values calculated and defined? And why was sunflower, the more significant of the 2 feeds in that red plot, the one chosen to appear as "feedsunflower"?

Q2. The article shows:

hdi(posteriors$feedsunflower)

95% HDI: [1.80, 97.25]

But I get:

hdi(posteriors$feedsunflower)

95% HDI: [1.84, 96.14]

Why the different HDI results? Have they switched the chickens?... :-)

Thanks!

SFd99, San Francisco
Ubuntu Linux 20.04, latest RStudio and R, bayestestR 0.9.0 (downloaded the development version).

DominiqueMakowski commented 3 years ago

Where does this "difference between meatmeal and sunflowers" appear?.

This comes from how linear regression handles a categorical predictor. If you have a group variable with two levels, say A and B, and a continuous variable y, and you fit the linear model y ~ group (which is what we did in the example), two parameters are estimated: the "intercept" (the mean of y at the reference level) and the difference between the second, non-reference level and the reference level. You can check which level is the reference by running levels(data$group): the first level that is actually present in the data is taken as the reference.
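To make this concrete, here is a minimal sketch that uses base R's lm() in place of the Bayesian model from the article. That swap is an assumption made only to keep the example free of extra packages; the coding of the feed predictor is the same, although posterior estimates will not match the least-squares values exactly.

```r
# Keep only the two feeds used in the article (base R instead of dplyr::filter)
data <- subset(chickwts, feed %in% c("meatmeal", "sunflower"))

# Ordinary least squares with the same formula as in the article
fit <- lm(weight ~ feed, data = data)

# Plain group means, for comparison with the model coefficients
group_means <- tapply(data$weight, droplevels(data$feed), mean)

# (Intercept)   = mean weight for meatmeal (the reference level)
# feedsunflower = mean(sunflower) - mean(meatmeal), i.e. the "difference"
coef(fit)
group_means
unname(group_means["sunflower"] - group_means["meatmeal"])
```

So there is no separate "meatmeal" column: meatmeal is absorbed into the intercept, and feedsunflower holds the difference relative to it.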

Why the different HDI results?

I think this is because Bayesian sampling is inherently stochastic, so the results are likely to change slightly each time you fit the model.
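A small base R sketch of that effect, under some loud assumptions: the draws are simulated with rnorm() rather than sampled from the real model, the mean, sd and number of draws are made up for illustration, and hdi_sketch() is a hypothetical stand-in for bayestestR::hdi(), not its actual implementation.

```r
# Hypothetical helper: shortest interval containing a share `ci` of the draws
hdi_sketch <- function(x, ci = 0.95) {
  x <- sort(x)
  n <- length(x)
  k <- ceiling(ci * n)                      # number of draws inside the interval
  starts <- seq_len(n - k + 1)              # every possible window start
  widths <- x[starts + k - 1] - x[starts]   # width of each candidate window
  best <- which.min(widths)                 # narrowest window wins
  c(low = x[best], high = x[best + k - 1])
}

# Stand-in for one model run: simulated draws (illustrative values only)
draw <- function(seed) {
  set.seed(seed)
  rnorm(4000, mean = 50, sd = 25)
}

run_a <- hdi_sketch(draw(1))
run_b <- hdi_sketch(draw(2))   # different seed: slightly different bounds
run_c <- hdi_sketch(draw(1))   # same seed as run_a: identical bounds

rbind(run_a, run_b, run_c)
```

With the real model, passing a fixed seed to the fitting function (for example seed = 123 in stan_glm()) should make repeated runs on the same machine and package versions reproducible, though it will not necessarily reproduce the exact numbers printed in the article.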

hope it helps!

sfd99 commented 3 years ago

Thanks, Dominique. Yes, much clearer now!

sfd99 commented 3 years ago

Dominique, in the chickwts example:

# We keep only rows for which feed is meatmeal or sunflower

data <- filter(chickwts, feed %in% c("meatmeal", "sunflower"))

When I run, as you suggested (feed being the "group" variable):

levels(data$feed)

I get:

[1] "casein"    "horsebean" "linseed"
[4] "meatmeal"  "soybean"   "sunflower"

I was expecting sunflower to be the "reference" level (the first one listed, as you suggested above).

Instead, I see casein listed first...
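A likely explanation, sketched in base R: filtering rows of a data frame does not remove unused factor levels, so levels() still lists all six feeds even though only two remain in the data. Model-fitting functions such as lm() drop the empty levels when building the model, so the reference becomes the first level that is actually present, here meatmeal. The variable names below are illustrative, and lm() is again used as a dependency-free stand-in for the model in the article.

```r
data <- subset(chickwts, feed %in% c("meatmeal", "sunflower"))

levels_before <- levels(data$feed)   # still all six feeds
table(data$feed)                     # the four unused feeds have a count of 0

# Drop the empty levels so levels() matches what the model sees
data$feed <- droplevels(data$feed)
levels_after <- levels(data$feed)    # "meatmeal" "sunflower"

# To make sunflower the reference instead, relevel before fitting
data$feed <- relevel(data$feed, ref = "sunflower")
coef_names <- names(coef(lm(weight ~ feed, data = data)))
coef_names                           # "(Intercept)" "feedmeatmeal"
```

After relevel(), the intercept is the sunflower mean and the second coefficient is the difference of meatmeal relative to sunflower, so its sign flips compared with the article's feedsunflower.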