easystats / see

:art: Visualisation toolbox for beautiful and publication-ready figures
https://easystats.github.io/see/

plot.see_check_heteroscedasticity showing incorrect plot #106

Closed bwiernik closed 3 years ago

bwiernik commented 3 years ago

After the update to check_model(), plot.see_check_heteroscedasticity is still showing the residuals-fitted plot now labeled "linearity" instead of the second plot now labeled "homogeneity of variance".

strengejacke commented 3 years ago

Do you have a small example which plots you refer to?

bwiernik commented 3 years ago

library(performance)
m2 <- lm(mpg ~ wt + cyl + gear + disp, data = mtcars)
plot(check_heteroskedasticity(m2))
check_model(m2, check = c("ncv", "homogeneity"))

plot(check_heteroskedasticity(m2)) should show the "homogeneity" plot, not the "ncv" one, which we now label as "Linearity".

strengejacke commented 3 years ago

check_heteroskedasticity

would you spell that word "heteroskedasticity" or "heteroscedasticity"? The function name is with "c".

strengejacke commented 3 years ago

But isn't the non-constant error variance the heteroscedasticity? See reference for the Breusch-Pagan test: Breusch, T. S., and Pagan, A. R. (1979) A simple test for heteroscedasticity and random coefficient variation. Econometrica 47, 1287–1294.

bwiernik commented 3 years ago

would you spell that word "heteroskedasticity" or "heteroscedasticity"? The function name is with "c".

I think this is an American/British English thing. "k" is more common in the US in my experience. We might consider adding an alias like dplyr and ggplot2 do.
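
For illustration only, a spelling alias in R can be as small as a second name bound to the same function. The sketch below uses made-up `_sketch` names and a placeholder body (both are assumptions, not the package's actual code), so it only shows the aliasing pattern:

```r
# Hypothetical stand-in for the real check; the body is a placeholder that
# returns the correlation between absolute residuals and fitted values.
check_heteroscedasticity_sketch <- function(model) {
  cor(abs(residuals(model)), fitted(model))
}

# Alias: the "k" spelling is bound to the very same function object,
# so both names always behave identically.
check_heteroskedasticity_sketch <- check_heteroscedasticity_sketch

m2 <- lm(mpg ~ wt + cyl + gear + disp, data = mtcars)
same_result <- identical(
  check_heteroskedasticity_sketch(m2),
  check_heteroscedasticity_sketch(m2)
)
```

In a package, the alias would additionally need to be exported and documented alongside the original name.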

But isn't the non-constant error variance the heteroscedasticity? See reference for the Breusch-Pagan test:

My point is that the "NCV" plot is now titled "Linearity", while the "homogeneity" plot is titled "Homogeneity of Variance". Ben Bolker suggested that each plot in the check_model() output should point to one and only one diagnostic, and that was adopted. The NCV plot and the homogeneity plot are similar, and both can be used to inspect homoskedasticity. By square-rooting the residuals, the homogeneity plot can make heteroskedasticity easier to detect, because one can then look for a flat trend line instead of a trumpet shape. In any event, in check_model(), the "homogeneity" plot is now the one that has the correct title.
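
To make the comparison concrete, here is a minimal base-R sketch of the two kinds of panel. It does not use the package's plotting code: the left panel is plain residuals against fitted values, and the right panel assumes the "homogeneity" idea amounts to the square root of the absolute standardized residuals against fitted values (a scale-location plot). The variable names are illustrative.

```r
# Base R only; same model as in the example above.
m2 <- lm(mpg ~ wt + cyl + gear + disp, data = mtcars)

fit <- fitted(m2)
res <- residuals(m2)                      # raw residuals ("ncv"-style panel)
sqrt_abs_std <- sqrt(abs(rstandard(m2)))  # sqrt(|standardized residuals|)

op <- par(mfrow = c(1, 2))

# Residuals vs. fitted: curvature suggests non-linearity,
# a trumpet shape suggests non-constant variance.
plot(fit, res, xlab = "Fitted values", ylab = "Residuals",
     main = "Residuals vs. fitted")
abline(h = 0, lty = 2)
lines(lowess(fit, res), col = "red")

# Scale-location: under constant variance the smoothed trend
# should be roughly flat.
plot(fit, sqrt_abs_std, xlab = "Fitted values",
     ylab = "sqrt(|standardized residuals|)",
     main = "Scale-location")
lines(lowess(fit, sqrt_abs_std), col = "red")

par(op)
```

With the second panel, the check reduces to asking whether the red trend line is flat, rather than judging the spread of points by eye.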

bwiernik commented 3 years ago

Regarding c/k : https://www.jstor.org/stable/1911250

strengejacke commented 3 years ago

I guess I'm still confused - wasn't the suggestion to rename the title from "non-constant error variance / heteroscedasticity" to "linearity"?

that each plot in the check_model() output should point to one and only one diagnostic

So one of the two plots should be dropped from check_model()? But having 5 panels by default is less beautiful ;-)

bwiernik commented 3 years ago

No, the suggestion is to switch which plot is shown in plot.see_check_heteroscedasticity from "ncv" to "homogeneity".

So one of the two plots should be dropped from check_model()? But having 5 panels by default is less beautiful ;-)

No, just that the "ncv" plot can be used for many things (linearity, homoscedasticity, normality, etc.). Its easiest use is linearity, which is why that title was given to it. Homoscedasticity is more easily checked using the "homogeneity" plot. Both are useful; following Ben's recommendation, the plot titles just direct folks to the one specific feature that should be clearest.

strengejacke commented 3 years ago

Ok, got it! Thanks for clarification!

DominiqueMakowski commented 3 years ago

Regarding c/k : https://www.jstor.org/stable/1911250

I love this type of paper haha!

So the French are the culprit again 🥇

On a side note though:

If heteros*edasticity were spelled with a c, it would thus have had to have entered the English language either in 1066 with the Norman invaders

pretty sure William the Conqueror was obsessed with residual variance

Furthermore, it would have to be pronounced "heterossedasticity," which it is not. Heteroskedasticity is therefore the proper English spelling

HA! But in French I always heard it pronounced "hétéroS(c)étadisticité" (silent "K").

Note taken, from now on I'll be that guy that obnoxiously corrects everybody *"Excusez-moi, par contre on prononce hétéros-K-édasticité. Ca vient du Grec ancien."* 😁

IndrajeetPatil commented 3 years ago

[GIF reaction]

strengejacke commented 3 years ago

library(performance)
m2 <- lm(mpg ~ wt + cyl + gear + disp, data = mtcars)

plot(check_heteroskedasticity(m2))
#> Warning: Heteroscedasticity (non-constant error variance) detected (p = 0.042).
#> `geom_smooth()` using formula 'y ~ x'


check_model(m2, check = c("ncv", "homogeneity"))

Created on 2021-04-09 by the reprex package (v2.0.0)