ethanweed / pythonbook

Python adaptation of Danielle Navarro's Learning Statistics with R (http://learningstatisticswithr.com/). Work in progress!

Standardized residuals for scale-location plot in 05.04-regression #36

Closed by ivanistheone 5 months ago

ivanistheone commented 5 months ago

In the subsection "Checking the homogeneity of variance" in the chapter on linear regression, the scale-location plot shows the square root of the absolute value of the unstandardized residuals. It should instead use the standardized residuals, i.e. the residuals divided by $\widehat{\sigma} = \sqrt{ \frac{SS_{resid}}{n-p-1}}$, which is the same as np.sqrt(lm.scale) for a linear model fit object lm obtained from statsmodels.

To get the standardized residuals, the code could be changed from:

df_slplot = pd.DataFrame(
    {'fitted': mod2['pred'],
     'sqrt_abs_stand_res': np.sqrt(np.abs(mod2['residuals']))
    })

to

SS_resid = np.sum(mod2["residuals"]**2)
p = 2  # because there are 2 predictors: 'dan_sleep' and 'baby_sleep'
sigmahat = np.sqrt( SS_resid/(n-p-1) )
stand_res = mod2['residuals'] / sigmahat
df_slplot = pd.DataFrame(
    {'fitted': mod2['pred'],
     'sqrt_abs_stand_res': np.sqrt(np.abs(stand_res))
    })

Probably not a big deal, but it would be nice to have the graph match the plots readers might see in R. Also, on the standardized scale a value of 1 corresponds to one $\widehat{\sigma}$, which is a useful reference.

ethanweed commented 5 months ago

Good catch, thanks! Am I right that n in your line sigmahat = np.sqrt( SS_resid/(n-p-1) ) can be defined as n = len(mod2['residuals'])?

ivanistheone commented 5 months ago

> Am I right that n in your line sigmahat = np.sqrt( SS_resid/(n-p-1) ) can be defined as n = len(mod2['residuals'])?

yes exactly; n is the sample size
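
Putting the two comments together, the full calculation can be sketched as below. This is a minimal, self-contained illustration rather than the book's actual code: the data are synthetic, the names `X`, `y`, `design`, and `coef` are made up for the example, and the model is fitted with plain NumPy least squares instead of the `mod2` object from the chapter. Only `SS_resid`, `p`, `n`, `sigmahat`, and `stand_res` follow the snippet above.

```python
import numpy as np

# Synthetic stand-in for the chapter's data (assumption: two predictors,
# loosely mimicking 'dan_sleep' and 'baby_sleep').
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = 125.0 - 9.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=4.0, size=100)

# Ordinary least squares with an intercept column.
design = np.column_stack([np.ones(len(y)), X])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
fitted = design @ coef
residuals = y - fitted

# The quantities from the issue, with n defined as the sample size.
n = len(residuals)
p = X.shape[1]  # number of predictors, not counting the intercept
SS_resid = np.sum(residuals**2)
sigmahat = np.sqrt(SS_resid / (n - p - 1))
stand_res = residuals / sigmahat

# Values for the y-axis of the scale-location plot.
sqrt_abs_stand_res = np.sqrt(np.abs(stand_res))
```

With a statsmodels fit object, `np.sqrt(lm.scale)` should give the same `sigmahat`, as noted in the original comment.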


ethanweed commented 5 months ago

Perfect, thanks!
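
One caveat for readers who want the plot to match R's `plot.lm` output closely: R's scale-location plot uses standardized residuals that are also adjusted for leverage, dividing each residual by $\widehat{\sigma}\sqrt{1-h_{ii}}$, where $h_{ii}$ is the $i$-th diagonal element of the hat matrix. The sketch below compares the two definitions on synthetic data. All variable names and the data are assumptions made for the illustration, not code from the book.

```python
import numpy as np

# Synthetic data with two predictors (assumption, for illustration only).
rng = np.random.default_rng(1)
n = 100
X = rng.normal(size=(n, 2))
y = 125.0 - 9.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=4.0, size=n)
p = X.shape[1]

# Least-squares fit with an intercept.
design = np.column_stack([np.ones(n), X])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
residuals = y - design @ coef
sigmahat = np.sqrt(np.sum(residuals**2) / (n - p - 1))

# Leverage h_ii: diagonal of the hat matrix, computed from a reduced QR
# decomposition so the full n-by-n matrix is never formed.
Q, _ = np.linalg.qr(design)
leverage = np.sum(Q**2, axis=1)

# Residuals divided by sigmahat only (as proposed in this issue) ...
stand_res_simple = residuals / sigmahat
# ... and additionally adjusted for leverage.
stand_res_leverage = residuals / (sigmahat * np.sqrt(1.0 - leverage))
```

When no observation has high leverage, the two versions are nearly identical, so the simpler definition is usually a close approximation.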