STAT545-UBC / Discussion


Difference between lm(y ~ x) and lm(y ~ poly(x, 1)) #428

Open sinaneza opened 7 years ago

sinaneza commented 7 years ago

Dear professor and peers,

From a Google search I learned that we can fit a quadratic model of two variables using lm(y ~ poly(x, 2)).

I tested lm(y ~ poly(x, 1)) and found that the result is quite different from lm(y ~ x).

Can someone explain why? @jennybc

jennybc commented 7 years ago

I don't have time right now to dig deep on this, but the short answer is that poly() fits a polynomial regression using orthogonal polynomials as a basis. This means the implied variables are NOT just x^2, x^3, etc., as you might expect. Rather, the terms of increasing degree are constructed so as to be orthogonal to each other.
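Here's a small sketch of what "orthogonal" means in practice (using iris for illustration): the columns that poly() returns each sum to (numerically) zero, so they are orthogonal to the intercept, and their cross-product is the identity matrix, so they are orthonormal to each other.

```r
# Columns returned by stats::poly() form an orthonormal basis
p <- poly(iris$Sepal.Width, 2)

# Each column sums to ~0, i.e. orthogonal to the constant (intercept) term
round(colSums(p), 10)
#> 1 2 
#> 0 0

# Cross-product is the identity matrix: columns are orthonormal
round(crossprod(p), 10)
#>   1 2
#> 1 1 0
#> 2 0 1
```

This is why the coefficients from a poly() fit don't match the coefficients on raw x, x^2, etc., even though the fitted model is the same.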

I've never even pondered what poly(x, 1) would do until today!

If you want the intuitive parametrization, request raw = TRUE, which will give the same results as lm(y ~ x).

You'll notice all three methods give the same predicted values, because they are in fact fitting the same model.

fit_plain <- lm(Sepal.Length ~ Sepal.Width, data = iris)
fit_poly <- lm(Sepal.Length ~ poly(Sepal.Width, 1), data = iris)
fit_poly_raw <- lm(Sepal.Length ~ poly(Sepal.Width, 1, raw = TRUE), data = iris)
cbind(plain = coef(fit_plain),
      poly = coef(fit_poly),
      poly_raw = coef(fit_poly_raw))
#>                  plain      poly   poly_raw
#> (Intercept)  6.5262226  5.843333  6.5262226
#> Sepal.Width -0.2233611 -1.188376 -0.2233611
cbind(plain = head(predict(fit_plain)),
      poly = head(predict(fit_poly)),
      poly_raw = head(predict(fit_poly_raw)))
#>      plain     poly poly_raw
#> 1 5.744459 5.744459 5.744459
#> 2 5.856139 5.856139 5.856139
#> 3 5.811467 5.811467 5.811467
#> 4 5.833803 5.833803 5.833803
#> 5 5.722123 5.722123 5.722123
#> 6 5.655114 5.655114 5.655114