Closed frederikziebell closed 3 years ago
Hi frederikziebell.
If we were to do it via a QR decomposition we would not achieve high speed. You can look at the code and recognize that it does not handle cases such as the ones you mention. The purpose of Rfast is to make R fast. We let users perform the checks on their own data; we give something that is bare bones and fast.
Michail
It is also slower:

```r
library("microbenchmark")

n <- 400
set.seed(1)
X <- matrix(rnorm(n^2), ncol = n)
beta <- 1:n
Y <- X %*% beta

microbenchmark(
  Rfast::lmfit(X, Y),
  solve(qr(X), Y),
  stats::lm.fit(X, Y)
)
```
Yes, I agree with you; I also tested it on an old laptop.
But what kind of example is this? Is it realistic? Who would fit a linear regression with 400 variables on only 400 observations? Statistically it does not make much sense. What if you try the same example with 50 variables? Is lmfit still slower? I think not.
But if this example suits you, then fine by me.
Consider the example

which gives an error that the system is computationally singular. It is better to use a QR decomposition than to directly invert `crossprod(X)`, i.e. to replace `solve(crossprod(X), crossprod(X, y))` with `solve(qr(X), y)`.
If there is a vector of weights `w`, or in general a weights matrix `W`, one first has to convert the linear model to a homoscedastic one via an eigen-decomposition, as done in `MASS::lm.gls`.
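A rough sketch of that transformation, in the spirit of `MASS::lm.gls` (the data and the diagonal `W` here are assumptions for illustration): eigen-decompose `W` to get a matrix `A` with `t(A) %*% A == W`, transform `X` and `y` by `A`, and run an ordinary QR least-squares fit on the transformed model.

```r
# Assumed toy data; W is an assumed symmetric positive-definite weight matrix.
set.seed(1)
n <- 50
X <- cbind(1, rnorm(n))
y <- X %*% c(1, 2) + rnorm(n)
W <- diag(runif(n, 0.5, 2))

e <- eigen(W, symmetric = TRUE)
A <- diag(sqrt(e$values)) %*% t(e$vectors)  # t(A) %*% A equals W
Xs <- A %*% X                               # transformed (homoscedastic) design
ys <- A %*% y                               # transformed response

b_ols <- solve(qr(Xs), ys)                  # ordinary QR least squares

# Cross-check against the closed-form GLS solution solve(X'WX, X'Wy)
b_gls <- solve(t(X) %*% W %*% X, t(X) %*% W %*% y)
all.equal(c(b_ols), c(b_gls))
```

The transformed fit agrees with the closed-form GLS estimate, but it only ever inverts via QR on the well-conditioned transformed design.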