drop usage of sklearn ?

12rambau commented 3 years ago

By reading your code i realized that you were using the sklearn lib. But according to what I found in the develop branch, you are not using the intersect parameter of the LinearRegression method.

https://github.com/diku-dk/bfast/blob/476eb8b09e94906f80057d832104e1ed6dc2293e/bfast/monitor/python/base.py#L230

It means that using LInearRegression all-together is useless as the optimisation problem have a closed solution given by

theta = (H^T*H)^-1*H^T*Y

I am proposing a change to the lib using only numpy to compute the coefficients. According to the tests I've performed on my local computer, I get the same results twice as fast (44s with sklearn, 23 without).

It will also reduce the number of dependencies which is alway important.

mortvest commented 3 years ago

The Python backend was created for validation of the GPU backend(s), hence performance was not a concern. However, I would like to get rid of sklearn dependency. I am thinking about substituting it with statsmodels since it is going to be used other places as well in the future, and I have also heard that it is a thinner wrapper around the numpy code.

12rambau commented 3 years ago

As mentioned in the PR I think I can still improve the numpy backend so I'll continue working on it

diku-dk / bfast

drop usage of sklearn ? #28