Closed QuocTran closed 2 years ago
Something interesting to note. If you happen to extrapolate beyond the largest data point, by even just one point, I think you'll end up with something like
Ad = np.array([[ 1.,  2., 0.],
               [ 1.,  4., 0.],
               [ 1.,  6., 0.],
               [ 1.,  7., 0.],
               [ 1.,  8., 0.],
               [ 1.,  9., 0.],
               [ 1., 11., 0.],
               [ 1., 12., 0.],
               [ 1., 16., 0.],
               [ 1., 18., 0.],
               [ 1., 19., 0.],
               [ 1., 20., 0.],
               [ 1., 21., 0.],
               [ 1., 22., 0.],
               [ 1., 23., 0.],
               [ 1., 24., 1.]])
which should be invertible. Perhaps the regression matrix should be created using
A_list.append(np.where(x >= self.fit_breaks[i+1],
                       x - self.fit_breaks[i+1],
                       0.0))
instead of
A_list.append(np.where(x > self.fit_breaks[i+1],
x - self.fit_breaks[i+1],
0.0))
? I believe that would avoid the last column being all zero. Not sure if that will end up breaking many things...
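A quick way to check the proposed change is to build the hinge column both ways on a toy input. This is only a sketch: the values of `x` and `last_break` below are made up for illustration and are not taken from pwlf. Because `x - last_break` is exactly 0 when `x` equals the breakpoint, both comparisons produce the same column here, so switching to `>=` on its own may not remove an all-zero column.

```python
import numpy as np

# Hypothetical sample locations and last interior breakpoint (illustration only).
x = np.array([20.0, 22.0, 24.0])
last_break = 24.0

# Hinge column built with the current strict comparison.
col_gt = np.where(x > last_break, x - last_break, 0.0)

# Hinge column built with the proposed non-strict comparison.
col_ge = np.where(x >= last_break, x - last_break, 0.0)

# At x == last_break the selected value is x - last_break == 0,
# so the two columns are identical.
same = np.array_equal(col_gt, col_ge)
print(col_gt, col_ge, same)
```

A column only becomes nonzero once some `x` lies strictly beyond the breakpoint, as in the extrapolated row of the matrix above.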
I'm pretty sure this has come up before. A PR is welcome!
How about adding an inv_solver keyword to prediction_variance(), with choices of 'inv', 'pinv', or 'pinv2'? I think I want to keep the default as 'inv' since that wouldn't change anything. Feel free to add a note in the docstring.
Could even add our own method that catches the LinAlgError and tries another solver. The order could be something like inv > pinv > pinv2, which would be fine as the default since it won't break any existing behavior.
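A minimal sketch of that fallback idea, assuming plain NumPy. The helper name `inverse_with_fallback` is hypothetical and is not part of pwlf; only inv and pinv are chained here, and pinv2 is left out to keep the sketch NumPy-only.

```python
import numpy as np


def inverse_with_fallback(matrix):
    """Hypothetical helper: try inv first, fall back to pinv if singular."""
    try:
        # Unchanged behavior whenever the matrix is invertible.
        return np.linalg.inv(matrix)
    except np.linalg.LinAlgError:
        # Singular matrix: return the Moore-Penrose pseudoinverse instead.
        return np.linalg.pinv(matrix)


# The singular A^T A reported in this issue (last row and column are zero).
singular = np.array([[15.0, 198.0, 0.0],
                     [198.0, 3310.0, 0.0],
                     [0.0, 0.0, 0.0]])
result = inverse_with_fallback(singular)
print(result)
```

Since inv is always tried first, results for well-conditioned fits would stay the same; only the cases that currently raise would take the pinv path.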
Thanks for creating the issue, with an easy to reproduce case!
When this comes up, are you performing least squares fits where you specify the breakpoints (.fit_with_breaks()) or are you searching for optimal breakpoints (.fit())?
It seems unusual that the rightmost column of the regression matrix would be all zero. It would appear that you have no data points between your last pair of breakpoints. Is this true in your case?
I pushed a branch with this change. I need someone to check whether it fixes this problem. https://github.com/cjekel/piecewise_linear_fit_py/tree/pinv
python -m pip install https://github.com/cjekel/piecewise_linear_fit_py/archive/pinv.zip
I think a user has reached out to inform me that pinv was definitely a fix for their problem here.
When I use this method https://github.com/cjekel/piecewise_linear_fit_py/blob/master/pwlf/pwlf.py#L1240 it raises "LinAlgError: Singular matrix" too often. When debugging, I see that the Ad matrix calculated here: https://github.com/cjekel/piecewise_linear_fit_py/blob/master/pwlf/pwlf.py#L1304 has a last column that is all 0, such as:
(Pdb) p Ad
array([[ 1.,  2., 0.],
       [ 1.,  4., 0.],
       [ 1.,  6., 0.],
       [ 1.,  7., 0.],
       [ 1.,  8., 0.],
       [ 1.,  9., 0.],
       [ 1., 11., 0.],
       [ 1., 12., 0.],
       [ 1., 16., 0.],
       [ 1., 18., 0.],
       [ 1., 19., 0.],
       [ 1., 20., 0.],
       [ 1., 21., 0.],
       [ 1., 22., 0.],
       [ 1., 23., 0.]])
so:
(Pdb) np.dot(Ad.T, Ad)
array([[   15.,   198.,     0.],
       [  198.,  3310.,     0.],
       [    0.,     0.,     0.]])
and then https://github.com/cjekel/piecewise_linear_fit_py/blob/master/pwlf/pwlf.py#L1323 will throw the error. Have you considered using pinv instead of inv here, which still provides a usable pseudoinverse for the later steps?
(Pdb) linalg.pinv(np.dot(Ad.T, Ad))
array([[ 0.3168677 , -0.01895462,  0.        ],
       [-0.01895462,  0.00143596,  0.        ],
       [ 0.        ,  0.        ,  0.        ]])
I can make a PR if you think it would be helpful.
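For anyone who wants to reproduce this outside the debugger, here is a small standalone sketch using only NumPy. It rebuilds Ad from the values printed in the Pdb session; the variable names are illustrative and the code does not call pwlf itself.

```python
import numpy as np

# Second column of Ad as printed in the Pdb session.
second_col = np.array([2, 4, 6, 7, 8, 9, 11, 12,
                       16, 18, 19, 20, 21, 22, 23], dtype=float)

# Ad = [ones, second column, all-zero last column].
Ad = np.column_stack([np.ones_like(second_col),
                      second_col,
                      np.zeros_like(second_col)])

AtA = Ad.T @ Ad
rank = np.linalg.matrix_rank(AtA)  # rank-deficient because of the zero column

# inv fails on the singular matrix...
try:
    np.linalg.inv(AtA)
    inv_failed = False
except np.linalg.LinAlgError:
    inv_failed = True

# ...while pinv still returns a pseudoinverse.
AtA_pinv = np.linalg.pinv(AtA)
print(rank, inv_failed)
print(AtA_pinv)
```

The upper-left 2x2 block of the pseudoinverse matches the ordinary inverse of the corresponding block of AtA, and the row and column belonging to the zero column stay zero.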