cjekel / piecewise_linear_fit_py

fit piecewise linear data for a specified number of line segments
MIT License

prediction_variance throw "LinAlgError: Singular matrix" too often #67

Closed: QuocTran closed this issue 2 years ago

QuocTran commented 4 years ago

When I use this method https://github.com/cjekel/piecewise_linear_fit_py/blob/master/pwlf/pwlf.py#L1240 it raises "LinAlgError: Singular matrix" too often. When debugging, I see that the Ad matrix calculated here: https://github.com/cjekel/piecewise_linear_fit_py/blob/master/pwlf/pwlf.py#L1304 has a last column that is all zeros, such as:

(Pdb) p Ad
array([[ 1.,  2.,  0.],
       [ 1.,  4.,  0.],
       [ 1.,  6.,  0.],
       [ 1.,  7.,  0.],
       [ 1.,  8.,  0.],
       [ 1.,  9.,  0.],
       [ 1., 11.,  0.],
       [ 1., 12.,  0.],
       [ 1., 16.,  0.],
       [ 1., 18.,  0.],
       [ 1., 19.,  0.],
       [ 1., 20.,  0.],
       [ 1., 21.,  0.],
       [ 1., 22.,  0.],
       [ 1., 23.,  0.]])

so:

(Pdb) np.dot(Ad.T, Ad)
array([[  15.,  198.,    0.],
       [ 198., 3310.,    0.],
       [   0.,    0.,    0.]])

and then https://github.com/cjekel/piecewise_linear_fit_py/blob/master/pwlf/pwlf.py#L1323 throws the error.

Have you considered using pinv instead of inv here? It still provides a usable pseudo-inverse matrix:

(Pdb) linalg.pinv(np.dot(Ad.T, Ad))
array([[ 0.3168677 , -0.01895462,  0.        ],
       [-0.01895462,  0.00143596,  0.        ],
       [ 0.        ,  0.        ,  0.        ]])

I can make a PR if you think it would be helpful.
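The failure and the suggested workaround can be reproduced outside pwlf with plain NumPy (a minimal sketch; the Ad matrix is rebuilt from the Pdb session above):

```python
import numpy as np

# Regression matrix from the debugging session above: intercept column,
# x column, and an all-zero last column for the degenerate segment.
x = np.array([2., 4., 6., 7., 8., 9., 11., 12., 16., 18.,
              19., 20., 21., 22., 23.])
Ad = np.column_stack([np.ones_like(x), x, np.zeros_like(x)])
AtA = Ad.T @ Ad  # [[15, 198, 0], [198, 3310, 0], [0, 0, 0]] -- singular

# np.linalg.inv raises LinAlgError on the exactly singular normal matrix...
try:
    np.linalg.inv(AtA)
    raise AssertionError("expected LinAlgError")
except np.linalg.LinAlgError:
    pass

# ...while np.linalg.pinv returns a usable pseudo-inverse whose row and
# column for the degenerate segment are simply zero.
AtA_pinv = np.linalg.pinv(AtA)
print(AtA_pinv)
```

The pseudo-inverse matches the Pdb output above: the nonzero 2x2 block is the ordinary inverse of the well-determined part, and the degenerate direction is zeroed out.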

cjekel commented 4 years ago

Something interesting to note: if you happen to extrapolate beyond the largest data point, by even just one point, I think you'll end up with something like

Ad = np.array([[ 1.,  2.,  0.],
               [ 1.,  4.,  0.],
               [ 1.,  6.,  0.],
               [ 1.,  7.,  0.],
               [ 1.,  8.,  0.],
               [ 1.,  9.,  0.],
               [ 1., 11.,  0.],
               [ 1., 12.,  0.],
               [ 1., 16.,  0.],
               [ 1., 18.,  0.],
               [ 1., 19.,  0.],
               [ 1., 20.,  0.],
               [ 1., 21.,  0.],
               [ 1., 22.,  0.],
               [ 1., 23.,  0.],
               [ 1., 24.,  1.]])

which should be invertible. Perhaps the regression matrix should be created using

                A_list.append(np.where(x >= self.fit_breaks[i+1],
                                       x - self.fit_breaks[i+1],
                                       0.0))

instead of

                A_list.append(np.where(x > self.fit_breaks[i+1],
                                       x - self.fit_breaks[i+1],
                                       0.0))

? I believe that would avoid the last column being all zero. Not sure if that will end up breaking many things...
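The invertibility claim for the extended matrix can be sanity-checked with a few lines of NumPy (a quick sketch; the rows are taken from the matrix in this comment, treating 23 as the last interior breakpoint so that the extrapolation point x = 24 contributes to the final column):

```python
import numpy as np

# One extrapolation point (x = 24) beyond the last breakpoint at 23
# puts a nonzero entry in the final column of the regression matrix.
x = np.array([2., 4., 6., 7., 8., 9., 11., 12., 16., 18.,
              19., 20., 21., 22., 23., 24.])
last_col = np.where(x > 23.0, x - 23.0, 0.0)  # only x = 24 contributes
Ad = np.column_stack([np.ones_like(x), x, last_col])

AtA = Ad.T @ Ad
AtA_inv = np.linalg.inv(AtA)  # no LinAlgError: AtA now has full rank
print(np.linalg.matrix_rank(AtA))  # 3
```

With even a single point past the last breakpoint, the columns become linearly independent and the ordinary inverse goes through.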


I'm pretty sure this has come up before. A PR is welcome!

How about adding an inv_solver keyword to prediction_variance(), with choices of 'inv', 'pinv', or 'pinv2'? I think I want to keep the default as 'inv', since that wouldn't change anything. Feel free to add a note in the docstring.

We could even add our own method that catches the LinAlgError and tries another solver. The order could be something like inv > pinv > pinv2, which would be fine as the default since it won't break any existing behavior.
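That fallback chain could look something like this minimal sketch (the helper name safe_inv is hypothetical, and only NumPy's inv and pinv are shown; scipy.linalg.pinv2 would slot in as a third attempt, though it has been removed from recent SciPy releases):

```python
import numpy as np

def safe_inv(A, solvers=("inv", "pinv")):
    """Hypothetical helper: try each solver in order, falling back to
    the next whenever np.linalg raises LinAlgError."""
    dispatch = {"inv": np.linalg.inv, "pinv": np.linalg.pinv}
    last_err = None
    for name in solvers:
        try:
            return dispatch[name](A)
        except np.linalg.LinAlgError as err:
            last_err = err
    raise last_err

# Singular normal matrix from the issue: inv fails, pinv succeeds.
AtA = np.array([[15., 198., 0.], [198., 3310., 0.], [0., 0., 0.]])
Ainv = safe_inv(AtA)
```

Since pinv itself can raise LinAlgError when the SVD fails to converge, chaining more than one fallback is not redundant.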

Thanks for creating the issue, with an easy to reproduce case!

cjekel commented 4 years ago

When this comes up, are you performing least squares fits where you specify the breakpoints (.fit_with_breaks()) or are you searching for optimal breakpoints (.fit())?

It seems unusual that the rightmost column of the regression matrix would be all zeros. It would appear as if you have no data points between your last pair of breakpoints. Is this true in your case?

cjekel commented 3 years ago

I pushed a branch with this change. Need someone to review if it does fix this problem. https://github.com/cjekel/piecewise_linear_fit_py/tree/pinv

python -m pip install https://github.com/cjekel/piecewise_linear_fit_py/archive/pinv.zip

cjekel commented 2 years ago

A user has reached out to inform me that pinv definitely fixed this problem for them.