cjekel / piecewise_linear_fit_py

fit piecewise linear data for a specified number of line segments
MIT License

ENH: an option to process data with errors #60

Closed: vkhodygo closed this issue 4 years ago

vkhodygo commented 4 years ago

Hi, Charles

It's been some time since I last used your package. Now I need it again, and I realized there is no feature to fit data that has errors in y. My stats are quite rusty, but I hope a simple scaling of the initial data by the standard deviations should do the trick.

cjekel commented 4 years ago

Some reading for future reference:

https://online.stat.psu.edu/stat501/lesson/13/13.1
https://en.wikipedia.org/wiki/Weighted_least_squares
https://stackoverflow.com/questions/27128688/how-to-use-least-squares-with-weight-matrix

cjekel commented 4 years ago

If w is the weight vector (something like the inverse of the variances), we might actually need to do something like this

W = np.sqrt(np.diag(w))                    # diagonal matrix of square-root weights built from w
Aw = np.dot(W, A)                          # weight each row of the regression matrix
Bw = np.dot(W, B)                          # weight the right-hand side the same way
X = np.linalg.lstsq(Aw, Bw, rcond=None)[0]

in place of anywhere that currently calls scipy.linalg.lstsq.
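To make that concrete, here is a self-contained toy version of the same idea. The data and the names x, y, w, A, B below are made up for illustration, and w is treated as a per-point weight (roughly an inverse variance), not anything from pwlf's internals:

import numpy as np

# toy straight-line data; the point at x = 3 looks suspect, so give it a small weight
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([0.0, 1.0, 2.1, 6.0, 4.0])
w = np.array([1.0, 1.0, 1.0, 0.1, 1.0])   # per-point weights, roughly 1/variance

A = np.vstack([np.ones_like(x), x]).T     # stand-in for the regression matrix
B = y

W = np.sqrt(np.diag(w))                   # diagonal square-root weight matrix
Aw = np.dot(W, A)
Bw = np.dot(W, B)
X = np.linalg.lstsq(Aw, Bw, rcond=None)[0]
print(X)                                  # intercept and slope of the weighted fit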

vkhodygo commented 4 years ago

@cjekel

> If w is the weight vector (something like the inverse of the variances), we might actually need to do something like this
>
> W = np.sqrt(np.diag(w))
> Aw = np.dot(W, A)
> Bw = np.dot(W, B)
> X = np.linalg.lstsq(Aw, Bw, rcond=None)[0]
>
> in place of anywhere that currently calls scipy.linalg.lstsq.

Indeed, but your matrix A is already built from the initial vector x, so it's just a matter of preference here, I think. That actually made me realize that a simple scaling of the matrices (or of the input vectors) should work for a given number of breaks; however, what happens when you provide not just the number of breaks but their actual positions?

cjekel commented 4 years ago

I think the issue with applying the weights to x_data beforehand is that the assembly of A needs to depend on the original x_data: I need to check which break zone each original data point falls in. Modifying x_data beforehand would affect the slopes and the breakpoint locations.
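For context, here is a rough sketch of why that is. This is a simplified stand-in for the real assembly code, with hypothetical breakpoint locations:

import numpy as np

def assemble_A(x, b):
    # simplified regression-matrix assembly: a column of ones, the first slope
    # column, then one hinge column per interior breakpoint; which rows each
    # hinge touches depends on where the ORIGINAL x values sit relative to it
    cols = [np.ones_like(x), x - b[0]]
    for bk in b[1:-1]:
        cols.append(np.where(x > bk, x - bk, 0.0))
    return np.vstack(cols).T

x_data = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
breaks = np.array([0.0, 2.0, 4.0])        # hypothetical breakpoint locations
A = assemble_A(x_data, breaks)

Scaling x_data before this step would move points across the x > bk tests, so the weights have to be applied to A and y after A has been assembled.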

I think we can add a keyword called weights, which can be either a float or a numpy array of the same length as y, where weights[i] is the weight for the (x[i], y[i]) data point.
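From the user side, the proposed keyword might look something like this; the exact signature is a guess ahead of the actual implementation:

import numpy as np
import pwlf

x = np.linspace(0.0, 10.0, 50)
y = np.where(x < 5.0, x, 10.0 - x) + np.random.normal(0.0, 0.1, 50)
sigma = np.full(50, 0.1)                  # known standard deviations of the y values

# weights as an array the same length as y, e.g. 1/sigma so that noisier
# points pull less on the fit; a single float would also be allowed
my_pwlf = pwlf.PiecewiseLinFit(x, y, weights=1.0 / sigma)
breaks = my_pwlf.fit(2)                   # fit two line segments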

vkhodygo commented 4 years ago

> I think the issue with applying the weights to x_data beforehand is that the assembly of A needs to depend on the original x_data.

I didn't think about that.

> I think we can add a keyword called weights, which can be either a float or a numpy array of the same length as y, where weights[i] is the weight for the (x[i], y[i]) data point.

Does it change the result when you have identical weights everywhere? It looks like a simple scaling of the original minimization problem.
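A quick sanity check of that intuition (toy numbers, nothing from the package): multiplying every row of A and b by the same constant rescales the objective but leaves the least-squares solution unchanged.

import numpy as np

A = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
b = np.array([0.1, 1.0, 2.2])

x_plain = np.linalg.lstsq(A, b, rcond=None)[0]
c = 3.7                                      # the same weight on every data point
x_same_weight = np.linalg.lstsq(c * A, c * b, rcond=None)[0]

print(np.allclose(x_plain, x_same_weight))   # True: identical fitted coefficients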

cjekel commented 4 years ago

https://github.com/cjekel/piecewise_linear_fit_py/pull/62