cjekel / piecewise_linear_fit_py

fit piecewise linear data for a specified number of line segments
MIT License

Coefficient access #84

Open gretzteam opened 3 years ago

gretzteam commented 3 years ago

The library is awesome. However, not being able to easily access the coefficients is a huge problem for some applications. I saw that https://github.com/cjekel/piecewise_linear_fit_py/issues/44 got resolved, but the solution requires installing other packages and is not an 'out of the box' solution.

cjekel commented 3 years ago

The model parameters that this library finds are .fit_breaks and .beta, which are the breakpoints and model coefficients. This representation isn't an individual equation for each line, but rather a single equation for the entire relationship. I don't think I've written this equation out for arbitrary polynomials in LaTeX (but I probably should).

If you are working with linear segments, degree=1, then you can use the .calc_slopes() method to populate the slope and intercept attributes for each line.
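For intuition, here is a minimal sketch of what the degree=1 slopes and intercepts amount to in terms of the fitted parameters. It is not the library's implementation. The function name `slopes_and_intercepts` is hypothetical, and it assumes `beta[0]` is the constant term and `beta[j+1]` multiplies `(x - breaks[j])` once that term is active, so each segment's slope is a running sum of the linear coefficients.

```python
from itertools import accumulate


def slopes_and_intercepts(beta, breaks):
    """Per-segment slope and intercept for a degree=1 fit (illustrative only).

    Assumes beta[0] is the constant and beta[j + 1] multiplies (x - breaks[j]).
    """
    n_segments = len(breaks) - 1
    linear = beta[1:n_segments + 1]
    # The slope of segment i is the sum of all linear coefficients up to i.
    slopes = list(accumulate(linear))
    # Each active term beta_j * (x - b_j) contributes -beta_j * b_j to the intercept.
    offsets = accumulate(b_j * beta_j for beta_j, b_j in zip(linear, breaks))
    intercepts = [beta[0] - off for off in offsets]
    return slopes, intercepts


slopes, intercepts = slopes_and_intercepts([1.0, 2.0, 3.0], [0.0, 1.0, 2.0])
print(slopes, intercepts)
```

With a fitted model, the two arguments would be the .beta and .fit_breaks attributes mentioned above.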

If you are working with higher degrees, then things get complicated. One thing that doesn't help is my undocumented assembly process. I construct the matrix from left to right, starting with the 0th degree, then the 1st degree for all line segments, and so forth. What makes it complicated to get polynomial coefficients from the breakpoints and beta parameters is that you need to fully evaluate every line to the left of the line you are considering.
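One way to write the single equation down, as a sketch inferred from the assembly order described above rather than from the library's documentation: let n be the number of segments, d the degree, b_j the breakpoints (.fit_breaks) and β the coefficients (.beta). The indicator convention is an assumption here.

```latex
y(x) = \beta_0 + \sum_{k=1}^{d} \sum_{j=0}^{n-1}
       \beta_{n(k-1)+j+1} \, (x - b_j)^{k} \, \mathbf{1}_j(x),
\qquad
\mathbf{1}_j(x) =
\begin{cases}
1 & \text{if } j = 0 \text{ or } x > b_j, \\
0 & \text{otherwise.}
\end{cases}
```

On segment s (counting from 1), only the terms with j = 0, ..., s-1 are active, which is why every line to the left has to be included when collecting the coefficients of a given line.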

I agree that I don't want to add sympy as a requirement, as I did in https://github.com/cjekel/piecewise_linear_fit_py/issues/44, where I was using it to crudely sum all of the correct coefficients from the previous lines (i.e. making sure the linear terms stay with the linear terms, and so forth).

A better approach would be a single method that 1) grabs all of the coefficients for each line and 2) sums up the coefficients to evaluate individual lines.

For instance, my previous sympy work could be modified into something like this (I have not run this code, but this is what I was thinking):

from sympy import Symbol

x = Symbol('x')


def get_symbolic_eqn(pwlf_, segment_number):
    if pwlf_.degree < 1:
        raise ValueError('Degree must be at least 1')
    if segment_number < 1 or segment_number > pwlf_.n_segments:
        raise ValueError('segment_number not possible')
    # assemble degree = 1 first: the first line carries the constant term,
    # every other line up to segment_number adds one more linear term
    my_1st_term = []
    for line in range(segment_number):
        if line == 0:
            my_zero_term = pwlf_.beta[0] + (pwlf_.beta[1])*(x-pwlf_.fit_breaks[0])
        else:
            my_1st_term.append((pwlf_.beta[line+1])*(x-pwlf_.fit_breaks[line]))
    # assemble all other degrees (k >= 2); these are lumped into one list
    my_2nd_term = []
    for k in range(2, pwlf_.degree + 1):
        for line in range(segment_number):
            # the degree-k coefficients follow the n_segments coefficients
            # of each lower degree in beta
            beta_index = pwlf_.n_segments*(k-1) + line + 1
            my_2nd_term.append((pwlf_.beta[beta_index])*(x-pwlf_.fit_breaks[line])**k)
    # sum() of an empty list is 0, so unused groups drop out
    return my_zero_term, sum(my_1st_term), sum(my_2nd_term)

I think this just illustrates how the terms are summed. A better approach would populate the indices of interest, then use numpy slicing to grab them from beta and sum accordingly.
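A rough sketch of that index-and-slice idea, without sympy, could look like the following. It is only an illustration under stated assumptions: the function name `segment_poly_coeffs` is hypothetical, the beta layout is the one described above (constant first, then one block of n_segments coefficients per degree), and plain Python lists stand in for numpy arrays so the example has no dependencies. Each active term beta * (x - b)**k is expanded with the binomial theorem so that like powers of x can be summed.

```python
from math import comb


def segment_poly_coeffs(beta, breaks, degree, segment_number):
    """Return [c0, c1, ..., c_degree] with y = sum(c_m * x**m) on one segment.

    segment_number counts from 1. Assumes beta[0] is the constant term and
    beta[n_segments*(k-1) + j + 1] multiplies (x - breaks[j])**k.
    """
    n_segments = len(breaks) - 1
    if degree < 1:
        raise ValueError('Degree must be at least 1')
    if not 1 <= segment_number <= n_segments:
        raise ValueError('segment_number not possible')
    coeffs = [0.0] * (degree + 1)
    coeffs[0] = beta[0]
    for k in range(1, degree + 1):
        # Indices of interest: the degree-k coefficients of this segment
        # and of every segment to its left.
        start = n_segments * (k - 1) + 1
        active = beta[start:start + segment_number]
        for b_j, beta_j in zip(breaks, active):
            # (x - b)**k = sum over m of C(k, m) * x**m * (-b)**(k - m)
            for m in range(k + 1):
                coeffs[m] += beta_j * comb(k, m) * (-b_j) ** (k - m)
    return coeffs


# Two segments, degree 2: beta = [constant, lin0, lin1, quad0, quad1]
print(segment_poly_coeffs([1, 2, 3, 4, 5], [0, 1, 2], 2, 2))
```

With numpy arrays the same `start:start + segment_number` range would be an ordinary slice of beta, and the inner loops could be vectorized. The argument order and return layout here are arbitrary choices for the sketch.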