How to plot segments with fit_breaks information

esakhib commented 2 years ago

Hello. Thank you for your awesome library! I have a question about how to plot segments with fit_breaks.

I have the signal [x_in; y_in] (the 1st pic) and then use

import numpy as np
import pwlf

my_pwlf = pwlf.PiecewiseLinFit(x_in, y_in)
my_pwlf.fitfast(breaks_num, pop=50)
y_out = my_pwlf.predict(x_in, beta=my_pwlf.beta, breaks=my_pwlf.fit_breaks)

Then I compute the segment lines like this:

x_line = []
y_line = []
for i in np.arange(my_pwlf.n_segments):
    # indices of the samples of x_in that fall inside segment i
    x_line_idxs = np.where(np.logical_and(my_pwlf.fit_breaks[i] <= x_in, x_in <= my_pwlf.fit_breaks[i + 1]))[0]
    x_line.append(x_in[x_line_idxs])
    # segment numbers passed to get_y_lines are 1-based
    y_line.append(get_y_lines(my_pwlf, i + 1, x_in[x_line_idxs]))

where get_y_lines is:

def get_y_lines(pwlf_, segment_number, x):
    """Evaluate the line of segment `segment_number` (1-based) at x.

    Formula from https://jekel.me/2018/Continous-piecewise-linear-regression/.
    """

    for line in np.arange(segment_number):
        if line == 0:
            y_values = pwlf_.beta[0] + (pwlf_.beta[1]) * (x - pwlf_.fit_breaks[0])
        else:
            y_values += (pwlf_.beta[line + 1]) * (x - pwlf_.fit_breaks[line])

    return y_values

The question is: why are there no signal points between the first two breaks (the 2nd pic)? Is this correct, or am I doing it the wrong way?

the first picture:

the second picture

cjekel commented 2 years ago

The question is: why are there no signal points between the first two breaks (the 2nd pic)? Is this correct, or am I doing it the wrong way?

It's probably a bad local minimum. You can try increasing the initial population to see whether you still get the same result.

my_pwlf.fitfast(breaks_num, pop=200)

Your intuition is correct though, you would expect a line to at minimum connect two data points.

Is it possible that your breaks_num is one more than what is needed to fit your data?

What are the red breakpoints in your plot? A known solution?

You don't need to pass beta and breaks to predict after you perform a fit. Those arguments are optional and exist because you may have saved parameters from a previous fit.

my_pwlf = pwlf.PiecewiseLinFit(x_in, y_in)
my_pwlf.fitfast(breaks_num, pop=50)
y_out = my_pwlf.predict(x_in)
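For drawing the segments themselves, one option that avoids searching x_in for points near each break is to evaluate the model at the breakpoints and connect consecutive values. The sketch below uses NumPy only, with hypothetical breaks and beta arrays standing in for my_pwlf.fit_breaks and my_pwlf.beta; model_at is an illustrative helper, not part of pwlf, and it follows the same formula as get_y_lines. With a fitted model, my_pwlf.predict(my_pwlf.fit_breaks) should give the same y values.

```python
import numpy as np

def model_at(x, breaks, beta):
    """Evaluate the continuous piecewise-linear model at x.

    Illustrative helper (not a pwlf function). Each extra slope term
    is switched on only to the right of its interior breakpoint.
    """
    x = np.asarray(x, dtype=float)
    y = beta[0] + beta[1] * (x - breaks[0])
    for k in range(1, len(breaks) - 1):
        y = y + beta[k + 1] * np.maximum(x - breaks[k], 0.0)
    return y

# Hypothetical values standing in for my_pwlf.fit_breaks and my_pwlf.beta.
breaks = np.array([0.0, 1.0, 3.0, 4.0])
beta = np.array([1.0, 2.0, -3.0, 1.5])

# Model values exactly at the breakpoints: these are the segment endpoints.
y_breaks = model_at(breaks, breaks, beta)

# One ((x_start, y_start), (x_end, y_end)) pair per segment.
segments = [
    ((breaks[i], y_breaks[i]), (breaks[i + 1], y_breaks[i + 1]))
    for i in range(len(breaks) - 1)
]
for start, end in segments:
    print(start, end)
```

Each pair can be passed to a plotting call as one straight line, so the endpoints sit on the breakpoints regardless of where the data samples fall.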

esakhib commented 2 years ago

@cjekel

Is it possible that your breaks_num is one more than what is needed to fit your data?

Yes, it is. I have a lot of signals like this one with different shapes, and I want to simplify each signal into segments. I always set breaks_num=10 because I don't compute it for each signal.

What are the red breakpoints in your plot? A known solution?

The red breakpoints are the first and the last values from x_line and y_line above.

I noticed that if I predict on np.linspace(min(x_input), max(x_input), 1000), I get better results (the case shown in the 2nd pic above does not occur).

cjekel commented 2 years ago

If you are curious why there is a pwlf breakpoint at that spot but no visible slope change, it's because of how you calculate the start and end of each line segment.

x_line = []
y_line = []
n_samples = 1000  # assumed value; larger means a finer grid
x_hat = np.linspace(min(x_in), max(x_in), n_samples)
for i in np.arange(my_pwlf.n_segments):
    x_line_idxs = np.where(np.logical_and(my_pwlf.fit_breaks[i] <= x_hat, x_hat <= my_pwlf.fit_breaks[i + 1]))[0]
    # the indices refer to x_hat, so index x_hat (not x_in)
    x_line.append(x_hat[x_line_idxs])
    y_line.append(get_y_lines(my_pwlf, i + 1, x_hat[x_line_idxs]))

will converge to the pwlf breakpoints as n_samples -> infinity. This is because you are searching for the line start and end points in the discretized data, whereas pwlf breakpoints are continuous variables that can fall anywhere between x_in.min() and x_in.max().
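A small numerical check of that convergence, as a sketch: the breakpoints below are hypothetical stand-ins for my_pwlf.fit_breaks, and grid_segment_ends and max_gap are illustrative helpers rather than pwlf functions. For each grid size, the code finds the first and last grid point inside every segment and reports how far those are from the true breakpoints.

```python
import numpy as np

# Hypothetical breakpoints standing in for my_pwlf.fit_breaks.
breaks = np.array([0.0, 1.37, 2.91, 4.0])

def grid_segment_ends(breaks, n_samples):
    """First and last grid point found inside each [b_i, b_{i+1}]."""
    x_hat = np.linspace(breaks[0], breaks[-1], n_samples)
    ends = []
    for lo, hi in zip(breaks[:-1], breaks[1:]):
        inside = x_hat[(lo <= x_hat) & (x_hat <= hi)]
        ends.append((inside[0], inside[-1]))
    return np.array(ends)

def max_gap(breaks, n_samples):
    """Largest distance between a grid-based segment end and its breakpoint."""
    ends = grid_segment_ends(breaks, n_samples)
    true_ends = np.column_stack([breaks[:-1], breaks[1:]])
    return float(np.max(np.abs(ends - true_ends)))

gaps = {n: max_gap(breaks, n) for n in (10, 100, 1000, 10000)}
for n, gap in gaps.items():
    print(n, gap)
```

The gap is bounded by the grid spacing, so it shrinks roughly in proportion to 1 / n_samples. With the original x_in as the grid, the spacing is fixed by the data, which is why the plotted segment ends can sit visibly away from the breakpoints.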

In your application, is it important for breakpoints to only occur at data points? If so I have a branch somewhere that has an algorithm for this.

Can you show me just the raw signal of data points, and pwlf predict as a line with np.linspace(min(x_input), max(x_input), 1000)?

It looks like the pwlf fit is giving you (near) zero error with that potential single fictitious breakpoint. Additionally, it looks like you could move that breakpoint throughout the problem and still have a fit with near zero error. This would imply that the solution is non-unique: more than one set of breakpoints gives the same fit for that specific number of line segments. I don't know what your application is, but I'm inclined to say this isn't a big issue as long as you have more data points than beta parameters.
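To illustrate the non-uniqueness, here is a minimal sketch under stated assumptions: a hypothetical noise-free signal with a single kink, fitted with three segments where two would do. The slopes are solved by ordinary least squares with np.linalg.lstsq for fixed breakpoints; design_matrix and ssr_for_breaks are illustrative helpers, not pwlf functions, and this is not how fitfast searches for breakpoints.

```python
import numpy as np

def design_matrix(x, breaks):
    """Columns: 1, (x - b0), then max(x - b, 0) for each interior breakpoint."""
    cols = [np.ones_like(x), x - breaks[0]]
    for b in breaks[1:-1]:
        cols.append(np.maximum(x - b, 0.0))
    return np.column_stack(cols)

def ssr_for_breaks(x, y, breaks):
    """Sum of squared residuals of the least-squares fit for fixed breakpoints."""
    A = design_matrix(x, breaks)
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    r = A @ beta - y
    return float(r @ r)

# Hypothetical noise-free signal with one true kink at x = 5.
x = np.linspace(0.0, 10.0, 21)
y = np.where(x < 5.0, x, 10.0 - x)

# Three segments requested, two needed: slide the surplus breakpoint around.
ssrs = []
for extra in (1.3, 2.7, 7.9):
    breaks = np.sort(np.array([0.0, 5.0, extra, 10.0]))
    ssrs.append(ssr_for_breaks(x, y, breaks))
    print(extra, ssrs[-1])

# For contrast: no breakpoint at the true kink, so the fit cannot be exact.
ssr_missing_kink = ssr_for_breaks(x, y, np.array([0.0, 2.7, 7.9, 10.0]))
print(ssr_missing_kink)
```

Every position of the surplus breakpoint gives an error that is zero up to rounding, so an optimizer has no reason to prefer one position over another, including positions where no data point lies between two breaks.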

esakhib commented 2 years ago

@cjekel It's clear now why there is a breakpoint where there are no points. Thank you very much!

Yes, it is important that breakpoints occur only at data points. It would be nice if you could help me with this algorithm.
Yes, sure, I can show the raw signal and the prediction.

cjekel / piecewise_linear_fit_py

How to plot segments with fit_breaks information #96