cjekel / piecewise_linear_fit_py

fit piecewise linear data for a specified number of line segments
MIT License
300 stars 60 forks source link

Why last beta is always positive? #110

Open yihe-61wu opened 1 year ago

yihe-61wu commented 1 year ago

Thank you for the package. It helps me get piecewise linear fittings easily, which is great for my visualisation.

However, trying to further my analysis one more step, I extracted beta values. Weirdly, the last beta value, i.e., beta[-1], is always positive, while on my plot I can clearly see the last segment of my fitted curves is decreasing. I had a look at the Examples, and apparently under Section 'non-linear standard errors and p-values' the summary table shows the last beta to be positive, while the last segment in the above figure is decreasing.

Given the examples are official and nobody else seems to raise issues on a similar matter, I think it must be me who misunderstood the interpretation of beta values. Could you please help me clarify the issue?

Thank you in advance.

yihe-61wu commented 1 year ago

I did some simple test with 2-segment piecewise linear regression (unspecified breakpoints). I could output the breakpoints, which looked correct, and then I obtained yHat[0], (yHat[breakpointidx] - yHat[0]) / breakpointidx, (yHat[-1] - yHat[breakpointidx]) / (len(yHat) - breakpointidx), which should be expected to be very close to beta[0], beta[1], beta[2]. However, I could see the value of beta[-1] was different from (yHat[-1] - yHat[breakpointidx]) / (len(yHat) - breakpointidx) even by the sign.

yihe-61wu commented 1 year ago

The source code appears all solid to me. I used fitfast but tested fit too -- both called fit_with_breaks, which altered beta via lstsq, seemingly a wrapper for scipy.linalg.lstsq.

I guess the issue is probably not a bug in the code, but in my interpretation. Any comments would be appreciated.

cjekel commented 1 year ago

beta[-1] can be negative! You just haven't had a curve with a steep enough slope yet at the end. The final slope needs to cancel out the accumulation of all previous slopes. Slopes on the right are the accumulation of all previous beta values (except the y-intercept aka. beta[0]).

See

import numpy as np
import matplotlib.pyplot as plt
import pwlf

# generate sin wave data
x = np.linspace(0, 11, num=100)
y = np.sin(x * np.pi / 2)
# add noise to the data
y = np.random.normal(0, 0.05, 100) + y

my_pwlf_1 = pwlf.PiecewiseLinFit(x, y, degree=1)

# fit the data for four line segments
res1 = my_pwlf_1.fitfast(5, pop=50)

# predict for the determined points
xHat = np.linspace(min(x), max(x), num=10000)
yHat1 = my_pwlf_1.predict(xHat)

print('beta', my_pwlf_1.beta)

# plot the results
plt.figure()
plt.plot(x, y, 'o', label='Data')
plt.plot(xHat, yHat1, '--', label='degree=1')
plt.legend()
plt.show()

beta [ 1.06781215 -0.5918096 1.87834207 -2.49662666 2.37475738 -2.42097531]

image