cjekel / piecewise_linear_fit_py

fit piecewise linear data for a specified number of line segments
MIT License

p-value #14

Closed adri-MASH closed 5 years ago

adri-MASH commented 5 years ago

Hey cjekel,

Thanks a lot for your work. Is there a way I can retrieve the p-values of the coefficients of piecewise regression? Thank you.

cjekel commented 5 years ago

I'm glad my library has been useful.

To be clear, these p-values are only meaningful if you specify the break point locations manually. The assumption is that your break point locations are the true (or known) break point locations.

The standard errors and p-values do not account for any error in your break point locations.
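
To illustrate why fixed break points matter: once the break points are treated as exact, the model is linear in the parameters, so the usual least squares standard errors apply. The following is only a sketch using NumPy, not the library's own code. The helper names `design_matrix` and `fit_with_known_breaks`, the synthetic data, and the choice of basis columns are assumptions made for illustration.

```python
import numpy as np


def design_matrix(x, breaks):
    """Regression matrix for a continuous piecewise linear model.

    Columns: 1, (x - b_0), then max(x - b_i, 0) for each interior break b_i.
    `breaks` must be sorted and include both end points.
    """
    x = np.asarray(x, dtype=float)
    breaks = np.asarray(breaks, dtype=float)
    columns = [np.ones_like(x), x - breaks[0]]
    for b in breaks[1:-1]:
        columns.append(np.clip(x - b, 0.0, None))
    return np.column_stack(columns)


def fit_with_known_breaks(x, y, breaks):
    """Least squares fit and standard errors, treating the breaks as exact."""
    A = design_matrix(x, breaks)
    y = np.asarray(y, dtype=float)
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    residuals = y - A @ beta
    n, p = A.shape
    sigma2 = residuals @ residuals / (n - p)  # unbiased noise variance estimate
    cov = sigma2 * np.linalg.inv(A.T @ A)     # covariance of beta, given the breaks
    return beta, np.sqrt(np.diag(cov))


# Synthetic data: slope 1 up to x = 4, then slope -0.5, plus Gaussian noise.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 101)
y_true = np.where(x < 4.0, x, 4.0 - 0.5 * (x - 4.0))
y = y_true + rng.normal(scale=0.1, size=x.size)

beta, se = fit_with_known_breaks(x, y, breaks=[0.0, 4.0, 10.0])
print("beta:", beta)
print("se:  ", se)
```

Nothing in `se` reflects how well the break at x = 4 is known. If that location had been estimated from the same data, these standard errors would likely be too optimistic.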

I plan to derive a method for approximating these statistical properties when the break point locations are unknown, but this probably won't happen until there is a well-documented paper for this library.


Section 2.4.2 of [1] (there are many editions of this book) describes how to calculate the p-value of an individual parameter. Note that this is really a marginal test, since each parameter depends on the other parameters.

Essentially we have that t_j = \beta_j / se(\beta_j) where se denotes the standard error.
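
For reference, the standard error in this ratio is usually written as follows for ordinary least squares. This is the textbook form rather than anything specific to this library. The symbols A (the regression matrix), p (its number of columns, intercept included), and e (the residual vector y - A \hat{\beta}) are introduced here for illustration.

```latex
\hat{\sigma}^2 = \frac{\mathbf{e}^\top \mathbf{e}}{n - p},
\qquad
\operatorname{se}(\hat{\beta}_j) = \sqrt{\hat{\sigma}^2 \left[ (\mathbf{A}^\top \mathbf{A})^{-1} \right]_{jj}},
\qquad
t_j = \frac{\hat{\beta}_j}{\operatorname{se}(\hat{\beta}_j)}
```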

The p-value for t_j is the two-sided probability of observing a value more extreme than |t_j| under Student's t-distribution with df = n - k - 1 degrees of freedom, where n is the number of data points and k is the number of parameters excluding the intercept (so k = len(beta) - 1 here).

To compute the p-values we would do

import numpy as np
from scipy import stats

# Get my model parameters (my_pwlf is a PiecewiseLinFit object that has
# already been fit to the data x, y with specified break points)
beta = my_pwlf.beta

# calculate the standard errors associated with each beta parameter
# note that these standard errors and p-values are only meaningful if
# you have specified the specific line segment end locations
# at least for now...
se = my_pwlf.standard_errors()

# calculate my t-value
t = beta / se

# degrees of freedom for t-distribution
n = len(x)  # x is the data the model was fit to
k = len(beta) - 1  # beta includes the intercept

# calculate the two-sided p-values
pvalues = 2.0 * stats.t.sf(np.abs(t), df=n-k-1)
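
One way to sanity-check this kind of calculation without the library is to run the same steps on a plain straight-line fit and compare against `scipy.stats.linregress`, which reports a two-sided p-value for the slope. This is a minimal sketch: the helper name `ols_p_values` and the synthetic data are assumptions for illustration, not part of pwlf.

```python
import numpy as np
from scipy import stats


def ols_p_values(A, y):
    """Two-sided p-values for each coefficient of the linear model y = A @ beta."""
    A = np.asarray(A, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = A.shape
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    residuals = y - A @ beta
    sigma2 = residuals @ residuals / (n - p)
    se = np.sqrt(sigma2 * np.diag(np.linalg.inv(A.T @ A)))
    t = beta / se
    # n - p equals n - k - 1 when k counts the parameters other than the intercept
    return 2.0 * stats.t.sf(np.abs(t), df=n - p)


# Noisy straight line, so the slope's p-value is not vanishingly small.
rng = np.random.default_rng(1)
x = np.linspace(0.0, 1.0, 30)
y = 0.3 * x + rng.normal(scale=0.5, size=x.size)

A = np.column_stack([np.ones_like(x), x])  # intercept and slope columns
p_manual = ols_p_values(A, y)
p_scipy = stats.linregress(x, y).pvalue    # two-sided test on the slope

print(p_manual[1], p_scipy)
```

The two slope p-values should agree to floating-point precision. That agreement is a quick check on both the factor of two and the degrees of freedom.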

I've just added an example here which you can use as your reference.

Ref: [1] Myers RH, Montgomery DC, Anderson-Cook CM. Response Surface Methodology. Hoboken, New Jersey: John Wiley & Sons, Inc. 2009;20:38-44.


I plan to add this to the library, but for now you'll have to add these few lines of code yourself.

adri-MASH commented 5 years ago

Hi cjekel, thanks a lot for your help and the code! Good luck with the paper, and let us know how it goes!

cjekel commented 5 years ago

Added a p_values() function in commit 711672e5ec39dea9aabcd5058dcdf0c15e2a1e81.