cjekel / piecewise_linear_fit_py

fit piecewise linear data for a specified number of line segments
MIT License
289 stars 59 forks

How to get the p_value of the whole model #77

Open SHEN-Cheng opened 3 years ago

SHEN-Cheng commented 3 years ago

Yeah, through `my_pwlf.p_values()` I can calculate the p-value for each beta parameter: first the beta parameters (intercept + slopes) and then the breakpoints. But how do I get the p-value for the whole model?

cjekel commented 3 years ago

I just created an example that adds a test for model significance and gets a p-value for the entire model: https://github.com/cjekel/piecewise_linear_fit_py/blob/master/examples/test_for_model_significance.py

As defined in Section 2.4.1 of Myers RH, Montgomery DC, Anderson-Cook CM. Response Surface Methodology. Hoboken, New Jersey: John Wiley & Sons, Inc.; 2009. pp. 38-44.

In the linear model case we set up the standard significance-of-regression hypothesis test: H0: β1 = β2 = ⋯ = βk = 0, versus H1: βj ≠ 0 for at least one j.

In the non-linear model case, we'll also include the breakpoints as beta parameters (since the breakpoints are unknown model parameters).

You reject H0 when the p-value is less than some alpha.

Please leave this issue open, as the object should include this method!
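For concreteness, here is a minimal sketch of that F-test on a plain ordinary-least-squares fit. This is generic NumPy/SciPy code, not the pwlf example file itself, and all variable names below are illustrative:

```python
# Significance-of-regression F-test (Myers et al., Section 2.4.1)
# illustrated on an ordinary least squares line fit.
import numpy as np
from scipy.stats import f as f_dist

rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 50)
y = 2.0 * x + 1.0 + rng.normal(scale=0.5, size=x.size)

# OLS fit: y_hat = b1*x + b0
b1, b0 = np.polyfit(x, y, 1)
y_hat = b1 * x + b0

n = y.size  # number of observations
k = 1       # number of regressor parameters (the slope), excluding the intercept

sse = np.sum((y - y_hat) ** 2)         # error (residual) sum of squares
ssr = np.sum((y_hat - y.mean()) ** 2)  # regression sum of squares

# F-statistic and its p-value under H0 (all betas zero)
f0 = (ssr / k) / (sse / (n - k - 1))
p_value = f_dist.sf(f0, k, n - k - 1)  # sf = 1 - CDF

print(f0, p_value)
```

A tiny p-value here means we reject H0, i.e. the regression is significant. In the piecewise case, k would also count the unknown breakpoints.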

SHEN-Cheng commented 3 years ago

Great! You solved my problem.

kM-Stone commented 2 years ago

Hi~ Thanks for the great work! I ran your code above, but I'm confused by the result:

cjekel commented 2 years ago

Hi~ Thanks for the great work! I ran your code above, but I'm confused by the result:

* About your last comment in the code:
  > in both these cases, the p_value is very large, so we can't reject H0

  Indeed, the results show large p-values for both cases (0.85 and 0.95), but `my_pwlf.p_values()` shows `array([1.17134878e-06, 7.30540082e-51, 1.00331376e-21])`. So why is each beta significant but the whole model is not?

Does the following change impact these results?

* Line 77: `f0 = (ssr / k) / (sse / (n - k -1))` — the form of the F-statistic is consistent with your reference, i.e.
  ![image](https://user-images.githubusercontent.com/50538789/132291846-1e52a2e0-ed82-4f3b-8e61-8ee4daf31e02.png)
  but the `ssr` in the code seems to be the sum of squares of the error (Section 2.1, formula 10), not the sum of squares of the regression. I swapped `ssr` and `sse` in the code and got quite small p-values.

Yup, nice catch! SSR in my code is actually SSE in that book, and vice versa. Sorry about this.

(Look at how this Wikipedia article uses ESS and RSS; the E and R in these are swapped relative to the book above: https://en.wikipedia.org/wiki/Explained_sum_of_squares)
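Whichever names you use, the two quantities come from the same decomposition: for an OLS fit with an intercept, the explained and residual sums of squares add up to the total sum of squares. A quick illustrative check (generic code, not from pwlf):

```python
import numpy as np

# Roughly linear data and an OLS line fit with intercept
x = np.arange(10, dtype=float)
y = 3.0 * x + 2.0 + np.sin(x)
b1, b0 = np.polyfit(x, y, 1)
y_hat = b1 * x + b0

ss_explained = np.sum((y_hat - y.mean()) ** 2)  # "SSR" in Myers et al.; "ESS" on Wikipedia
ss_residual = np.sum((y - y_hat) ** 2)          # "SSE" in Myers et al.; "RSS" on Wikipedia
ss_total = np.sum((y - y.mean()) ** 2)

# For OLS with an intercept the decomposition is exact (up to float precision)
print(np.isclose(ss_explained + ss_residual, ss_total))
```

Swapping the two in the F-ratio inverts the statistic, which is why the original code produced misleadingly large p-values.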

cjekel commented 2 years ago

> Does the following change impact these results?

The answer to this is yes. Fixed in 101711b87668c1f16e40c38c7b92951058c573f2. Many thanks to @kM-Stone for catching this mistake.

cjekel commented 2 years ago

To clarify, all uses of `ssr` in `PiecewiseLinFit` are okay and don't need changing, including `PiecewiseLinFit.r_squared`.
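(For reference, R² is unaffected by the naming mix-up because it only needs the residual and total sums of squares, whatever they are called internally. A minimal sketch of the formula, not pwlf's actual implementation:)

```python
import numpy as np

def r_squared(y, y_hat):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    ss_residual = np.sum((y - y_hat) ** 2)
    ss_total = np.sum((y - np.mean(y)) ** 2)
    return 1.0 - ss_residual / ss_total

y = np.array([1.0, 2.0, 3.0, 4.0])
print(r_squared(y, y))  # a perfect fit gives 1.0
```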