facebook / prophet

Tool for producing high quality forecasts for time series data that has multiple seasonality with linear or non-linear growth.
https://facebook.github.io/prophet
MIT License
18.36k stars 4.52k forks

Coefficients of additional regressors and contributions to the result #2045

Open ASevastkms opened 3 years ago

ASevastkms commented 3 years ago

Good day!

I have built a model with Prophet, and the prediction results are very good. The model has 4 additional regressors (A, B, C, and D for simplicity), and I need their coefficients and how much the forecast depends on each of them (the regressors' contributions). Using the regressor_coefficients() function, I get:

regressor | regressor_mode | center | coef_lower | coef | coef_upper
A | additive | 1581.334616 | 0.000153 | 0.000153 | 0.000153
B | additive | 655.913061 | -0.003166 | -0.003166 | -0.003166
C | additive | 0.000000 | 0.000000 | 0.000000 | 0.000000
D | additive | 0.000000 | 0.000000 | 0.000000 | 0.000000

And when using m.train_component_cols.T.dot(np.array(m.params['beta'])[0]), I get the following:

component
A    0.000248
B   -0.002786
C    0.000000
D    0.000000

Do I understand correctly that the coefficients are standardized in the second case but not in the first?

When I do forecast = m.predict(future), I get:

A | A_lower | A_upper | B | B_lower | B_upper
-0,24199 | -0,24199 | -0,24199 | 2,076534 | 2,076534 | 2,076534

What are these values? Are they the contributions of the regressors to the result? If so, why does regressor A, which has a positive coefficient, give a negative contribution? I do not understand this, so please explain. Sorry for my bad English.

tcuongd commented 2 years ago

1) Yes, that's correct: regressor_coefficients() un-standardizes the coefficients.

2) Just checking: what you've pasted isn't the value of the extra regressors for the prediction period, is it? (That is, the forecast would also have columns A, B, C, D holding the actual values used as inputs to the prediction.)

If not, then yes they should be the contributions of those regressors to yhat. The contribution can be negative if the value of A is below the "center" -- the contribution to the prediction is coefficient_standardized * (A_value - center) / std
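For anyone following along, here is a minimal Python sketch of that formula. One caveat, based on my reading of Prophet's source rather than on anything stated in this thread: for additive regressors, the product appears to be multiplied by the model's y_scale as well, so that the contribution ends up in the original units of y. The function names and the example numbers are hypothetical and only illustrate the sign behaviour.

```python
import math


def additive_contribution(beta_std, value, mu, std, y_scale):
    """Contribution of one additive regressor to yhat, in the units of y.

    beta_std : standardized coefficient (an entry of m.params['beta'])
    mu, std  : the regressor's mean ("center") and standard deviation
    y_scale  : scale applied to y during fitting (assumed to be m.y_scale)
    """
    return beta_std * (value - mu) / std * y_scale


def unstandardized_coef(beta_std, std, y_scale):
    """Effect on yhat of a one-unit change in the raw regressor value.

    Assumption: this is what regressor_coefficients() reports as `coef`
    for an additive regressor.
    """
    return beta_std * y_scale / std


# Hypothetical numbers, chosen only for illustration.
beta_std, mu, std, y_scale = 0.002, 1500.0, 400.0, 50.0

below = additive_contribution(beta_std, 1000.0, mu, std, y_scale)  # value < center
above = additive_contribution(beta_std, 2000.0, mu, std, y_scale)  # value > center
print(below, above)
```

With a positive coefficient, the contribution is negative whenever the regressor value is below its center. The same number can be obtained as `unstandardized_coef(...) * (value - mu)`, which may be easier to explain to non-technical readers.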

elif-tr commented 2 years ago

Hi @tcuongd, I have been trying to reproduce each variable's contribution using the formula you stated above, but I have been unable to. I am not sure whether I am missing a step, and I would very much appreciate your help before my deadline to explain this to the stakeholders.

I would like to give an example, for simplicity, so we can follow the steps.

Let's say the variable is X (I have quite a lot of them), and the output of regressor_coefficients() is as follows:

regressor | regressor_mode | center | coef_lower | coef | coef_upper
X | additive | 409262.904434 | 0.000059 | 0.000059 | 0.000059

The coefficient above is the un-standardized coefficient of that variable, which we do not use to calculate its contribution, is that right?

Then, from the method mentioned above, m.train_component_cols.T.dot(np.array(m.params['beta'])[0]), I get X = 0.003110, which is the standardized coefficient for the X variable, is that correct?

Then I look at the DataFrame of prediction outputs. On the first day (t), the value of X is -24.332580, and on the next day (t+1) it is 124.304026.

In the actual data frame, the raw value of X is 0 on the first day (t) and 2500000.0 on the second day (t+1).

From the model's extra regressors dictionary, I extract the following information:

('X', {'prior_scale': 5.0, 'standardize': 'auto', 'mu': 409262.9044342508, 'std': 840476.036516692, 'mode': 'additive'})

So as for the contribution of the X on the first day (t) in the model, it should be:

0.003110 * ((0 - 409262.9044342508)/840476.036516692)

and for the next day (t+1): 0.003110 * ((2500000.0 - 409262.9044342508)/840476.036516692)

but these calculations do not give me anything close to -24.332580 or 124.304026. Could you please point out the steps I am missing in calculating the contribution of each regressor to yhat on a given day?

I really hope you will see my question soon.
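A rough check of the numbers in this comment, as a sketch only. It assumes that Prophet multiplies additive components by the model's y_scale after applying the standardized coefficient; that is my reading of the source, not something confirmed in this thread, and by default y_scale should be the largest absolute value of y in the training data. If the assumption holds, the reported contribution divided by the hand-computed product should give the same scale factor on both days, and the reported contribution divided by (value - center) should reproduce the un-standardized coef.

```python
# Values quoted in the comment above.
beta_std = 0.003110                  # from m.params['beta']
mu = 409262.9044342508               # 'mu' in m.extra_regressors['X']
std = 840476.036516692               # 'std' in m.extra_regressors['X']
x_t, x_t1 = 0.0, 2500000.0           # raw X on day t and day t+1
reported_t, reported_t1 = -24.332580, 124.304026  # X column of the forecast

# The hand calculation from the comment: standardized coefficient times z-score.
naive_t = beta_std * (x_t - mu) / std
naive_t1 = beta_std * (x_t1 - mu) / std

# Scale factor that would map the hand calculation onto the reported values.
# Under the stated assumption this should be close to m.y_scale.
implied_scale_t = reported_t / naive_t
implied_scale_t1 = reported_t1 / naive_t1

# Per-unit effect implied by the forecast; compare with coef = 0.000059.
implied_coef = reported_t / (x_t - mu)

print(implied_scale_t, implied_scale_t1, implied_coef)
```

Both days imply the same factor (on the order of 1.6e4), and the implied per-unit effect rounds to the 0.000059 shown by regressor_coefficients(). So the missing step looks like a multiplication by y_scale; comparing the implied factor with m.y_scale on the fitted model would confirm it. Equivalently, coef * (value - center) with the un-standardized coefficient should give the contribution directly.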

DariaGoncharenko commented 2 years ago

Hi @elif-tr, did you solve this? I am having the same issue with this part and am trying to find a solution.