RJT1990 / pyflux

Open source time series library for Python
BSD 3-Clause "New" or "Revised" License

ARIMAX With exogenous variable that depends on prediction #71

Closed CSNoyes closed 7 years ago

CSNoyes commented 7 years ago

Hi,

I don't completely understand the 'formula' variable in the ARIMAX function. I understand that patsy notation is being used but, for example, I don't really see what the how-to doc's formula ('drivers~1+seat_belt+oil_crisis') is doing.

So, drivers is the number of driver deaths. seat_belt and oil_crisis are 1 or 0 depending on whether they are true at that point in time. Why is the formula '1 + exogenous' (where is the '1' coming from)? If I just have an exogenous variable, do I need to have the '1 +' term?

JQVeenstra commented 7 years ago

ARIMAX is an ARIMA model where the mean of the series at each point is a linear regression on the exogenous variables. In most regression software you don't specify the constant term; it is included by default and you have to remove it explicitly if you don't want it. I actually prefer to specify the constant myself most of the time, although I'm not sure whether Ross made it explicit because he prefers that too or because it was easier to code. If the constant is left out of the model, you won't get an intercept term in your regression, which means the fit will be constrained to go through the origin (in the regression space). Generally that's not a good idea, but sometimes it's appropriate.
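To make the "through the origin" point concrete, here is a minimal sketch using plain NumPy least squares rather than pyflux itself. The data are made up for illustration (y = 5 + 2x exactly), and the variable names are hypothetical.

```python
import numpy as np

# Toy data, invented for illustration: y = 5 + 2*x with no noise.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 5.0 + 2.0 * x

# With a constant: the design matrix has a column of ones next to x.
X_with = np.column_stack([np.ones_like(x), x])
beta_with, *_ = np.linalg.lstsq(X_with, y, rcond=None)

# Without a constant: only x, so the fitted line must pass through (0, 0).
X_without = x.reshape(-1, 1)
beta_without, *_ = np.linalg.lstsq(X_without, y, rcond=None)

# Sum of squared residuals for each fit.
sse_with = np.sum((y - X_with @ beta_with) ** 2)
sse_without = np.sum((y - X_without @ beta_without) ** 2)

print("with intercept:   ", beta_with, "SSE =", sse_with)
print("without intercept:", beta_without, "SSE =", sse_without)
```

With the constant, the fit recovers the intercept and slope used to build the data. Without it, the slope is pulled upward to compensate for the missing intercept and the residuals are no longer zero.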

CSNoyes commented 7 years ago

Thank you for the help, that makes a lot of sense. On the intercept term: is 1 a magic constant for this example only, or for regression models in general? Here the exogenous variables are boolean 1/0; in a model where they are continuous and bounded on [0,1], would 1 still make sense as the intercept?

JQVeenstra commented 7 years ago

When regression models are fit in practice, a column of ones is added to the design matrix. Ones are used since we usually look at the betas as the effect of a one-unit change, so the coefficient on the column of ones is the intercept itself. So your first statement is closer to the truth, except that we choose 1 for a specific reason; it isn't just a symbolic placeholder. Glad to be of assistance!
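As a rough illustration of the column-of-ones point, the sketch below (plain NumPy, made-up data, hypothetical names) uses a continuous regressor bounded on [0, 1] and compares a constant column of ones with a constant column of twos. The fitted values come out the same; only the scale of the constant's coefficient changes, which is why ones are the convenient choice.

```python
import numpy as np

# Toy data, invented for illustration: a regressor bounded on [0, 1].
x = np.array([0.0, 0.25, 0.5, 0.75, 1.0])
y = 3.0 + 4.0 * x

# Usual design matrix: a column of ones plus the regressor.
X_ones = np.column_stack([np.ones_like(x), x])
b_ones, *_ = np.linalg.lstsq(X_ones, y, rcond=None)

# Same model with a constant column of twos instead.
X_twos = np.column_stack([2.0 * np.ones_like(x), x])
b_twos, *_ = np.linalg.lstsq(X_twos, y, rcond=None)

print("ones column:", b_ones)   # first coefficient is the intercept itself
print("twos column:", b_twos)   # first coefficient is the intercept divided by 2

# Both parameterisations give the same fitted values.
fitted_ones = X_ones @ b_ones
fitted_twos = X_twos @ b_twos
```

The range of the exogenous variable does not matter here: the constant column is a separate column in the design matrix, so it works the same way for 0/1 dummies and for continuous regressors.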

RJT1990 commented 7 years ago

By default, patsy notation generates a constant for you. That is, if you write:

drivers~seat_belt+oil_crisis

the model will still include a constant. I wrote the 1 into the string just to make clear that a constant is being used; it doesn't change anything. However, you can remove the constant by writing:

drivers~-1+seat_belt+oil_crisis
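The three formula variants discussed in this thread can be checked directly with patsy's `dmatrices`, outside of pyflux. This is a minimal sketch: the column names match the how-to doc's formula, but the numbers in the DataFrame are toy values made up for illustration, not the real dataset.

```python
import pandas as pd
from patsy import dmatrices

# Toy values, invented for illustration only.
data = pd.DataFrame({
    "drivers": [1600.0, 1500.0, 1400.0, 1300.0],
    "seat_belt": [0, 0, 1, 1],
    "oil_crisis": [0, 1, 1, 0],
})

# Implicit constant: patsy adds an 'Intercept' column by default.
_, X_default = dmatrices("drivers ~ seat_belt + oil_crisis", data)

# Explicit constant: same design matrix, the 1 is only there for clarity.
_, X_explicit = dmatrices("drivers ~ 1 + seat_belt + oil_crisis", data)

# No constant: '-1' drops the 'Intercept' column.
_, X_none = dmatrices("drivers ~ -1 + seat_belt + oil_crisis", data)

cols_default = list(X_default.design_info.column_names)
cols_explicit = list(X_explicit.design_info.column_names)
cols_none = list(X_none.design_info.column_names)

print(cols_default)
print(cols_explicit)
print(cols_none)
```

The first two formulas should list an `Intercept` column followed by the two regressors, and the third should list only the regressors.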