Implementation of auto_ardl, with possibility of not including a regressor?

Natsiopoulos / ARDL

ARDL, ECM and Bounds-Test for Cointegration

GNU General Public License v3.0

Implementation of auto_ardl, with possibility of not including a regressor? #19

Open elemn opened 1 year ago

elemn commented 1 year ago

Hello,

Say that I want to run the auto_ardl function with three independent variables X1, X2, X3. I want to determine the best model, searching up to c(5,5,5,5) lags of (Y, X1, X2, X3), respectively. Is there a way to use auto_ardl to do this while allowing for the possibility that one of the X variables is not included at all? As it currently stands, I think every variable in the formula must be included at least contemporaneously, which can hurt model selection.

Natsiopoulos commented 1 year ago

Indeed, the auto_ardl function also considers orders of 0 (contemporaneous effects only), but it will never exclude a variable entirely. That would be erroneous, as you have specified the variables in the formula. What you can do is rerun auto_ardl without, say, variable X1, then without X2, etc., and then compare the best model from each run with the best model from the full search.
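The rerun-and-compare approach described above could be automated along these lines. This is only a sketch under stated assumptions: the column names `y`, `x1`, `x2`, `x3` and the data set `my_data` are hypothetical placeholders, and comparing by AIC assumes every model is estimated on the same sample. The enumeration uses base R only; the `auto_ardl` call is guarded so the first part runs even without the package or the data.

```r
# Enumerate every non-empty subset of the candidate regressors and build
# one formula per subset. `y`, `x1`, `x2`, `x3` are placeholder names.
response   <- "y"
regressors <- c("x1", "x2", "x3")

subsets <- unlist(
  lapply(seq_along(regressors),
         function(k) combn(regressors, k, simplify = FALSE)),
  recursive = FALSE
)
formulas <- lapply(subsets, reformulate, response = response)

# Run auto_ardl once per formula and keep the run with the lowest AIC.
# `my_data` is a hypothetical time-series data set with these columns.
if (requireNamespace("ARDL", quietly = TRUE) && exists("my_data")) {
  fits <- lapply(formulas, function(f) {
    # One maximum order per variable in the formula (response first).
    ARDL::auto_ardl(f, data = my_data,
                    max_order = rep(5, length(all.vars(f))),
                    selection = "AIC")
  })
  aics <- vapply(fits, function(fit) AIC(fit$best_model), numeric(1))
  best <- fits[[which.min(aics)]]
  print(best$best_order)
}
```

With k candidate regressors this runs 2^k - 1 searches, so it stays practical only for a handful of variables.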

elemn commented 1 year ago

Hello, and thank you for your response! I think this would be a nice feature: with more than three independent variables, it becomes quite taxing to do this for all possible combinations. I believe this is what the leaps package does, for instance (except that it does not handle lags, as it is not designed for time series).

A follow-up question: what exactly is the search algorithm implemented here? Is it a stepwise selection algorithm, such as forward or backward selection? The documentation gives a general description, but it's not very specific.

Natsiopoulos commented 1 year ago

Hello. Thank you for the suggestion. I will take a look at the package you mentioned and I will add this to the backlog.

The search algorithm is similar to stepwise selection, but it does not judge on statistical significance. It also applies both forward and backward steps in each search. The description in the documentation may be a bit complicated, but I think it describes it well enough. The difference between the two options of the algorithm lies in what happens at each step, after an order has been changed and the model evaluated: one algorithm moves on to change the order of the next variable, while the other changes the order of the same variable again (up or down, according to the evaluation of the model).
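To make the distinction between the two options concrete, here is a toy sketch of that kind of search. It is not the package's actual code: `search_orders`, `criterion`, and the `stay_on_variable` flag are hypothetical names invented for illustration, and `criterion` stands in for fitting a model and returning a selection criterion where lower is better.

```r
# Toy coordinate search over lag orders. NOT the ARDL package's code.
# `criterion` is a hypothetical stand-in for "fit the model with these
# orders and return an information criterion" (lower is better).
search_orders <- function(criterion, start, max_order,
                          stay_on_variable = FALSE) {
  best <- start
  best_val <- criterion(best)
  improved <- TRUE
  while (improved) {
    improved <- FALSE
    i <- 1
    while (i <= length(best)) {
      moved <- FALSE
      for (step in c(1, -1)) {  # try a forward step, then a backward step
        cand <- best
        cand[i] <- cand[i] + step
        if (cand[i] < 0 || cand[i] > max_order[i]) next
        val <- criterion(cand)
        if (val < best_val) {   # accept only strict improvements
          best <- cand
          best_val <- val
          improved <- TRUE
          moved <- TRUE
          break
        }
      }
      # One variant changes the same variable again after a successful
      # move; the other proceeds to the next variable regardless.
      if (!(stay_on_variable && moved)) i <- i + 1
    }
  }
  list(order = best, value = best_val)
}

# Toy criterion whose minimum is at orders (2, 0, 3).
toy <- function(ord) sum((ord - c(2, 0, 3))^2)
res_next <- search_orders(toy, start = c(1, 1, 1), max_order = c(5, 5, 5))
res_same <- search_orders(toy, start = c(1, 1, 1), max_order = c(5, 5, 5),
                          stay_on_variable = TRUE)
```

On a well-behaved criterion like this one, both variants end at the same orders and differ only in the path taken. On a real model-selection surface they may settle on different local optima.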