JoaquinAmatRodrigo / skforecast

Time series forecasting with machine learning models
https://skforecast.org
BSD 3-Clause "New" or "Revised" License

A single model multivariate forecaster #626

Open KishManani opened 5 months ago

KishManani commented 5 months ago

Currently, in ForecasterAutoregMultiVariate a separate model is trained for each level/variable. Would it be possible to train a single model across all variables, using an indicator variable (e.g., one-hot encoding or ordinal encoding) to tell the model which variable it is forecasting, similar to what is currently done in ForecasterAutoregMultiSeries? I have no experience with this approach and would be interested in your thoughts on it.

Thanks again, Kishan

JavierEscobarOrtiz commented 5 months ago

Hello Kishan,

As it is now, ForecasterAutoregMultiVariate is trained on only one level (series 1 or series 2) and creates one model per step (direct approach). In the figure, if you specify level = 'Series 1' when creating the forecaster, only the training matrix at the top is used.
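To make the direct approach concrete, here is a minimal sketch of the idea (not skforecast's internal code). It assumes two synthetic series, 3 lags and 2 steps; the helper make_direct_matrix is hypothetical. Lags of every series are the predictors, only the chosen level supplies the targets, and one independent regressor is fitted per step.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Synthetic data (assumption): two related series.
rng = np.random.default_rng(0)
n, lags, steps = 50, 3, 2
series_1 = np.cumsum(rng.normal(size=n))
data = pd.DataFrame({
    "series_1": series_1,
    "series_2": 0.5 * series_1 + rng.normal(size=n),
})


def make_direct_matrix(df, level, lags, steps):
    """Hypothetical helper: lags of every column as predictors and the
    values of `level` at steps 1..steps ahead as targets."""
    X_cols = {
        f"{col}_lag_{lag}": df[col].shift(lag)
        for col in df.columns
        for lag in range(1, lags + 1)
    }
    y_cols = {
        f"{level}_step_{step}": df[level].shift(-(step - 1))
        for step in range(1, steps + 1)
    }
    matrix = pd.concat([pd.DataFrame(X_cols), pd.DataFrame(y_cols)], axis=1).dropna()
    return matrix[list(X_cols)], matrix[list(y_cols)]


# Only the chosen level ('series_1') provides target columns.
X, y = make_direct_matrix(data, level="series_1", lags=lags, steps=steps)

# Direct strategy: one independent regressor per forecast step.
models = {
    step: LinearRegression().fit(X, y[f"series_1_step_{step}"])
    for step in range(1, steps + 1)
}

# Predict the steps after the end of the data from the last observed lags.
last_window = pd.DataFrame([{
    f"{col}_lag_{lag}": data[col].iloc[-lag]
    for col in data.columns
    for lag in range(1, lags + 1)
}])[X.columns]
predictions = {step: float(m.predict(last_window)[0]) for step, m in models.items()}
```

Forecasting 'series_2' in this sketch would mean building a second matrix with level="series_2" and a second set of per-step models.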

[Figure: forecaster_multivariate_train_matrix_diagram, the training matrices of ForecasterAutoregMultiVariate]

Training a single model across all variables, as ForecasterAutoregMultiSeries does, is not possible because the training matrix grows horizontally (not vertically, as in the multi-series approach), so there would be multiple columns to use as response variables.
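The difference in shape can be sketched with pandas. This is an illustration under assumptions (two random series, 2 lags, a hypothetical helper lagged), not the exact matrices skforecast builds. Stacking series adds rows and keeps a single target column; placing series side by side adds columns, including one target column per series.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
n, lags = 10, 2
data = pd.DataFrame({"series_1": rng.normal(size=n), "series_2": rng.normal(size=n)})


def lagged(s, lags):
    """Hypothetical helper: lag columns of one series plus its target column `y`."""
    out = pd.DataFrame({f"lag_{k}": s.shift(k) for k in range(1, lags + 1)})
    out["y"] = s
    return out.dropna()


# Multi-series style (vertical growth): each series contributes rows and a
# series ID column tells the single model which series a row belongs to.
long_matrix = pd.concat(
    [lagged(data[c], lags).assign(series_id=c) for c in data.columns],
    ignore_index=True,
)

# Multivariate style (horizontal growth): the series sit side by side in one
# row, so each extra series adds columns, including another `y` column.
wide_matrix = pd.concat({c: lagged(data[c], lags) for c in data.columns}, axis=1)
wide_matrix.columns = [f"{c}_{name}" for c, name in wide_matrix.columns]

print(long_matrix.shape, wide_matrix.shape)  # (16, 4) (8, 6)
```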

Best, Javi

JoaquinAmatRodrigo commented 5 months ago

Hi, I think the type of forecaster @KishManani mentions can be created with two approaches:

  1. Using a multi-output regressor (multi-target) from sklearn.multioutput

  2. Using a regressor that natively allows multioutput (multi-target)
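Both options can be sketched with plain scikit-learn, assuming synthetic data, 3 lags, and the value of each series at time t as the targets. Note that MultiOutputRegressor fits one clone of the wrapped estimator per target column, whereas RandomForestRegressor accepts a 2-D target and fits a single forest. This is an illustration only, not skforecast API.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Ridge
from sklearn.multioutput import MultiOutputRegressor

rng = np.random.default_rng(2)
n, lags = 60, 3
data = pd.DataFrame({
    "series_1": np.cumsum(rng.normal(size=n)),
    "series_2": np.cumsum(rng.normal(size=n)),
})

# Predictors: lags of both series. Targets: the value of both series at time t.
X = pd.DataFrame({
    f"{c}_lag_{k}": data[c].shift(k) for c in data.columns for k in range(1, lags + 1)
}).iloc[lags:]
Y = data.iloc[lags:]

# Option 1: sklearn.multioutput wrapper (one Ridge clone per target column).
wrapped = MultiOutputRegressor(Ridge(alpha=1.0)).fit(X, Y)

# Option 2: a regressor that natively accepts a 2-D target (a single forest).
native = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, Y)

# Next value of both series from the most recent lags.
last_window = pd.DataFrame([{
    f"{c}_lag_{k}": data[c].iloc[-k] for c in data.columns for k in range(1, lags + 1)
}])[X.columns]
print(wrapped.predict(last_window).shape)  # (1, 2)
print(native.predict(last_window).shape)   # (1, 2)
```

Adding further target columns (e.g. each series at t+1) would extend the same idea to several steps.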

The second approach is where neural network architectures can help. In the next release (0.12.0), we will add a new forecaster, ForecasterRNN, that allows Keras models to be used within the skforecast framework, including the multi-series, multi-step scenario. We are still writing the documentation, but the code is already available.

@JavierEscobarOrtiz and @fernando-carazo, let's investigate this further to see if we can extend the modeling approaches.

KishManani commented 5 months ago

Hi @JavierEscobarOrtiz and @JoaquinAmatRodrigo! I'm referring to something a bit simpler here. In ForecasterAutoregMultiSeries, the time series ID is used as a feature to distinguish between the series. Could something similar be useful in ForecasterAutoregMultiVariate, allowing a single model to be trained for all the series? Here is an example of what the training matrix would look like (prior to encoding the time series ID):

[Image: example training matrix with a time series ID column]

I've not tried this before but am curious to know what you think! Perhaps a tree-based model could use the time series ID to partition the data into series 1 and series 2 early in the tree and then learn separate behaviours further down (just thinking out loud here). Linear regression would likely struggle with something like this.
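One possible reading of this proposal can be sketched as follows; it does not reproduce the exact layout of the image above. Assumptions: synthetic data, 3 lags, one-step-ahead targets, an ordinal series ID (0, 1), and a RandomForestRegressor as the tree-based model. The multivariate lag block is stacked once per target series, and a single model is fitted on the result.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(3)
n, lags = 80, 3
t = np.arange(n)
data = pd.DataFrame({
    "series_1": np.sin(t / 5) + rng.normal(scale=0.1, size=n),
    "series_2": np.cos(t / 7) + rng.normal(scale=0.1, size=n),
})

# Shared predictors: lags of every series, as in the multivariate forecaster.
lag_features = pd.DataFrame(
    {f"{c}_lag_{k}": data[c].shift(k) for c in data.columns for k in range(1, lags + 1)}
).iloc[lags:]

# Stack one copy of the lag block per target series and add an ordinal ID.
series_names = list(data.columns)
blocks = []
for code, name in enumerate(series_names):
    block = lag_features.copy()
    block["series_id"] = code            # 0 -> series_1, 1 -> series_2
    block["y"] = data[name].iloc[lags:]  # target: current value of that series
    blocks.append(block)
train = pd.concat(blocks, ignore_index=True)

# A single model for all series.
X_train, y_train = train.drop(columns="y"), train["y"]
model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_train, y_train)

# One-step-ahead forecast for each series: same lags, different series_id.
last_lags = {f"{c}_lag_{k}": data[c].iloc[-k] for c in data.columns for k in range(1, lags + 1)}
X_next = pd.DataFrame([{**last_lags, "series_id": code} for code in range(len(series_names))])
forecast = dict(zip(series_names, model.predict(X_next[X_train.columns])))
```

With an ordinal ID, the trees can split on series_id and grow separate branches per series, which matches the intuition above. A linear model given a one-hot ID would only learn a per-series shift in the intercept unless interaction terms between the ID and the lags were added.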

Thanks, Kishan