JoaquinAmatRodrigo / skforecast

Time series forecasting with machine learning models
https://skforecast.org
BSD 3-Clause "New" or "Revised" License

ngboost #487

Closed amiroft closed 4 months ago

amiroft commented 1 year ago

Is it possible to use NGBoost as the regressor of a forecaster? If so, how should we define it?

JoaquinAmatRodrigo commented 1 year ago

Hi @amiroft, yes, you can use NGBoost as a regressor since it has a `.predict` method. If you are using NGBoost, you may be interested in estimating the whole distribution for each forecasted step. In that case, you cannot use NGBoost's `pred_dist` method; however, you can use `predict_dist` from skforecast (https://skforecast.org/0.8.1/user_guides/probabilistic-forecasting.html#predicting-distribution-and-intervals-using-bootstrapped-residuals)
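As a rough illustration of the bootstrapped-residuals idea behind that guide: simulate many future paths by adding a resampled in-sample residual at every recursive step, then fit a distribution to the simulated values of each step. The sketch below assumes a toy AR(1) model with made-up coefficients and a normal distribution; `bootstrap_paths` and `fit_normal_per_step` are hypothetical helpers, not skforecast's implementation, which (as far as I can tell from the guide) works with any fitted regressor and lets you choose a `scipy.stats` distribution.

```python
import numpy as np


def bootstrap_paths(last_value, coef, intercept, residuals, steps, n_boot, seed=123):
    """Simulate n_boot future paths of a toy AR(1) model (hypothetical helper),
    adding a residual resampled with replacement at every recursive step."""
    rng = np.random.default_rng(seed)
    paths = np.empty((steps, n_boot))
    for b in range(n_boot):
        y_prev = last_value
        for h in range(steps):
            y_hat = intercept + coef * y_prev        # point prediction
            y_prev = y_hat + rng.choice(residuals)   # add a bootstrapped residual
            paths[h, b] = y_prev                     # simulated value feeds the next step
    return paths


def fit_normal_per_step(paths):
    """Maximum-likelihood normal fit per step: sample mean and population std."""
    return paths.mean(axis=1), paths.std(axis=1)


# Illustrative inputs only
residuals = np.array([-0.5, -0.2, 0.0, 0.3, 0.4])
paths = bootstrap_paths(last_value=10.0, coef=0.8, intercept=2.0,
                        residuals=residuals, steps=3, n_boot=500)
loc, scale = fit_normal_per_step(paths)  # one (loc, scale) pair per step
```

Because each simulated value is fed into the next step, the fitted `scale` grows with the horizon, which is the behaviour a per-step distribution should capture.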

amiroft commented 1 year ago

That's good, but when I tried:

```python
from ngboost import NGBRegressor
from skforecast.ForecasterAutoreg import ForecasterAutoreg

forecaster = ForecasterAutoreg(
    regressor=NGBRegressor(**hyperparameters),
    lags=[1, 24],
)
forecaster.fit(data_train['Target0'], exog=data_train[exog])
boot_predictions = forecaster.predict_bootstrapping(
    exog=data_val[exog],
    steps=steps,
    n_boot=n_boot,
)
```

it raises:

```
/usr/local/lib/python3.10/dist-packages/skforecast/ForecasterAutoregDirect/ForecasterAutoregDirect.py in fit(self, y, exog, store_in_sample_residuals)
    660             )
    661         else:
--> 662             self.regressors[step].fit(
    663                 X = X_train_step,
    664                 y = y_train_step,

TypeError: NGBoost.fit() got an unexpected keyword argument 'y'
```

JoaquinAmatRodrigo commented 1 year ago

I see. NGBoost does not use the same argument names as scikit-learn (`Y` instead of `y`). Unfortunately, skforecast only supports regressors that follow the scikit-learn API.

amiroft commented 1 year ago

Yes. How can we get it working in skforecast? Should we change NGBoost's API so that it is compatible with skforecast? Do you have any suggestions or guidance?

amiroft commented 1 year ago

If you think it is possible to support NGBoost in skforecast, I can help, since I need it.

JavierEscobarOrtiz commented 1 year ago

Hello @amiroft,

The main problem is that skforecast expects a regressor that follows the scikit-learn API, i.e. `fit(X, y)`.

As you can see in the error, our fit method internally calls the regressor's fit method:

```python
self.regressors_[step].fit(
    X = X_train_step,
    y = y_train_step
)
```

But for NGBoost it should be (note `Y` instead of `y`):

```python
self.regressors_[step].fit(
    X = X_train_step,
    Y = y_train_step
)
```

From my point of view, there are three options that can work:

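```python
from ngboost import NGBRegressor

# NGBRegressor used here as the NGBoost type to check against.
# NGBoost names the target argument `Y`; scikit-learn names it `y`.
if isinstance(self.regressor, NGBRegressor):
    self.regressors_[step].fit(
        X = X_train_step,
        Y = y_train_step
    )
else:
    self.regressors_[step].fit(
        X = X_train_step,
        y = y_train_step
    )
```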
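Instead of hard-coding a type check, another option (not something skforecast does today) is to inspect the signature of the regressor's `fit` and pass the target under whichever keyword it declares. A minimal sketch, assuming only the `y`/`Y` naming differs; `fit_regressor` and both stub classes are hypothetical and only mimic the argument names:

```python
import inspect


class SklearnLikeRegressor:
    """Hypothetical stub following scikit-learn naming: fit(X, y)."""

    def fit(self, X, y):
        self.fitted_on = ("y", len(y))
        return self


class NGBoostLikeRegressor:
    """Hypothetical stub mimicking only NGBoost's target name: fit(X, Y)."""

    def fit(self, X, Y):
        self.fitted_on = ("Y", len(Y))
        return self


def fit_regressor(regressor, X, y):
    """Call regressor.fit, passing the target under the keyword its fit declares."""
    params = inspect.signature(regressor.fit).parameters
    if "y" in params:
        return regressor.fit(X=X, y=y)
    if "Y" in params:
        return regressor.fit(X=X, Y=y)
    raise TypeError(f"{type(regressor).__name__}.fit accepts neither 'y' nor 'Y'")


X_train_step = [[0.1], [0.2], [0.3]]
y_train_step = [1.0, 2.0, 3.0]
fitted = fit_regressor(NGBoostLikeRegressor(), X_train_step, y_train_step)
# fitted.fitted_on == ('Y', 3)
```

This avoids importing NGBoost inside the forecaster, at the cost of relying on signature introspection: a `fit` that only takes `**kwargs` would hit the `TypeError` branch.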

Hope it helps!

amiroft commented 1 year ago

Thanks for the response. Actually, I wrote a wrapper around NGBoost so that it behaves like a scikit-learn estimator:

```python
import pandas as pd
from ngboost import NGBRegressor
from sklearn.base import BaseEstimator, RegressorMixin
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
from sklearn.preprocessing import OneHotEncoder
from sklearn.impute import SimpleImputer


class NGBoostWrapper(BaseEstimator, RegressorMixin):
    """scikit-learn style wrapper exposing fit(X, y) on top of NGBRegressor."""

    def __init__(self, n_estimators=100, learning_rate=0.01, random_state=123):
        # Store the arguments unchanged so the inherited get_params/set_params
        # and sklearn.base.clone work correctly
        self.n_estimators = n_estimators
        self.learning_rate = learning_rate
        self.random_state = random_state

    def fit(self, X, y):
        # Build the NGBoost model from the current parameters at fit time
        self.model_ = NGBRegressor(
            n_estimators=self.n_estimators,
            learning_rate=self.learning_rate,
            random_state=self.random_state,
        )
        # Positional call: NGBoost names the target Y, scikit-learn names it y
        self.model_.fit(X, y)
        return self

    def predict(self, X):
        # Point predictions from the fitted NGBoost model
        return self.model_.predict(X)
```

But, as you know, NGBoost is meant to predict probability distributions rather than points: in the ngboost library, the `.predict` method returns point predictions, while the distribution comes from `pred_dist`. Do you have any idea how to use that method inside skforecast with this wrapper? Is it possible?

JoaquinAmatRodrigo commented 1 year ago

Hi @amiroft

The main issue is that the recursive multi-step forecasting process indeed requires point estimates at each step. To address this, we would need to implement two prediction processes. Firstly, we would need to save the predicted parameters at each step. Secondly, we would utilize these predicted parameters to obtain a point estimate that can serve as a predictor for the next step. Integrating this behavior seamlessly with the rest of the skforecast functionalities might prove challenging.
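The two processes described above can be sketched as follows, assuming a single lag and a toy probabilistic regressor that returns normal parameters. `ToyGaussianRegressor`, `predict_params` and `recursive_forecast_with_params` are hypothetical names, not NGBoost's or skforecast's API; with NGBoost, the per-step parameters would instead come from the distribution returned by `pred_dist`.

```python
class ToyGaussianRegressor:
    """Hypothetical probabilistic regressor: given the previous value it predicts
    a normal distribution as {'loc': ..., 'scale': ...}."""

    def __init__(self, coef=0.8, intercept=2.0, scale=0.5):
        self.coef = coef
        self.intercept = intercept
        self.scale = scale

    def predict_params(self, x_lag):
        return {"loc": self.intercept + self.coef * x_lag, "scale": self.scale}


def recursive_forecast_with_params(model, last_value, steps):
    """Recursive multi-step forecast that stores each step's distribution
    parameters and feeds only the point estimate (loc) into the next step."""
    params_per_step = []
    x_lag = last_value
    for _ in range(steps):
        params = model.predict_params(x_lag)  # 1) predict and save the parameters
        params_per_step.append(params)
        x_lag = params["loc"]                 # 2) point estimate becomes the next lag
    return params_per_step


forecast = recursive_forecast_with_params(ToyGaussianRegressor(), last_value=5.0, steps=3)
# locs are approximately 6.0, 6.8 and 7.44; every scale is 0.5
```

Note that the parameters saved at later steps are conditional on plugged-in point estimates, so they tend to understate the uncertainty accumulated over the horizon.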

However, we are open to exploring new ideas and potential contributions to overcome this limitation. We welcome any suggestions or contributions that can help enhance the integration of ngboost models within the skforecast framework.

It's worth mentioning that the predict_dist method in skforecast is inspired by the idea of predicting distributions from ngboost. Although the inference process differs, it might be beneficial for you to give it a try and assess its suitability for your specific use case.

Please feel free to share any further thoughts or ideas, and don't hesitate to reach out if you need any further assistance.

edgBR commented 1 year ago

Hi,

If skforecast is scikit-learn compatible, you might try a standard boosting algorithm and use MAPIE instead:

https://github.com/scikit-learn-contrib/MAPIE
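MAPIE implements several conformal methods; one of the simplest is split conformal prediction. A NumPy sketch of that technique (not MAPIE's API), assuming a model already fitted on a training split plus its predictions on a separate calibration split; `split_conformal_interval` is a hypothetical helper:

```python
import math

import numpy as np


def split_conformal_interval(y_cal, y_cal_pred, y_test_pred, alpha=0.1):
    """Symmetric split-conformal interval built from absolute calibration residuals."""
    scores = np.sort(np.abs(np.asarray(y_cal, float) - np.asarray(y_cal_pred, float)))
    n = len(scores)
    # Rank of the finite-sample corrected (1 - alpha) quantile
    k = math.ceil((n + 1) * (1 - alpha))
    q = np.inf if k > n else scores[k - 1]  # too few calibration points -> infinite width
    y_test_pred = np.asarray(y_test_pred, float)
    return y_test_pred - q, y_test_pred + q


# Illustrative calibration values only
y_cal = np.arange(1.0, 11.0)
y_cal_pred = y_cal + np.array([0.1, -0.2, 0.3, -0.4, 0.5, -0.6, 0.7, -0.8, 0.9, -1.0])
lower, upper = split_conformal_interval(y_cal, y_cal_pred, [20.0, 30.0], alpha=0.2)
```

The coverage guarantee relies on calibration and test points being exchangeable, which time series only approximately satisfy, so treat the nominal coverage as approximate in a forecasting setting.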