carlomazzaferro / scikit-hts-examples

Example usage of scikit-hts
MIT License

How to pass in exogenous_df in predict()? #8

Closed PeggyFan closed 3 years ago

PeggyFan commented 3 years ago

Hi, I am following the hts documentation to add exogenous variables to a model.

So far I have loaded the hmv data with load_mobility_data() and created the exogenous features for each node with exogenous = {k: ['precipitation', 'temp'] for k in hmv.columns if k not in ['precipitation', 'temp']}, which I then pass to fit():

clf = HTSRegressor(model='prophet', revision_method='OLS',  n_jobs=10)
model = clf.fit(hmv, hier, exogenous=exogenous)
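For readers following along, here is a minimal sketch of what that dictionary comprehension produces. The frame `hmv_toy` and its values are made up for illustration (the real hmv data has many more node columns); only the comprehension itself comes from the snippet above.

```python
import pandas as pd

# Hypothetical stand-in for the hmv frame: three node columns plus two weather columns.
hmv_toy = pd.DataFrame({
    "total": [10, 12],
    "CH": [4, 5],
    "CH-07": [4, 5],
    "precipitation": [0.0, 0.1],
    "temp": [77.0, 74.0],
})

exog_cols = ["precipitation", "temp"]

# Map every node column to the list of exogenous columns it should use.
exogenous = {k: exog_cols for k in hmv_toy.columns if k not in exog_cols}

print(exogenous)
```

Every node key maps to the same two column names, and the weather columns themselves are left out of the keys.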

The model ran fine, but I can't figure out how to pass in the exogenous_df in the predict() function. The documentation says:

Parameters
• exogenous_df (pandas.DataFrame) – A dataframe of length == steps_ahead containing the exogenous data for each of the nodes

For the hmv data, the exogenous features were ['precipitation', 'temp'], so I passed in a data frame of 7 rows:

precipitation | temp
     -- | --
0.00000 | 77.00000
0.00000 | 74.00000
0.00000 | 66.00000
0.00000 | 68.00000
0.00000 | 68.00000
0.00000 | 64.00000
0.00000 | 65.00000

and ran preds = model.predict(steps_ahead=7, exogenous_df=exogenous_df)
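For reference, the frame described above can be rebuilt as follows. This is only a sketch of the input as I understand it: the values are copied from the table, and whether predict() also expects a date index or node information is exactly the open question, so none is assumed here.

```python
import pandas as pd

steps_ahead = 7

# Values copied from the table above; a default integer index is used
# because the required index is not known.
exogenous_df = pd.DataFrame({
    "precipitation": [0.0] * steps_ahead,
    "temp": [77.0, 74.0, 66.0, 68.0, 68.0, 64.0, 65.0],
})

# The documented constraint: one row per forecast step.
rows_match = len(exogenous_df) == steps_ahead
print(rows_match)
```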

But I'm not sure how to pass in the node information in the data frame. The hierarchy of the hmv data is:

{'total': ['CH', 'SLU', 'BT', 'OTHER'],
 'CH': ['CH-07', 'CH-02', 'CH-08', 'CH-05', 'CH-01'],
 'SLU': ['SLU-15', 'SLU-01', 'SLU-19', 'SLU-07', 'SLU-02'],
 'BT': ['BT-01', 'BT-03'],
 'OTHER': ['WF-01', 'CBD-13']}

The error I got (which is probably related to the missing node info in the above exogenous_df) is:

~/analytics-etl/virtualenv/lib/python3.8/site-packages/hts/core/regressor.py in predict(self, exogenous_df, steps_ahead, distributor, disable_progressbar, show_warnings, **predict_kwargs)
    270         """
    271 
--> 272         steps_ahead = self.__init_predict_step(exogenous_df, steps_ahead)
    273         predict_function_kwargs = {'fit_kwargs': predict_kwargs,
    274                                    'steps_ahead': steps_ahead,

~/analytics-etl/virtualenv/lib/python3.8/site-packages/hts/core/regressor.py in __init_predict_step(self, exogenous_df, steps_ahead)
    222 
    223     def __init_predict_step(self, exogenous_df: pandas.DataFrame, steps_ahead: int):
--> 224         if self.exogenous and not exogenous_df:
    225             raise MissingRegressorException(f'Exogenous variables were provided at fit step, hence are required at '
    226                                             f'predict step. Please pass the \'exogenous_df\' variable to predict '

~/analytics-etl/virtualenv/lib/python3.8/site-packages/pandas/core/generic.py in __nonzero__(self)
   1327 
   1328     def __nonzero__(self):
-> 1329         raise ValueError(
   1330             f"The truth value of a {type(self).__name__} is ambiguous. "
   1331             "Use a.empty, a.bool(), a.item(), a.any() or a.all()."

ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

Can someone provide an example of exogenous_df for the predict() function?

Thank you!

PeggyFan commented 3 years ago

Made some progress: in hts/core/regressor.py, I replaced every if-statement that tests the truthiness of exogenous_df with an explicit exogenous_df is None / exogenous_df is not None check, which resolved the error above.
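A small sketch of why that change helps: pandas refuses to reduce a DataFrame to a single truth value, so `not exogenous_df` raises, while an identity check against None does not. The helper `needs_exogenous` is hypothetical and only mirrors the shape of the check in the traceback; it is not the library's code.

```python
import pandas as pd

def needs_exogenous(fitted_exogenous, exogenous_df):
    """Hypothetical helper: True when exogenous data was used at fit time
    but none was supplied at predict time."""
    return bool(fitted_exogenous) and exogenous_df is None

df = pd.DataFrame({"precipitation": [0.0], "temp": [77.0]})

# Truth-testing a DataFrame is ambiguous and raises ValueError.
try:
    bool(df)
    ambiguous = False
except ValueError:
    ambiguous = True

print(ambiguous)
print(needs_exogenous({"CH": ["precipitation", "temp"]}, None))
print(needs_exogenous({"CH": ["precipitation", "temp"]}, df))
```

The identity check also treats an empty frame as "provided", so a separate `exogenous_df.empty` test would be needed if empty input should be rejected too.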

Then I encountered:

Traceback (most recent call last):
  File "/usr/local/Cellar/python@3.8/3.8.6/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/Users/pfan/analytics-etl/virtualenv/lib/python3.8/site-packages/hts/utilities/distribution.py", line 40, in _function_with_partly_reduce
    return list(results)
  File "/Users/pfan/analytics-etl/virtualenv/lib/python3.8/site-packages/hts/utilities/distribution.py", line 39, in <genexpr>
    results = (map_function(chunk, kwargs) for chunk in chunk_list)
  File "/Users/pfan/analytics-etl/virtualenv/lib/python3.8/site-packages/hts/core/utils.py", line 97, in _do_actual_predict
    model_instance = model_instance.predict(node=node,
  File "/Users/pfan/analytics-etl/virtualenv/lib/python3.8/site-packages/hts/model/p.py", line 105, in predict
    self.forecast = self.model.predict(future)
  File "/Users/pfan/analytics-etl/virtualenv/lib/python3.8/site-packages/fbprophet/forecaster.py", line 1174, in predict
    df = self.setup_dataframe(df.copy())
  File "/Users/pfan/analytics-etl/virtualenv/lib/python3.8/site-packages/fbprophet/forecaster.py", line 272, in setup_dataframe
    raise ValueError(
ValueError: Regressor 'precipitation' missing from dataframe

As shown in my previous message, precipitation does exist in exogenous_df, so I am stuck again. Do I need to pass more arguments to the predict() function?
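My reading of the traceback, which may be wrong: Prophet raises this error when a column registered as a regressor is absent from the frame it is asked to predict on, so the `future` frame built in hts/model/p.py may never receive the columns from exogenous_df. The pandas-only sketch below mimics that check. The helper `missing_regressors`, the start date, and the way the columns are attached are all assumptions for illustration, not scikit-hts or Prophet code.

```python
import pandas as pd

def missing_regressors(future, regressors):
    """Hypothetical helper: regressor names that are not columns of `future`."""
    return [name for name in regressors if name not in future]

regressors = ["precipitation", "temp"]

# A bare future frame with only the date column (dates are made up).
future = pd.DataFrame({"ds": pd.date_range("2021-01-01", periods=7, freq="D")})

exogenous_df = pd.DataFrame({
    "precipitation": [0.0] * 7,
    "temp": [77.0, 74.0, 66.0, 68.0, 68.0, 64.0, 65.0],
})

# Before attaching the exogenous columns, both regressors are missing.
before = missing_regressors(future, regressors)

# Attach the exogenous columns row by row (assumes both frames are in the same order).
future_with_exog = pd.concat([future, exogenous_df.reset_index(drop=True)], axis=1)
after = missing_regressors(future_with_exog, regressors)

print(before, after)
```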

meenuravi18 commented 3 years ago

Hi, I am having the same issue when trying to use exogenous variables in the predict function. Were you able to resolve the issue? Thank you!

carlomazzaferro commented 3 years ago

I'm tracking this here: https://github.com/carlomazzaferro/scikit-hts/issues/55#issuecomment-834193602

Will update once progress is made. Currently tackling other issues, this one is next.

carlomazzaferro commented 3 years ago

Closing this issue in favor of https://github.com/carlomazzaferro/scikit-hts/pull/73; let's keep track there.