Nixtla / neuralforecast

Scalable and user friendly neural 🧠 forecasting algorithms.
https://nixtlaverse.nixtla.io/neuralforecast
Apache License 2.0

ValueError: `futr_df` must have one row per id and ds in the forecasting horizon (20) #1018

Closed: AlineLPL closed this issue 3 months ago.

AlineLPL commented 4 months ago

What happened + What you expected to happen

I'm having an issue generating predictions. It seems that the model isn't recognizing the number of unique IDs in my dataframe. However, I've checked the training set, and it has 6146 unique IDs, the same number of IDs as in my future dates dataframe, with 20 dates per ID: 6146 * 20 = 122920 rows in total. The dates are continuous, and the first date in this dataframe immediately follows the end of the training set. For example, the training dataframe ends on 2024-04-17, and the first date in the future dates dataframe is 2024-04-18. Here's how it looks when grouped by ds (count of unique_id per date):

```
ds
2024-04-18    6146
2024-04-19    6146
2024-04-20    6146
2024-04-21    6146
2024-04-22    6146
2024-04-23    6146
2024-04-24    6146
2024-04-25    6146
2024-04-26    6146
2024-04-27    6146
2024-04-28    6146
2024-04-29    6146
2024-04-30    6146
2024-05-01    6146
2024-05-02    6146
2024-05-03    6146
2024-05-04    6146
2024-05-05    6146
2024-05-06    6146
2024-05-07    6146
Name: unique_id, dtype: int64
```
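The checks described above can be written as a small pandas sketch. `check_futr_df` is a hypothetical helper, not part of neuralforecast; it assumes the `unique_id`/`ds` column names, daily frequency, and `ds` already parsed as datetimes in both dataframes. Unlike the per-date counts, it compares each id's future dates against that id's own last training date.

```python
import pandas as pd


def check_futr_df(train_df: pd.DataFrame, futr_df: pd.DataFrame, h: int) -> list:
    """Hypothetical helper: list per-id problems in futr_df (assumes daily data)."""
    problems = []
    last_train = train_df.groupby("unique_id")["ds"].max()
    # Ids that have training data but no rows at all in futr_df
    for uid in last_train.index.difference(futr_df["unique_id"].unique()):
        problems.append(f"{uid}: missing from futr_df")
    for uid, ds in futr_df.groupby("unique_id")["ds"]:
        if uid not in last_train.index:
            problems.append(f"{uid}: not in training data")
            continue
        # The future rows must be exactly the h days right after this id's last training date
        expected = pd.date_range(last_train[uid] + pd.Timedelta(days=1), periods=h, freq="D")
        if sorted(ds) != list(expected):
            problems.append(f"{uid}: dates do not match the {h} days after {last_train[uid].date()}")
    return problems


# Toy data: series "a" ends on 2024-04-17, series "b" ends one day earlier
train = pd.DataFrame({
    "unique_id": ["a", "a", "a", "b", "b"],
    "ds": pd.to_datetime(["2024-04-15", "2024-04-16", "2024-04-17",
                          "2024-04-15", "2024-04-16"]),
})
futr = pd.DataFrame({
    "unique_id": ["a", "a", "b", "b"],
    "ds": pd.to_datetime(["2024-04-18", "2024-04-19", "2024-04-18", "2024-04-19"]),
})
print(check_futr_df(train, futr, h=2))
# ['b: dates do not match the 2 days after 2024-04-16']
```

In this toy example the per-date counts look perfect (two ids on each date), yet series `b` ends one day earlier than `a`, so its expected future starts on 2024-04-17.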

Versions / Dependencies

My library version is: neuralforecast 1.6.4

Reproduction script

Here is my script:

```python
from neuralforecast.core import NeuralForecast
from neuralforecast.losses.pytorch import MQLoss
from neuralforecast.models import LSTM, NHITS, TFT  # model classes live in neuralforecast.models

horizon = 20  # forecasting horizon: 20 future dates per id


def entrena_modelo(h, set_entrena):
    exogenas = [
        'bandera_feriado', 'pagos_especiales_A_Q', 'pagos_especiales_D_Q',
        'pagos_especiales_I', 'pagos_especiales_Q',
        'dia_semana_Friday', 'dia_semana_Monday', 'dia_semana_Thursday',
        'dia_semana_Tuesday', 'dia_semana_Wednesday',
    ]
    levels = [95]
    # Use the h argument rather than the global horizon
    models = [
        NHITS(h=h, input_size=15,
              futr_exog_list=exogenas,  # <- Future exogenous variables
              scaler_type='robust', learning_rate=1e-3,
              max_steps=200, val_check_steps=10,
              loss=MQLoss(level=levels)),
        LSTM(h=h,
             futr_exog_list=exogenas,  # <- Future exogenous variables
             scaler_type='robust', learning_rate=1e-3,
             max_steps=200, val_check_steps=10,
             loss=MQLoss(level=levels)),
        TFT(h=h, input_size=15,
            futr_exog_list=exogenas,
            scaler_type='robust', learning_rate=1e-3,
            max_steps=200, val_check_steps=10,
            loss=MQLoss(level=levels)),
    ]
    nf = NeuralForecast(models=models, freq='D')
    fcst_df = nf.cross_validation(df=set_entrena, val_size=20, test_size=20, n_windows=None)
    return nf, fcst_df


nf, fcst_df = entrena_modelo(h=horizon, set_entrena=set_final_var)

Y_hat_df = nf.predict(futr_df=final_result)
```

I tried this, just to be sure:

```python
Y_hat_df = nf.predict(
    df=set_final_var[set_final_var['unique_id'].isin(final_result['unique_id'])],
    futr_df=final_result,
)
```

Issue Severity

High: It blocks me from completing my task.

jmoralez commented 4 months ago

Hey @AlineLPL, thanks for using neuralforecast. Can you try upgrading? We recently introduced some debugging methods for this, which might help you pin down the issue. My guess is that not all of your series end on the same date, and futr_df must contain the future dates for each series.
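One way to test this guess is to compare the last training date of each series: if they are not all the same, the series that end early need future rows starting the day after their own last date. A minimal pandas sketch (the helper name `misaligned_series` is hypothetical; it assumes `unique_id`/`ds` columns with `ds` parsed as datetimes):

```python
import pandas as pd


def misaligned_series(train_df: pd.DataFrame) -> pd.Series:
    """Hypothetical helper: last date of every series whose end differs from the most common end date."""
    last_dates = train_df.groupby("unique_id")["ds"].max()
    # Most frequent last date across all series
    common_end = last_dates.mode().iloc[0]
    return last_dates[last_dates != common_end]
```

For example, `misaligned_series(set_final_var)` returning an empty Series would rule this explanation out, and `set_final_var.groupby('unique_id')['ds'].max().value_counts()` shows how many series end on each date.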

AlineLPL commented 4 months ago

I have upgraded the library to version 1.7.2, and now I'm getting this error message:


```
ValueError                                Traceback (most recent call last)
Input In [54], in <cell line: 1>()
----> 1 Y_hat_df = nf.predict(futr_df=final_result)
      2 Y_hat_df.head()

File /var/sds/packages/lib/python3.9/site-packages/neuralforecast/core.py:751, in NeuralForecast.predict(self, df, static_df, futr_df, sort_df, verbose, engine, **data_kwargs)
    749         expected_cmd = "make_future_dataframe(df)"
    750         missing_cmd = "get_missing_future(futr_df, df)"
--> 751     raise ValueError(
    752         "There are missing combinations of ids and times in futr_df.\n"
    753         f"You can run the {expected_cmd} method to get the expected combinations or "
    754         f"the {missing_cmd} method to get the missing combinations."
    755     )
    756 if futr_orig_rows > futr_df.shape[0]:
    757     dropped_rows = futr_orig_rows - futr_df.shape[0]

ValueError: There are missing combinations of ids and times in futr_df.
You can run the make_future_dataframe() method to get the expected combinations or the get_missing_future(futr_df) method to get the missing combinations.
```

But I have already compared the number of unique_ids in my training set with my future dates set, and they match. Additionally, each id has 20 dates. I'm running this in an environment similar to AWS; when I run it on Colab, I don't have any issues. What other options could I explore? :'(
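Since the same code works on Colab, one possibility worth ruling out (an assumption, not a diagnosis from this thread) is that the key columns have different dtypes in the two dataframes, e.g. `ds` parsed as datetimes in the training data but left as strings in `futr_df`, or `unique_id` stored as integers in one and strings in the other. Per-id row counts would still look correct in that case, yet the (id, date) pairs could fail to match. A hypothetical helper to compare them:

```python
import pandas as pd


def compare_key_dtypes(train_df: pd.DataFrame, futr_df: pd.DataFrame,
                       keys=("unique_id", "ds")) -> dict:
    """Hypothetical helper: {column: (train dtype, futr dtype)} for key columns whose dtypes differ."""
    return {
        col: (str(train_df[col].dtype), str(futr_df[col].dtype))
        for col in keys
        # Compare dtype names so numpy and pandas extension dtypes are handled alike
        if str(train_df[col].dtype) != str(futr_df[col].dtype)
    }
```

For instance, `compare_key_dtypes(set_final_var, final_result)` returning `{}` means the key columns already agree; otherwise `pd.to_datetime` or `astype` can align them before calling `nf.predict`.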

jmoralez commented 4 months ago

You can run any of the suggested commands to either get the expected structure or get the combinations that you're missing. That should help you narrow down the issue.
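The commands named in the error message (`nf.make_future_dataframe()` and `nf.get_missing_future(futr_df)`) are the authoritative check. To see what such a check does, the idea can be approximated in plain pandas: build every (unique_id, ds) pair the horizon should cover, starting the day after each series' last training date, then anti-join that grid against `futr_df`. This is a sketch assuming daily frequency and datetime `ds`, not neuralforecast's implementation; `expected_future` and `missing_future` are hypothetical names.

```python
import pandas as pd


def expected_future(train_df: pd.DataFrame, h: int, freq: str = "D") -> pd.DataFrame:
    """Every (unique_id, ds) pair the horizon should cover, per series."""
    last_dates = train_df.groupby("unique_id")["ds"].max()
    frames = [
        pd.DataFrame({
            "unique_id": uid,
            # Skip the last observed date itself and keep the next h periods
            "ds": pd.date_range(last, periods=h + 1, freq=freq)[1:],
        })
        for uid, last in last_dates.items()
    ]
    return pd.concat(frames, ignore_index=True)


def missing_future(futr_df: pd.DataFrame, train_df: pd.DataFrame,
                   h: int, freq: str = "D") -> pd.DataFrame:
    """Expected (unique_id, ds) pairs that do not appear in futr_df (left anti-join)."""
    expected = expected_future(train_df, h, freq)
    merged = expected.merge(
        futr_df[["unique_id", "ds"]].drop_duplicates(),
        on=["unique_id", "ds"],
        how="left",
        indicator=True,  # adds a _merge column: "both" or "left_only"
    )
    return merged.loc[merged["_merge"] == "left_only", ["unique_id", "ds"]].reset_index(drop=True)
```

Any rows returned are combinations that `futr_df` lacks; extra rows in `futr_df` are not reported by this sketch.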

github-actions[bot] commented 3 months ago

This issue has been automatically closed because it has been awaiting a response for too long. When you have time to work with the maintainers to resolve this issue, please post a new comment and it will be re-opened. If the issue has been locked for editing by the time you return to it, please open a new issue and reference this one.