aaron9980 closed this issue 2 months ago.
Hey. I'm able to run the notebook end to end as-is. Are you also using Kaggle? If not, where are you getting the data from?
Quoting your report: "get_missing_future() gives the wrong missing data as there are no missing data and the missing data are past dates"
What that does is take the end date for each id with something like long.groupby('id')['date'].max() and apply the freq offset (from the constructor) to build the dates in the forecasting horizon. So if it's producing dates that are in the past, it means that your long df doesn't have them.
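As a rough illustration of the logic described above, here is a minimal pandas sketch that rebuilds the expected horizon from the last observed date of each id. It is not mlforecast's actual implementation; the helper name expected_future_dates and the default column names id/date are assumptions taken from this thread.

```python
import pandas as pd
from pandas.tseries.frequencies import to_offset


def expected_future_dates(long: pd.DataFrame, h: int, freq: str = "D",
                          id_col: str = "id", time_col: str = "date") -> pd.DataFrame:
    """Hypothetical helper: h future timestamps per series, right after its last date."""
    offset = to_offset(freq)
    # Last observed date for each series (observed=True skips unused categories)
    last_dates = long.groupby(id_col, observed=True)[time_col].max()
    frames = []
    for uid, last in last_dates.items():
        # Step one offset past the last observed date, then take h consecutive steps
        dates = pd.date_range(last + offset, periods=h, freq=offset)
        frames.append(pd.DataFrame({id_col: uid, time_col: dates}))
    return pd.concat(frames, ignore_index=True)
```

If the last date found for some id is older than you expect, the generated horizon for that id starts in the past, which is the symptom reported here.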
Hi, yes, I am using Kaggle and the data from the M5 competition. I ran it end to end and it gave me the same error. The only changes I made were installing coreforecast version 0.0.3, because it kept giving the error "AttributeError: module 'coreforecast.lag_transforms' has no attribute 'BaseLagTransform'", and removing the input path from all the data-loading functions, since my files are in the same directory as the notebook. I appreciate the reply, and I'll look into what went wrong when creating the long df.
Hi, according to your reply, when get_missing_future() returns a date in the past, it means that the long df does not have it. However, my screenshots show that the long df does contain the dates that get_missing_future() reports as missing. Is there a reason for that? Furthermore, get_missing_future() returns 796030 rows of missing data, which seems like a lot considering X_df contains 853720 rows, the same number that MLForecast.make_future_dataframe(h) generates. In fact the dates and ids are the same; X_df just contains other variables.
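The row count itself looks consistent with a complete horizon: M5 has 30,490 item-store series, and 30,490 × 28 = 853,720. A quick way to confirm that X_df has exactly one row per id and date, independently of get_missing_future, is sketched below. The helper names are hypothetical, and these checks only look at counts and duplicates, not at whether the dates line up with the training data.

```python
import pandas as pd
from typing import List


def ids_with_incomplete_horizon(X_df: pd.DataFrame, h: int,
                                id_col: str = "id", time_col: str = "date") -> List:
    # Distinct future timestamps per series; observed=True skips unused categories
    counts = X_df.groupby(id_col, observed=True)[time_col].nunique()
    # Ids that do not have exactly h distinct dates
    return counts[counts != h].index.tolist()


def has_duplicate_keys(X_df: pd.DataFrame, id_col: str = "id", time_col: str = "date") -> bool:
    # True if any (id, date) pair appears more than once
    return bool(X_df.duplicated([id_col, time_col]).any())
```

On the M5 X_df you would call ids_with_incomplete_horizon(X_df, 28) and expect an empty list.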
Update: I tried forecasting one of the 'valid' products and the forecast worked, so I think the issue isn't compatibility; the problem probably occurred while processing the data. I will try to find a fix.
We perform a join between the expected future dataframe and X_df. Is it possible that the ids in long and X_df have different types?
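To see which (id, date) pairs such a join fails to match, a left join with indicator=True works as an anti-join. This is a sketch of the idea, not mlforecast's internal code; here expected stands for the output of MLForecast.make_future_dataframe(h), and missing_combinations is a hypothetical helper name.

```python
import pandas as pd


def missing_combinations(expected: pd.DataFrame, X_df: pd.DataFrame,
                         id_col: str = "id", time_col: str = "date") -> pd.DataFrame:
    keys = [id_col, time_col]
    # Left join on the keys; indicator=True adds a "_merge" column telling where each row came from
    merged = expected[keys].merge(X_df[keys].drop_duplicates(), on=keys,
                                  how="left", indicator=True)
    # Rows only present in `expected` are the combinations X_df is missing
    return merged.loc[merged["_merge"] == "left_only", keys].reset_index(drop=True)
```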
Yep, they have different types. The id in long is category, while the id in X_df is object. The id in long has been a category dtype from the start, right after sales is melted. Should the id in X_df be converted to category?
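If you do want both frames to share one categorical dtype, an option is to rebuild X_df's id using long's categories, so the two columns end up with the same CategoricalDtype instead of two unrelated ones. This is a minimal sketch with a hypothetical helper, align_id_dtype; it only aligns dtypes and is not presented as the fix for this issue. Ids missing from long's categories become NaN, which is itself a useful signal.

```python
import pandas as pd


def align_id_dtype(X_df: pd.DataFrame, long: pd.DataFrame, id_col: str = "id") -> pd.DataFrame:
    out = X_df.copy()
    if isinstance(long[id_col].dtype, pd.CategoricalDtype):
        # Reuse the training categories; values not among them become NaN
        out[id_col] = pd.Categorical(out[id_col], categories=long[id_col].cat.categories)
    else:
        # Non-categorical ids: just cast to whatever dtype long uses
        out[id_col] = out[id_col].astype(long[id_col].dtype)
    return out
```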
Update: I converted the id in X_df to category and tried to join it with the expected future dataframe, but the join still did not match all rows. I am trying to find out why joining on these keys is not working.
When I looked at the expected output for the apparently missing days of the product HOBBIES_2_132_CA_1_evaluation, the expected-future function returns past dates, even though my long_df does have these dates. Is there a reason this happens? I think it happens for most of the products being forecast. Sorry for all the questions; I appreciate your time. May I ask which versions you are using? Maybe switching to your versions would make it work.
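To narrow this down for a single product, it can help to put the last training date next to the future dates supplied for the same id. The sketch below uses a hypothetical helper, inspect_series, and assumes the id/date column names used in this thread; for example, inspect_series("HOBBIES_2_132_CA_1_evaluation", long, X_df).

```python
import pandas as pd


def inspect_series(uid, long: pd.DataFrame, X_df: pd.DataFrame,
                   id_col: str = "id", time_col: str = "date") -> dict:
    # Last date seen during training for this id
    last_train = long.loc[long[id_col] == uid, time_col].max()
    # Future dates provided in X_df for the same id
    future = X_df.loc[X_df[id_col] == uid, time_col]
    return {
        "last_train_date": last_train,
        "first_future_date": future.min(),
        "last_future_date": future.max(),
        "n_future_rows": int(future.size),
        # Future rows should all be strictly after the training data
        "overlaps_training": bool((future <= last_train).any()),
    }
```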
Are you able to share the notebook (either through Kaggle or here)? I'm not able to reproduce the problem.
m5-mlforecast-eval.zip: here's the ZIP with the notebook instead. I'm currently on mlforecast version 0.11.2; however, I installed it from a local file because pip install could not find the older version.
I was able to reproduce the issue locally, but it seems to be due to my version of pandas: I was on 1.5.3, and upgrading to 2.2.2 fixed it. Can you try that? I'll still investigate the source of the problem with that version.
Hi, your answer solved my problem. I realized I had a much older version: my pandas was 1.3.4. However, after updating it I got many errors from other packages due to dependency conflicts, so I reinstalled Anaconda, which solved the issue. I'm curious why the issue occurred, though. Anyway, thanks for your help, even though the fix turned out to be easy.
Hey. This should be fixed by https://github.com/Nixtla/utilsforecast/pull/79, so you should be able to use pandas<2 with utilsforecast>=0.1.5.
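A small sketch for checking whether an environment falls into the combination discussed in this thread (pandas < 2 together with utilsforecast < 0.1.5). The helper names are hypothetical, and treating every pandas >= 2 as unaffected is a simplifying assumption; the thread only confirms pandas 2.2.2.

```python
import re
from importlib.metadata import PackageNotFoundError, version
from typing import Optional, Tuple


def parse_version(v: str) -> Tuple[int, ...]:
    # Keep the leading digits of the first three components ("2.0.0rc1" -> (2, 0, 0))
    nums = []
    for part in v.split(".")[:3]:
        m = re.match(r"\d+", part)
        nums.append(int(m.group()) if m else 0)
    return tuple(nums)


def is_affected(pandas_version: str, utilsforecast_version: str) -> bool:
    # Assumption: affected only when pandas < 2 and utilsforecast < 0.1.5
    return parse_version(pandas_version) < (2,) and parse_version(utilsforecast_version) < (0, 1, 5)


def check_installed() -> Optional[bool]:
    # Returns None when either package is not installed
    try:
        return is_affected(version("pandas"), version("utilsforecast"))
    except PackageNotFoundError:
        return None
```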
I'm closing this, feel free to reopen if you encounter this issue again.
Had the same error. It turned out that my weekly data was aggregated to Mondays, while the model with frequency set to "W" anchors dates to Sundays (in pandas, "W" is an alias for "W-SUN"), so the dates in my input X_df did not match the forecast horizon. I changed my input dataframe to be aggregated to Sunday dates and everything worked.
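A minimal pandas sketch of that realignment, assuming Monday-dated weekly rows that should be relabeled with the Sunday closing the same Monday-to-Sunday week. The helper name is hypothetical. Mapping to the preceding Sunday instead would be an equally valid choice; what matters is that the training data and X_df use the same anchor. Alternatively, if your mlforecast version accepts anchored pandas aliases such as "W-MON" (an assumption, not verified in this thread), you could keep the Monday dates and change freq instead.

```python
import pandas as pd


def to_week_ending_sunday(dates: pd.Series) -> pd.Series:
    # Map each date to the Sunday that closes its Monday-Sunday week; Sundays map to themselves
    return dates.dt.to_period("W-SUN").dt.end_time.dt.normalize()


# freq="W" generates Sundays, so the training dates should be Sundays too
print(pd.date_range("2024-01-07", periods=3, freq="W").dayofweek.tolist())
```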
What happened + What you expected to happen
Hi, I'm trying to replicate the M5 forecast-eval notebook code, but it gives the error message:
ValueError: Found missing inputs in X_df. It should have one row per id and time for the complete forecasting horizon. You can get the expected structure by running MLForecast.make_future_dataframe(h) or get the missing combinatins in your current X_df by running MLForecast.get_missing_future(h, X_df).
I ran the above functions and found that X_df is correct, while get_missing_future() reports the wrong missing data: there are no missing rows, and the rows it reports as missing have past dates. I did not change any of the code from the M5 forecast-eval notebook, so I'm confused about what went wrong.
Versions / Dependencies
mlforecast version: 0.11.2
coreforecast version: 0.0.3 (had to install this to avoid an error when running pip install -qqq "mlforecast[lag_transforms]")
Python: 3.9.7
OS: Windows (running in Jupyter Notebook)
Reproduction script
%%time
fcst.fit(
    long,
    id_col='id',
    time_col='date',
    target_col='y',
    static_features=['id', 'item_id', 'dept_id', 'cat_id', 'store_id', 'state_id'],
)
%time preds = fcst.predict(28, X_df=X_df)
Issue Severity
High: It blocks me from completing my task.