Nixtla / mlforecast

Scalable machine 🤖 learning for time series forecasting.
https://nixtlaverse.nixtla.io/mlforecast
Apache License 2.0

In predict, ValueError: You are trying to merge on float64 and object columns. If you wish to proceed you should use pd.concat #243

Closed SyedKumailHussainNaqvi closed 10 months ago

SyedKumailHussainNaqvi commented 10 months ago

After training the model with fit, calling predict raises this error: "ValueError: You are trying to merge on float64 and object columns. If you wish to proceed you should use pd.concat"

Here is my code, kindly guide me.

model1 = [xgb.XGBRegressor()]
mlf = MLForecast(
    models=model1,
    freq='5min',
    lags=[1, 12],
    lag_transforms={1: [expanding_mean], 12: [(rolling_mean, 12)]},
    target_transforms=[Differences([1]), LocalStandardScaler()],
    date_features=["year", "month", "day", "hour"],
)

train = df[df.ds <= '2016-05-15 18:30:00']
valid = df[(df.ds > '2016-06-15 18:30:00') & (df.ds < '2016-06-16 18:30:00')]
test = df[df.ds >= '2016-06-11 06:55:00']
mlf.fit(
    train,
    fitted=True,
    id_col='unique_id',
    prediction_intervals=PredictionIntervals(n_windows=5, h=5, method="conformal_distribution"),
    static_features=[],
)

valid['unique_id'] = valid.index
valid['unique_id'] = df['unique_id'].astype('float64')
valid.info()

Data columns (total 10 columns):
 #   Column                        Non-Null Count  Dtype
---  ------                        --------------  -----
 0   ds                            280 non-null    datetime64[ns]
 1   y                             280 non-null    float64
 2   Current_Phase_Average         280 non-null    float64
 3   Weather_Temperature_Celsius   280 non-null    float64
 4   Weather_Relative_Humidity     280 non-null    float64
 5   Global_Horizontal_Radiation   280 non-null    float64
 6   Diffuse_Horizontal_Radiation  280 non-null    float64
 7   Wind_Speed                    280 non-null    float64
 8   Wind_Direction                280 non-null    float64
 9   unique_id                     280 non-null    float64
dtypes: datetime64[ns](1), float64(9)

forecast_df = mlf.predict(h=5, X_df= valid)

ValueError                                Traceback (most recent call last)
----> 1 forecast_df = mlf.predict(h=5, X_df= valid)

    162     new_args.append(kwargs.pop(arg_names[i]))
    163     new_args.append(kwargs.pop(old_name))
--> 164 return f(*new_args, **kwargs)

    583 else:
    584     ts = self.ts
--> 586 forecasts = ts.predict(
    587     models=self.models,
    588     horizon=h,
    589     dynamic_dfs=dynamic_dfs,
    590     before_predict_callback=before_predict_callback,
    591     after_predict_callback=after_predict_callback,
    592     X_df=X_df,
    593     ids=ids,
    594 )
    595 if level is not None:
    596     if self._cs_df is None:
...
-> 1401 raise ValueError(msg)
   1403 # datetimelikes must match exactly
   1404 elif needs_i8_conversion(lk.dtype) and not needs_i8_conversion(rk.dtype):

ValueError: You are trying to merge on float64 and object columns. If you wish to proceed you should use pd.concat

jmoralez commented 10 months ago

Hey @SyedKumailHussainNaqvi, I believe the following line is the problem:

valid['unique_id'] = df['unique_id'].astype('float64')

You should use the same types for your ids in train and validation.
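To illustrate the point about id types, here is a minimal sketch in plain pandas. It does not call mlforecast; the toy frames, the id value "site_1", and the helper id_dtypes_match are all hypothetical and only meant to show why a merge on mismatched key dtypes fails and one way to keep the ids consistent.

```python
import pandas as pd

# Toy training frame: a single series whose id is a string (hypothetical data).
train = pd.DataFrame({
    "unique_id": ["site_1"] * 3,
    "ds": pd.date_range("2016-05-15 18:20", periods=3, freq="5min"),
    "y": [1.0, 2.0, 3.0],
})

# Future exogenous frame whose id column was cast to float64 by mistake.
future_bad = pd.DataFrame({
    "unique_id": [1.0, 1.0],
    "ds": pd.date_range("2016-05-15 18:35", periods=2, freq="5min"),
    "Wind_Speed": [3.2, 3.4],
})


def id_dtypes_match(left, right, id_col="unique_id"):
    """Return True when both frames store the id column with the same dtype."""
    return left[id_col].dtype == right[id_col].dtype


# Merging string keys against float64 keys makes pandas raise a ValueError.
try:
    train.merge(future_bad, on="unique_id")
    merge_failed = False
except ValueError:
    merge_failed = True

# Take the id from the training frame instead of casting it to float,
# then cast to the training dtype so both columns agree exactly.
future_ok = future_bad.assign(unique_id=train["unique_id"].iloc[0])
future_ok["unique_id"] = future_ok["unique_id"].astype(train["unique_id"].dtype)

print(id_dtypes_match(train, future_bad), merge_failed, id_dtypes_match(train, future_ok))
```

Checking id_dtypes_match(train, X_df) before calling predict is a cheap way to catch this kind of mismatch early.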

SyedKumailHussainNaqvi commented 10 months ago

@jmoralez Thanks for your reply. I changed the type of 'unique_id' in all of train, validation, and test.

For the train series:

<class 'pandas.core.frame.DataFrame'>
Index: 6276 entries, 0 to 6275
Data columns (total 10 columns):
 #   Column                        Non-Null Count  Dtype
---  ------                        --------------  -----
 0   ds                            6276 non-null   datetime64[ns]
 1   y                             6276 non-null   float64
 2   Current_Phase_Average         6276 non-null   float64
 3   Weather_Temperature_Celsius   6276 non-null   float64
 4   Weather_Relative_Humidity     6276 non-null   float64
 5   Global_Horizontal_Radiation   6276 non-null   float64
 6   Diffuse_Horizontal_Radiation  6276 non-null   float64
 7   Wind_Speed                    6276 non-null   float64
 8   Wind_Direction                6276 non-null   float64
 9   unique_id                     6276 non-null   float64
dtypes: datetime64[ns](1), float64(9)

For validation:

<class 'pandas.core.frame.DataFrame'>
Index: 3640 entries, 6276 to 9915
Data columns (total 10 columns):
 #   Column                        Non-Null Count  Dtype
---  ------                        --------------  -----
 0   ds                            3640 non-null   datetime64[ns]
 1   y                             3640 non-null   float64
 2   Current_Phase_Average         3640 non-null   float64
 3   Weather_Temperature_Celsius   3640 non-null   float64
 4   Weather_Relative_Humidity     3640 non-null   float64
 5   Global_Horizontal_Radiation   3640 non-null   float64
 6   Diffuse_Horizontal_Radiation  3640 non-null   float64
 7   Wind_Speed                    3640 non-null   float64
 8   Wind_Direction                3640 non-null   float64
 9   unique_id                     3640 non-null   float64
dtypes: datetime64[ns](1), float64(9)

For the test series:

Index: 2800 entries, 9916 to 12715
Data columns (total 10 columns):
 #   Column                        Non-Null Count  Dtype
---  ------                        --------------  -----
 0   ds                            2800 non-null   datetime64[ns]
 1   y                             2800 non-null   float64
 2   Current_Phase_Average         2800 non-null   float64
 3   Weather_Temperature_Celsius   2800 non-null   float64
 4   Weather_Relative_Humidity     2800 non-null   float64
 5   Global_Horizontal_Radiation   2800 non-null   float64
 6   Diffuse_Horizontal_Radiation  2800 non-null   float64
 7   Wind_Speed                    2800 non-null   float64
 8   Wind_Direction                2800 non-null   float64
 9   unique_id                     2800 non-null   float64
dtypes: datetime64[ns](1), float64(9)

But now I am facing this error: ValueError: Found missing inputs in X_df. It should have one row per id and date for the complete forecasting horizon.
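The message asks for one row per id and date across the whole horizon. A rough way to check that with plain pandas is sketched below. The helper missing_future_rows and the toy frames are hypothetical, and the sketch assumes the horizon starts one step after each series' last training timestamp; it is not mlforecast's own validation.

```python
import pandas as pd


def missing_future_rows(train, X_df, h, freq, id_col="unique_id", time_col="ds"):
    """Return the (id, date) pairs needed for the next `h` steps but absent from `X_df`."""
    # Assumption: forecasting starts right after each series' last training timestamp.
    last = train.groupby(id_col)[time_col].max()
    expected = pd.concat(
        [
            pd.DataFrame({
                id_col: uid,
                # Generate h + 1 stamps from the last one and drop the first.
                time_col: pd.date_range(start, periods=h + 1, freq=freq)[1:],
            })
            for uid, start in last.items()
        ],
        ignore_index=True,
    )
    # A left merge with an indicator flags expected rows that X_df does not provide.
    merged = expected.merge(
        X_df[[id_col, time_col]], on=[id_col, time_col], how="left", indicator=True
    )
    return merged.loc[merged["_merge"] == "left_only", [id_col, time_col]].reset_index(drop=True)


# Toy data: training ends at 18:30, X_df covers only 3 of the 5 required steps.
train = pd.DataFrame({
    "unique_id": [0.0] * 3,
    "ds": pd.date_range("2016-05-15 18:20", periods=3, freq="5min"),
    "y": [1.0, 2.0, 3.0],
})
X_df = pd.DataFrame({
    "unique_id": [0.0] * 3,
    "ds": pd.date_range("2016-05-15 18:35", periods=3, freq="5min"),
    "Wind_Speed": [3.2, 3.4, 3.1],
})

missing = missing_future_rows(train, X_df, h=5, freq="5min")
print(missing)
```

An empty result would mean X_df covers every id and timestamp of the horizon under the stated assumption; any rows returned are the ones that would need to be supplied.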

jmoralez commented 10 months ago

Closing in favor of #242