Closed SyedKumailHussainNaqvi closed 11 months ago
Hey @SyedKumailHussainNaqvi. Is the frequency of your series 5 minutes?
@jmoralez Yes, the frequency of my data series is 5 minutes. Kindly check this:

timestamp | Active_Power | Current_Phase_Average | Weather_Temperature_Celsius | Weather_Relative_Humidity | Global_Horizontal_Radiation | Diffuse_Horizontal_Radiation | Wind_Speed | Wind_Direction |
---|---|---|---|---|---|---|---|---|
4/1/2016 8:55 | 102.1753 | 142.1885 | 24.47751 | 23.65278 | 498.1442 | 46.48696 | 3.221892 | 205.7688 |
4/1/2016 9:00 | 105.4211 | 147.5435 | 24.96117 | 23.06787 | 514.6549 | 45.32268 | 3.747602 | 122.7799 |
4/1/2016 9:05 | 108.4093 | 152.4971 | 25.13794 | 22.7556 | 536.3303 | 49.34752 | 3.553297 | 157.5239 |
4/1/2016 9:10 | 111.1481 | 157.1644 | 25.4412 | 22.6231 | 548.3616 | 46.07468 | 2.969267 | 109.3303 |
4/1/2016 9:15 | 113.9153 | 161.9033 | 25.80492 | 22.19402 | 553.215 | 45.70579 | 3.401344 | 142.9177 |
4/1/2016 9:20 | 116.4639 | 165.9414 | 25.93542 | 21.78914 | 568.1932 | 48.49394 | 3.898342 | 167.0238 |
4/1/2016 9:25 | 119.0665 | 170.5314 | 25.80478 | 21.97729 | 587.751 | 50.5708 | 3.913895 | 128.7222 |
4/1/2016 9:30 | 121.7267 | 175.1911 | 26.32808 | 21.42587 | 604.8317 | 49.5583 | 4.04404 | 90.76111 |
4/1/2016 9:35 | 124.4063 | 179.5768 | 26.7749 | 21.0956 | 621.0855 | 48.98398 | 2.631776 | 194.7342 |
4/1/2016 9:40 | 126.924 | 183.7551 | 27.03691 | 20.37064 | 637.8745 | 49.20296 | 3.320363 | 67.64616 |
4/1/2016 9:45 | 129.3882 | 187.7454 | 27.38554 | 19.96036 | 656.485 | 51.33382 | 3.324487 | 134.8194 |
4/1/2016 9:50 | 131.626 | 191.4583 | 27.71058 | 19.74454 | 671.711 | 50.71244 | 2.800914 | 144.7963 |
4/1/2016 9:55 | 133.8582 | 195.2888 | 28.35464 | 19.16202 | 685.3091 | 48.57825 | 2.220385 | 121.1364 |
4/1/2016 10:00 | 136.06 | 198.774 | 28.39381 | 18.71133 | 702.8945 | 51.30513 | 3.279328 | 132.584 |
4/1/2016 10:05 | 137.8313 | 201.848 | 28.58586 | 18.68137 | 716.3569 | 51.98991 | 2.730431 | 93.9258 |
4/1/2016 10:10 | 139.678 | 204.9987 | 29.00494 | 18.10191 | 730.3823 | 52.61768 | 2.785331 | 124.0541 |
4/1/2016 10:15 | 141.4433 | 207.9873 | 28.79097 | 18.06579 | 740.2993 | 54.5131 | 2.611753 | 118.2188 |
4/1/2016 10:20 | 142.9773 | 210.5444 | 28.58105 | 18.09788 | 755.9532 | 57.29346 | 2.954512 | 168.503 |
4/1/2016 10:25 | 144.6015 | 213.3357 | 29.21808 | 17.5524 | 767.1425 | 55.70409 | 2.355037 | 125.6492 |
4/1/2016 10:30 | 145.9677 | 215.7586 | 29.67789 | 16.83503 | 781.8422 | 57.83193 | 2.358226 | 142.6517 |
4/1/2016 10:35 | 147.2709 | 218.0466 | 29.55465 | 16.96823 | 796.9732 | 61.30234 | 3.018474 | 146.7362 |
4/1/2016 10:40 | 148.7956 | 220.4369 | 29.61863 | 16.85245 | 811.0737 | 62.71655 | 2.493661 | 133.6768 |
4/1/2016 10:45 | 149.5706 | 222.0094 | 29.78173 | 17.16988 | 821.0898 | 63.57846 | 1.97588 | 264.2258 |
4/1/2016 10:50 | 151.0038 | 224.2357 | 29.65329 | 17.13947 | 829.7411 | 59.55241 | 1.933785 | 218.3842 |
4/1/2016 10:55 | 152.1304 | 226.2266 | 29.97485 | 16.63404 | 838.7658 | 57.79825 | 2.962153 | 196.2876 |
4/1/2016 11:00 | 153.6104 | 228.596 | 29.96729 | 16.53452 | 849.1298 | 56.9832 | 2.532594 | 66.07544 |
4/1/2016 11:05 | 153.7445 | 229.2958 | 30.40237 | 16.02146 | 854.5871 | 53.07868 | 1.374275 | 226.616 |
4/1/2016 11:10 | 154.7077 | 231.1436 | 30.74542 | 15.66695 | 868.4718 | 55.00143 | 2.591314 | 231.6353 |
4/1/2016 11:15 | 155.9508 | 233.1486 | 30.60402 | 15.48819 | 884.5474 | 56.15897 | 2.011981 | 177.2232 |
What that's doing is verifying that it gets the expected ids and dates in `X_df`. You can replicate the check by using:

dates_validation = pd.DataFrame({
    model.ts.id_col: model.ts.uids,
    "_start": model.ts.last_dates + model.ts.freq,
    "_end": model.ts.last_dates + horizon * model.ts.freq,
})

Can you verify if those dates and ids match the ones you're providing through `valid`?
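To make the check above concrete without access to a fitted model, here is a minimal standalone sketch of the same arithmetic using only the standard library. The helper name `expected_window` and the hypothetical validation dates are assumptions for illustration; the last training timestamp and the 5-minute frequency are taken from this thread.

```python
from datetime import datetime, timedelta


def expected_window(last_train_date, freq, horizon):
    """Return the (start, end) timestamps the forecast horizon should cover."""
    start = last_train_date + freq
    end = last_train_date + horizon * freq
    return start, end


freq = timedelta(minutes=5)
horizon = 5
last_train_date = datetime(2016, 5, 15, 18, 30)  # last training row in this thread

start, end = expected_window(last_train_date, freq, horizon)

# Every step from start to end is expected to be present in the future features.
expected_dates = [start + i * freq for i in range(horizon)]

# Hypothetical validation dates that begin the next morning instead.
valid_dates = {datetime(2016, 5, 16, 6, 55) + i * freq for i in range(horizon)}

missing = [d for d in expected_dates if d not in valid_dates]
print(start, end, len(missing))
```

With these inputs none of the expected timestamps appear in the validation dates, which is the situation the "Found missing inputs in X_df" error describes.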
@jmoralez Thank you so much for your kind guidance. I ran the above code and its output is below:

unique_id | _start | _end |
---|---|---|
1.0 | 2016-05-15 18:35:00 | 2016-05-15 18:05:00 |

But my data series starts at 06:55 and ends at 18:30 each day. What should I do now?
Those dates are built based on the last time it saw during training. Do you have missing timestamps?
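One way to answer the question about missing timestamps is to compare the spacing of consecutive rows against the expected frequency. The following is a minimal sketch under the assumption that the timestamps are available as Python `datetime` objects; `find_gaps` is a hypothetical helper, and the sample dates mimic the daily 06:55 to 18:30 window described in this thread.

```python
from datetime import datetime, timedelta


def find_gaps(timestamps, freq):
    """Return (previous, current) pairs whose spacing differs from freq."""
    ordered = sorted(timestamps)
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b - a != freq]


freq = timedelta(minutes=5)

# End of one day (18:20, 18:25, 18:30) and start of the next (06:55, 07:00, 07:05).
day1 = [datetime(2016, 5, 15, 18, 20) + i * freq for i in range(3)]
day2 = [datetime(2016, 5, 16, 6, 55) + i * freq for i in range(3)]

gaps = find_gaps(day1 + day2, freq)
for prev, cur in gaps:
    print(f"gap between {prev} and {cur}: {cur - prev}")
```

Any pair reported here is a place where a date-based frequency such as `'5T'` would expect rows that the data does not contain.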
This is my model and fit code, kindly review it:

models = [XGBRegressor(random_state=0, n_estimators=100)]
model = MLForecast(models=models, freq='5T', num_threads=6)
model.fit(train, id_col='unique_id', time_col='ds', target_col='y', fitted=True, static_features=[])
and train data series is this
ds | y | Current_Phase_Average | Weather_Temperature_Celsius | Weather_Relative_Humidity | Global_Horizontal_Radiation | Diffuse_Horizontal_Radiation | Wind_Speed | Wind_Direction | unique_id |
---|---|---|---|---|---|---|---|---|---|
2016-04-01 08:55:00 | 102.175270 | 142.188522 | 24.477514 | 23.652782 | 498.144226 | 46.486958 | 3.221892 | 205.768753 | 1.0 |
2016-04-01 09:00:00 | 105.421097 | 147.543472 | 24.961168 | 23.067873 | 514.654907 | 45.322678 | 3.747602 | 122.779907 | 1.0 |
2016-04-01 09:05:00 | 108.409271 | 152.497131 | 25.137936 | 22.755598 | 536.330322 | 49.347523 | 3.553297 | 157.523910 | 1.0 |
2016-04-01 09:10:00 | 111.148140 | 157.164398 | 25.441204 | 22.623100 | 548.361572 | 46.074684 | 2.969267 | 109.330276 | 1.0 |
2016-04-01 09:15:00 | 113.915314 | 161.903259 | 25.804924 | 22.194019 | 553.215027 | 45.705791 | 3.401344 | 142.917725 | 1.0 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
2016-05-15 18:10:00 | 0.000000 | 7.203154 | 23.014162 | 24.857141 | 6.644276 | 4.979178 | 1.898691 | 135.060287 | 1.0 |
2016-05-15 18:15:00 | 0.000000 | 6.224843 | 22.699066 | 25.454231 | 5.833419 | 4.189736 | 1.728728 | 121.308731 | 1.0 |
2016-05-15 18:20:00 | 0.000000 | 6.142416 | 22.338396 | 26.381437 | 4.928545 | 3.340665 | 1.659863 | 118.639999 | 1.0 |
2016-05-15 18:25:00 | 0.000000 | 6.142416 | 22.088547 | 26.944998 | 4.767952 | 3.199787 | 1.453017 | 120.521439 | 1.0 |
2016-05-15 18:30:00 | 0.000000 | 6.142416 | 21.587543 | 28.438608 | 5.341975 | 3.628179 | 1.357174 | 113.915932 | 1.0 |
So `valid` should start at "2016-05-15 18:35:00", which is what the check is verifying. Why does yours start at a different timestamp?
This is a PV dataset, and I am doing ultra-short-term photovoltaic power prediction one hour ahead. The frequency of the dataset is 5 minutes. Because the power output of the photovoltaic modules is significantly lower in the morning and evening (it is 0 or close to 0 most of the time), only the power between 6:55 and 18:30 is considered. That is why the valid dataset starts at "2016-05-16 06:55:00", the next day.
I changed the train dataset split so that it now ends at "2016-05-15 18:00:00", and the dates validation output is below:

unique_id | _start | _end |
---|---|---|
1.0 | 2016-05-15 18:05:00 | 2016-05-15 18:25:00 |

Now the predict function executes successfully, but it only predicts the next 5 values of the validation dataset, as below:

unique_id | ds | XGBRegressor | y |
---|---|---|---|
1.0 | 2016-05-15 18:05:00 | 0.353545 | 0.0 |
1.0 | 2016-05-15 18:10:00 | 0.159547 | 0.0 |
1.0 | 2016-05-15 18:15:00 | 0.407333 | 0.0 |
1.0 | 2016-05-15 18:20:00 | 0.192743 | 0.0 |
1.0 | 2016-05-15 18:25:00 | 0.492636 | 0.0 |

But my validation dataset is 3646 rows × 10 columns. Kindly guide me: how can I predict the remaining values of the valid dataset?
The number of predictions is controlled by the `h` argument of `MLForecast.predict`. If your dates are successive you should be able to just use a bigger `h`, e.g. `h=3646`.
When I set `h=3646`, the same ValueError occurs: "Found missing inputs in X_df. It should have one row per id and date for the complete forecasting horizon". Please guide me: how can I predict on the valid dataset?
Are you including the missing timestamps in the training set? If you're not, you can just use integer timestamps instead, e.g.

data['timestamp'] = data.sort_values(['unique_id', 'timestamp']).groupby('unique_id').cumcount()
model = MLForecast(
    models=models,
    freq=1,  # this will advance each timestamp by 1 when predicting
    lags=[12],
    lag_transforms={
        1: [(rolling_mean, 12), (rolling_max, 12), (rolling_min, 12)],
    },
    # date_features=['dayofweek', 'month'],  # you can't use date features anymore
    num_threads=6,
)
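As a small illustration of the integer-timestamp idea, the sketch below applies `groupby(...).cumcount()` to a toy frame whose dates jump overnight. The frame, the made-up `y` values, and the separate `step` column are assumptions for demonstration only; the suggestion above overwrites `timestamp` directly.

```python
import pandas as pd

# Toy series with an overnight gap between 18:30 and 06:55 (values are made up).
data = pd.DataFrame({
    "unique_id": [1.0] * 4,
    "timestamp": pd.to_datetime([
        "2016-05-15 18:25:00",
        "2016-05-15 18:30:00",
        "2016-05-16 06:55:00",
        "2016-05-16 07:00:00",
    ]),
    "y": [0.0, 0.0, 1.2, 3.4],
})

# Sort within each series, then number the rows 0, 1, 2, ... per id.
data = data.sort_values(["unique_id", "timestamp"]).reset_index(drop=True)
data["step"] = data.groupby("unique_id").cumcount()

print(data[["timestamp", "step"]])
```

The `step` column advances by exactly 1 per row even across the overnight jump, so a frequency of 1 matches every consecutive pair of rows. Keeping the original `timestamp` column alongside it makes it possible to map predictions back to real dates afterwards.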
@jmoralez Thank you a lot for your kind response & guidance and so sorry for the late reply.
I am training an XGBoost model on my dataset for time series prediction, but after fitting the model on the train dataset, the predict function fails on the validation dataset. Kindly guide me: how can I solve this?
Dataset info:

Active_Power | Current_Phase_Average | Weather_Temperature_Celsius | Weather_Relative_Humidity | Global_Horizontal_Radiation | Diffuse_Horizontal_Radiation | Wind_Speed | Wind_Direction |
---|---|---|---|---|---|---|---|
102.175270 | 142.188522 | 24.477514 | 23.652782 | 498.144226 | 46.486958 | 3.221892 | 205.768753 |
105.421097 | 147.543472 | 24.961168 | 23.067873 | 514.654907 | 45.322678 | 3.747602 | 122.779907 |
108.409271 | 152.497131 | 25.137936 | 22.755598 | 536.330322 | 49.347523 | 3.553297 | 157.523910 |
111.148140 | 157.164398 | 25.441204 | 22.623100 | 548.361572 | 46.074684 | 2.969267 | 109.330276 |
113.915314 | 161.903259 | 25.804924 | 22.194019 | 553.215027 | 45.705791 | 3.401344 | 142.917725 |
0.000000 | 6.104828 | 0.000000 | 4.234369 | 0.377419 | 0.425231 | 0.450622 | 0.000000 |
0.000000 | 6.103529 | 0.000000 | 4.234369 | 0.377419 | 0.425231 | 0.450622 | 0.000000 |
0.000000 | 6.100151 | 0.000000 | 4.234369 | 0.377419 | 0.425231 | 0.450622 | 0.000000 |
0.000000 | 6.101910 | 0.000000 | 4.234369 | 0.377419 | 0.425231 | 0.450622 | 0.000000 |
0.000000 | 6.096731 | 0.000000 | 4.234369 | 0.377419 | 0.425231 | 0.450622 | 0.000000 |
Code for training and prediction:

data = df2.reset_index()[['timestamp', 'Active_Power', 'Current_Phase_Average', 'Weather_Temperature_Celsius', 'Weather_Relative_Humidity', 'Global_Horizontal_Radiation', 'Diffuse_Horizontal_Radiation', 'Wind_Speed', 'Wind_Direction']]
data.index = pd.Index(np.repeat(0, data.shape[0]), name='unique_id')
data.reset_index(inplace=True)
df = data.sort_values(['unique_id', 'timestamp']).groupby('unique_id', as_index=False).apply(lambda x: x.fillna(method='ffill'))
train = df.loc[df['timestamp'] < '2016-06-01']
valid = df.loc[(df['timestamp'] >= '2016-06-01') & (df['timestamp'] < '2016-06-30')]
models = [XGBRegressor(random_state=0, n_estimators=100)]
model = MLForecast(
    models=models,
    freq='5T',
    lags=[12],
    lag_transforms={
        1: [(rolling_mean, 12), (rolling_max, 12), (rolling_min, 12)],
    },
    date_features=['dayofweek', 'month'],
    num_threads=6,
)
model.fit(train, id_col='unique_id', time_col='timestamp', target_col='Active_Power', fitted=True, static_features=[])
p = model.predict(horizon=5, X_df=valid)
ValueError Traceback (most recent call last)
--> 586 forecasts = ts.predict(
    587     models=self.models_,
    588     horizon=h,
    589     dynamic_dfs=dynamic_dfs,
    590     before_predict_callback=before_predict_callback,
    591     after_predict_callback=after_predict_callback,
    592     X_df=X_df,
    593     ids=ids,
    594 )
    595 if level is not None:
    596     if self._cs_df is None:
...
    601     columns=[self.id_col, self.time_col, "_start", "_end"]
    602 )
    603 if getattr(self, "max_horizon", None) is None:

ValueError: Found missing inputs in X_df. It should have one row per id and date for the complete forecasting horizon