JoaquinAmatRodrigo / skforecast

Time series forecasting with machine learning models
https://skforecast.org
BSD 3-Clause "New" or "Revised" License

Weird code in your guide... #734

Open hwoarang09 opened 5 days ago

hwoarang09 commented 5 days ago

https://cienciadedatos.net/documentos/py53-global-forecasting-models.html

for cluster in data['cluster_base_on_dtw'].unique():
    print(
        f"Training and testing model for cluster: {cluster} "
        f"(n = {data[data['cluster_base_on_dtw'] == cluster]['building_id'].nunique()})"
    )

    # Create subset based on DTW clusters
    data_subset = data[data['cluster_base_on_dtw'] == cluster]
    data_subset = data_subset.pivot_table(
                      index   = 'timestamp',
                      columns = 'building_id',
                      values  = 'meter_reading',
                      aggfunc = 'mean'
                  )
    data_subset.columns = data_subset.columns.astype(str)
    data_subset = data_subset.asfreq('D').sort_index()

    # Add calendar features
    data_subset = day_of_week_cyclical_encoding(data_subset)
    data_subset = data_subset.merge(
                      data[['air_temperature', 'wind_speed']].resample('D').mean(),
                      left_index  = True,
                      right_index = True,
                      how         = 'left',
                      validate    = '1:m'
                  )

    metric, predictions = backtesting_forecaster_multiseries(
                              forecaster         = forecaster,
                              series             = data_subset.filter(regex='^id_'),
                              exog               = data_subset[['sin_day_of_week', 'cos_day_of_week']],
                              initial_train_size = len(data_subset.loc[:end_validation, :]),
                              steps              = 7,
                              metric             = 'mean_absolute_error',
                              verbose            = False,
                              show_progress      = False
                          )
    predictions_all_buildings.append(predictions)
    metrics_all_buildings.append(metric)

end = datetime.now()

predictions_all_buildings = pd.concat(predictions_all_buildings, axis=1)
metrics_all_buildings = pd.concat(metrics_all_buildings, axis=0)
errors_all_buildings = predictions - data_pivot.loc[predictions.index, predictions.columns]
mean_metric_all_buildings = metric['mean_absolute_error'].mean()
sum_abs_errors_all_buildings = errors_all_buildings.abs().sum().sum()
sum_bias_all_buildings = errors_all_buildings.sum().sum()

In this code, I think this line is wrong:

errors_all_buildings = predictions - data_pivot.loc[predictions.index, predictions.columns]

and I think this is right:

errors_all_buildings = predictions_all_buildings - data_pivot.filter(regex='^id_').loc[predictions.index, :]

This is because `predictions` is the result of the last cluster only. When I check predictions - data_pivot.loc[predictions.index, predictions.columns], the shape looks wrong to me: it only covers the buildings in the last cluster. Can you check this? Also, the table in the results section will be different.

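A minimal sketch with synthetic data illustrates the difference between the two expressions. Everything here is made up for illustration: the building IDs, the values, the two-cluster assignment, and the fake "truth + 1" model that stands in for backtesting_forecaster_multiseries. Only the aggregation lines mirror the guide.

```python
import pandas as pd

# Synthetic ground truth: four buildings over three test days (made-up numbers)
idx = pd.date_range('2016-12-01', periods=3, freq='D')
data_pivot = pd.DataFrame(
    {'id_1': [10.0, 11.0, 12.0], 'id_2': [20.0, 21.0, 22.0],
     'id_3': [30.0, 31.0, 32.0], 'id_4': [40.0, 41.0, 42.0]},
    index=idx,
)

# Hypothetical cluster assignment: two clusters of two buildings each
clusters = {0: ['id_1', 'id_2'], 1: ['id_3', 'id_4']}

predictions_all_buildings = []
for cluster, buildings in clusters.items():
    # Stand-in for backtesting: a fake model that predicts the truth plus 1
    predictions = data_pivot[buildings] + 1.0
    predictions_all_buildings.append(predictions)

predictions_all_buildings = pd.concat(predictions_all_buildings, axis=1)

# After the loop, `predictions` still refers to the last cluster only
print(list(predictions.columns))  # ['id_3', 'id_4']

# Expression from the guide: errors for the last cluster's buildings only
errors_last_cluster = predictions - data_pivot.loc[predictions.index, predictions.columns]
print(errors_last_cluster.shape)  # (3, 2)

# Errors for every building, using the concatenated predictions
errors_all_buildings = (
    predictions_all_buildings
    - data_pivot.loc[predictions_all_buildings.index, predictions_all_buildings.columns]
)
print(errors_all_buildings.shape)               # (3, 4)
print(errors_all_buildings.abs().sum().sum())   # 12.0
```

The last expression selects columns with predictions_all_buildings.columns instead of filter(regex='^id_'). pandas aligns arithmetic by column label either way, but selecting explicitly avoids all-NaN columns for any building in data_pivot that was not predicted.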
hwoarang09 commented 5 days ago

Also, I think this line is wrong too:

mean_metric_all_buildings = metric['mean_absolute_error'].mean()

`metric` is the result of the last cluster only, so I think this is right (maybe):

mean_metric_all_buildings = metrics_all_buildings['mean_absolute_error'].mean()
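The same effect can be sketched for the metric. The per-cluster tables below are made up, with one row per building. The column names 'levels' and 'mean_absolute_error' are an assumption about the shape of the `metric` frames returned in the guide's skforecast version.

```python
import pandas as pd

# Made-up per-cluster metric tables, one row per building
# (column names assumed to match the `metric` frames collected in the loop)
metrics_all_buildings = [
    pd.DataFrame({'levels': ['id_1', 'id_2'], 'mean_absolute_error': [2.0, 4.0]}),
    pd.DataFrame({'levels': ['id_3', 'id_4', 'id_5'], 'mean_absolute_error': [1.0, 1.0, 1.0]}),
]
metric = metrics_all_buildings[-1]  # what `metric` holds once the loop has finished

metrics_all_buildings = pd.concat(metrics_all_buildings, axis=0, ignore_index=True)

mean_last_cluster = metric['mean_absolute_error'].mean()
mean_all_buildings = metrics_all_buildings['mean_absolute_error'].mean()

print(mean_last_cluster)   # 1.0, last cluster only
print(mean_all_buildings)  # 1.8, i.e. 9.0 / 5 over all buildings
```

Averaging the per-cluster means instead would give (3.0 + 1.0) / 2 = 2.0 in this example, because it weights each cluster equally regardless of how many buildings it contains.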

Can you check this?

Really sorry if I have misunderstood your guide.

hwoarang09 commented 5 days ago

When I change the code, the result is here: newresult. DTW comes out as the top method, maybe.

JoaquinAmatRodrigo commented 4 days ago

Hi @hwoarang09, you are right. Thanks for reporting the error! I will update the document later today.