cchallu / n-hits


Clarification regarding data normalization #6

Open JiahuiSophieHU opened 2 years ago

JiahuiSophieHU commented 2 years ago

Hello,

I was trying to run N-HiTS with my own data using the shared Colab notebook.

I tried to normalize the original ETTm2 dataset and compared the result with the data used in your N-HiTS model.

The size of df_train is 46641, and I followed the information given in section 4.1: Each set is normalized with the train data mean and standard deviation.

    def normalize(df_csv, df_train):
        result = df_csv.copy()
        columns_names = list(df_csv.columns)
        for feature_name in columns_names[1:]:
            result[feature_name] = (df_csv[feature_name] - df_train[feature_name].mean()) / df_train[feature_name].std()
        return result

My function returns different results from yours:

    date                   HUFL
    2016-07-01 00:00:00    0.126520
    2016-07-01 00:15:00   -0.023339
    2016-07-01 00:30:00   -0.098268
    2016-07-01 00:45:00   -0.431177
    2016-07-01 01:00:00   -0.231432
    Name: HUFL, dtype: float64

and yours:

    unique_id | ds                  | y
    HUFL      | 2016-07-01 00:00:00 | -0.041413
    HUFL      | 2016-07-01 00:15:00 | -0.185467
    HUFL      | 2016-07-01 00:30:00 | -0.257495
    HUFL      | 2016-07-01 00:45:00 | -0.577510
    HUFL      | 2016-07-01 01:00:00 | -0.385501
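As a quick sanity check, here is a minimal sketch that fits a straight line between the two sets of values above. It assumes both are z-scores of the same raw HUFL values, in which case they should be related by `yours = slope * mine + intercept`, with `slope = std_mine / std_yours` and `intercept = (mean_mine - mean_yours) / std_yours`. The variable names are mine and only the five rows shown above are used, so this is an illustration rather than a full comparison.

```python
import numpy as np

# First five HUFL values from each normalization, copied from the outputs above.
mine = np.array([0.126520, -0.023339, -0.098268, -0.431177, -0.231432])
yours = np.array([-0.041413, -0.185467, -0.257495, -0.577510, -0.385501])

# Least-squares fit of: yours ~= slope * mine + intercept.
# If both series are z-scores of the same raw data, the fit should be
# (almost) exact, and slope/intercept encode the ratio of the two standard
# deviations and the shift between the two means.
slope, intercept = np.polyfit(mine, yours, deg=1)

# Largest deviation from the fitted line; values near the rounding error of
# the printed numbers suggest a purely affine relationship.
residual = np.max(np.abs(slope * mine + intercept - yours))

print(f"slope={slope:.4f}, intercept={intercept:.4f}, max residual={residual:.1e}")
```

On these five rows the fit is essentially exact, with a slope close to 0.96. That would be consistent with the two normalizations using a different mean and standard deviation (for example, statistics computed over a different set of rows), rather than a different formula, but I cannot confirm this from the outputs alone.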

Can you please tell me more about the data normalization process?

Thanks and regards,

Sophie