ant-research / Pyraformer


Some questions about the loss function #8

Open LeeRiking opened 2 years ago

LeeRiking commented 2 years ago

In your code, batch_x appears to be the same as batch_y. When calculating the MSE, pred and true represent the 168 steps shifted by one step and the original 168 steps, respectively. So there seems to be a data misalignment: the MSE is computed between the original data and predictions that are offset by one step. Hoping for your answer, sincerely!

LeeRiking commented 2 years ago

This issue occurs in single-step forecasting.

Zhazhan commented 2 years ago

batch_x and batch_y are not the same. Let's denote the history length as L_H and the prediction length as L_P. To perform rolling prediction over an L_H + L_P window, we fetch L_H-length sequences from front to back and set the end of each sequence to -1 to prevent information leakage. These sequences are stacked along the batch dimension into batch_x, while batch_y only takes the L_P-length sequence at the end of the original window. Therefore, there is a one-to-one correspondence between the predictions and batch_y. For more details, please read the 'split' function in 'dataloader.py'.
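For illustration, here is a minimal sketch of how such a rolling-window split could look (hypothetical helper and variable names, not the repo's actual `split` function; the real logic is in `dataloader.py`):

```python
import numpy as np

def rolling_split(window, history_len, pred_len):
    """Illustrative single-step rolling split over one window of
    length history_len + pred_len (hypothetical helper, not the repo's code).

    For the j-th target, the input sequence covers the history up to and
    including the target position, but the value at the target position is
    masked with -1 so no information about the target leaks into the input.
    """
    assert len(window) == history_len + pred_len
    inputs = []
    for j in range(pred_len):
        seq = window[j + 1 : j + 1 + history_len].copy()
        seq[-1] = -1  # mask the value at the step being predicted
        inputs.append(seq)
    batch_x = np.stack(inputs)        # (pred_len, history_len)
    batch_y = window[history_len:]    # (pred_len,) targets, aligned row-by-row with batch_x
    return batch_x, batch_y


# Example: history of 4 steps, predict 2 steps
x, y = rolling_split(np.arange(6, dtype=float), history_len=4, pred_len=2)
# x[0] = [1., 2., 3., -1.]  predicts y[0] = 4.
# x[1] = [2., 3., 4., -1.]  predicts y[1] = 5.
```

Each row of batch_x lines up with one entry of batch_y, so the MSE is computed between aligned pairs rather than between sequences shifted by one step.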

LeeRiking commented 2 years ago

Thank you for your explanation! While debugging the code, I have another question about the encoder layer: the default value of use_tvm is False. Does that mean the model is trained with ordinary multi-head attention? As far as I know, TVM is used to compile models.

Zhazhan commented 2 years ago

When use_tvm=False, we implement Pyraformer by adding an attention mask to the ordinary multi-head attention. Therefore, the setting of use_tvm does not affect the results, but does affect speed and memory usage.
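To illustrate the idea (this is a sketch under my own assumptions, not the repo's exact code), a sparse attention pattern such as the pyramidal one can be emulated by computing ordinary dense attention and masking out the disallowed query-key pairs before the softmax. The outputs match the sparse computation, but the full L x L score matrix is still materialized, which is why speed and memory differ:

```python
import torch
import torch.nn.functional as F

def masked_attention(q, k, v, allowed):
    """Dense scaled dot-product attention restricted to a sparsity pattern.

    q, k, v: (batch, heads, seq_len, d_head)
    allowed: (seq_len, seq_len) boolean mask, True where a query may attend to a key.
    """
    d = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d ** 0.5           # full L x L score matrix
    scores = scores.masked_fill(~allowed, float('-inf'))  # drop disallowed pairs
    return F.softmax(scores, dim=-1) @ v

# Toy example: each position attends only to itself and its immediate neighbors,
# a stand-in for the intra-scale connections of a pyramidal attention graph.
L = 6
idx = torch.arange(L)
allowed = (idx[:, None] - idx[None, :]).abs() <= 1
q = k = v = torch.randn(1, 2, L, 8)
out = masked_attention(q, k, v, allowed)   # (1, 2, L, 8)
```

With use_tvm=True, the same pattern would instead be computed by a custom sparse kernel, avoiding the dense L x L matrix.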