LarsBentsen / FFTransformer

Multi-Step Spatio-Temporal Forecasting: https://authors.elsevier.com/sd/article/S0306-2619(22)01822-0

About the degradation of results when transformed back to their original scale #9

Closed stlusor closed 1 year ago

stlusor commented 1 year ago

Hello! I find your repo interesting and would like to learn to debug it properly.

My question is this: when using the graph structure, the training, validation and test sets all achieve good MAE and MSE, satisfying the final requirement. But these are metrics on the scaled data. When the predictions are converted back to their original scale (i.e. pred_un), the MAE and MSE for each node become unacceptably high, and the final predictions are also unsatisfactory.
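To make the comparison above concrete, here is a minimal, hypothetical sketch (not code from the repo) of evaluating the same predictions in both scaled and original units with StandardScaler. It illustrates why a small scaled MSE can still correspond to a large unscaled MSE: the inverse transform multiplies errors by the per-node standard deviation.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Hypothetical per-node targets with a large natural scale (std ~ 12).
true_un = 10.0 + 12.0 * rng.standard_normal((1000, 1))

scaler = StandardScaler().fit(true_un)
true_sc = scaler.transform(true_un)

# A mediocre prediction in *scaled* space: right shape, moderate error.
pred_sc = true_sc + 0.7 * rng.standard_normal(true_sc.shape)
pred_un = scaler.inverse_transform(pred_sc)  # back to original units

mse_sc = np.mean((pred_sc - true_sc) ** 2)
mse_un = np.mean((pred_un - true_un) ** 2)

# The unscaled MSE equals the scaled MSE times the node's variance,
# so "good" scaled metrics can still mean large errors in original units.
print(mse_sc, mse_un, mse_un / mse_sc, scaler.var_[0])
```

The ratio `mse_un / mse_sc` equals the scaler's per-feature variance exactly, which is worth checking against the actual per-station variances in a real run.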

With GraphMLP this does not happen and all results look fine (MLP result):

[screenshot: GraphMLP predictions]

But with GraphXXX (the Transformer-architecture or LSTM models), the results are poor (LSTM result; the Transformer models behave similarly):

[screenshot: GraphLSTM predictions]

As for the results of the Transformer models:

[screenshot: GraphTransformer predictions]

I would like to know your opinion and suggestions on this. Thank you.

LarsBentsen commented 1 year ago

Hi! I'm glad that you've found our repo interesting. From the results for the GraphTransformer and LSTM models, something indeed looks wrong. One thing that looks strange is the periodic/repeating pattern in the bottom two predictions. Have you checked that the prediction lengths are the same for all plots (and what prediction lengths have you used)? It is difficult to know what's wrong from the limited information, as I did not experience the same problems myself. I have a few questions:

  1. For the GraphXXX results (bottom two images), do you see the same problems for the scaled data?
  2. Have you checked that you're checkpointing the models correctly and that the inputs/outputs are correct and the same for all models?
  3. Are you using a large dataset downloaded using the Frost API or only the trivial data example uploaded to GitHub?
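On point 2, checkpointing would typically mean restoring the weights from the epoch with the lowest validation loss before evaluating on the test set, rather than using the final-epoch weights. A toy sketch of the idea (plain Python with a dict standing in for real model weights; names are hypothetical, not from the repo):

```python
import copy

def train_with_checkpoint(model_state, val_losses):
    """Keep a copy of the model state from the epoch with the lowest
    validation loss, instead of the final-epoch state. `model_state`
    is just a dict standing in for real weights."""
    best_loss, best_state = float("inf"), None
    for epoch, loss in enumerate(val_losses):
        model_state["epoch"] = epoch  # pretend the weights change each epoch
        if loss < best_loss:
            best_loss = loss
            best_state = copy.deepcopy(model_state)  # checkpoint a snapshot
    return best_state, best_loss

state, loss = train_with_checkpoint({}, [1.0, 0.4, 0.7, 0.9])
print(state["epoch"], loss)  # 1 0.4: the best epoch, not the last one
```

If the test-set evaluation accidentally uses the last-epoch weights (epoch 3 above) instead of the checkpointed best, scaled and unscaled metrics can diverge in confusing ways.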

I apologise for the limited help; it is difficult to say anything more meaningful with the current information.

stlusor commented 1 year ago

Hello!

I've tried practically all of the models, and only the MLP model gives good results. All of the Transformer models and the LSTM show these strange periodic results. I am using the same window sizes for all of them: 32 steps for the input and 1 step for the prediction.
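Since mismatched window lengths across models is one of the suspects raised above, here is a minimal sketch (hypothetical helper, not code from the repo) of building identical 32-in/1-out windows for every model, which makes it easy to assert that all models see the same inputs and targets:

```python
import numpy as np

def make_windows(series, input_len=32, pred_len=1):
    """Build (input, target) pairs so that every model is trained and
    evaluated on identical windows. Mismatched input_len/pred_len
    between models is one easy source of inconsistent results."""
    X, y = [], []
    for i in range(len(series) - input_len - pred_len + 1):
        X.append(series[i : i + input_len])
        y.append(series[i + input_len : i + input_len + pred_len])
    return np.array(X), np.array(y)

series = np.arange(100, dtype=float)
X, y = make_windows(series)
print(X.shape, y.shape)  # (68, 32) (68, 1)
print(y[0, 0])           # 32.0: the step right after the first window
```

With a shared window builder like this, any remaining difference between models cannot come from the data pipeline.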

Answer:

  1. I am not sure what you mean by scaling the data. All scaling and inverse operations are done with StandardScaler (the original settings), but I found that even though the MAE and MSE on the scaled data are small for training and testing (around 0.5), the results converted back to the individual stations become abnormally large (the MSE for _un can be as high as 80).
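For what it's worth, the two numbers quoted above are roughly consistent with each other under StandardScaler alone, without any bug in the inverse transform. Since `inverse_transform` computes `x_un = x_sc * std + mean`, squared errors scale by `std**2`, so one can back out the per-node standard deviation implied by the reported metrics:

```python
import math

# StandardScaler's inverse_transform is x_un = x_sc * std + mean, so
# squared errors (and hence MSE) scale by std**2 under the inverse map:
#     MSE_un = std**2 * MSE_sc
mse_sc = 0.5   # reported scaled MSE (from the discussion above)
mse_un = 80.0  # reported unscaled MSE for one node

implied_std = math.sqrt(mse_un / mse_sc)
print(implied_std)  # ~12.65: the per-node std that would explain this gap
```

If the actual per-station standard deviations are around that magnitude, the inverse scaling is behaving correctly and the real problem is the scaled MSE of 0.5 itself.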

  2. I don't have a deep understanding of the checkpointing; I simply trained each configuration from the start through to the end of testing.

  3. I am using my own dataset. But I think that if the MLP model can be trained to a good result, the dataset should also be trainable with the other models. The main difference between the MLP model and the others seems to lie in the graph embedding of the encoder; could that be related to my problem?

I haven't modified the code much, mainly the data loader and exp_main. Do you know what modifications changing the dataset would require for the models themselves?

LarsBentsen commented 1 year ago

Looking at the results, I find the periodic patterns very strange, as the periods are much longer than the prediction length of one. To me, it seems like something might be wrong with the inputs/outputs, the prediction lengths, the scaling or the checkpointing. Have you made sure that you're feeding all the (correct) input features to the models and that you're plotting the correct feature, i.e. the predicted wind speed? From the plots, it looks like the outputs are almost independent of the inputs, which could mean the models are not receiving the correct input features.

I would advise you to debug through the embeddings etc. to inspect the actual features fed to the different parts of the model, and to check whether anything differs between the GraphMLP model and the GraphXXX models. It could also be worth reducing the dimensionality, the number of layers and the number of input features to see how this affects the results. Also, 0.5 is quite a high MSE even on scaled data, which might indicate that it is not the inverse scaling that's causing the issue.
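The "outputs seem independent of the inputs" hypothesis can be tested directly. A crude sketch (hypothetical helper, framework-agnostic: `model_fn` is any array-in/array-out callable, e.g. a wrapper around a model's forward pass):

```python
import numpy as np

def output_sensitivity(model_fn, x, n_trials=5, seed=0):
    """Crude sanity check for the 'outputs look independent of inputs'
    hypothesis: shuffle the input timesteps and measure how much the
    model's output moves. Near-zero change suggests the model is
    effectively ignoring its inputs."""
    rng = np.random.default_rng(seed)
    base = model_fn(x)
    deltas = []
    for _ in range(n_trials):
        x_perm = rng.permutation(x, axis=0)  # destroy temporal order
        deltas.append(np.mean(np.abs(model_fn(x_perm) - base)))
    return float(np.mean(deltas))

x = np.random.default_rng(1).standard_normal((32, 4))

# A "model" that ignores its input shows exactly zero sensitivity:
print(output_sensitivity(lambda a: np.zeros(1), x))        # 0.0
# One that depends on the last timestep shows nonzero sensitivity:
print(output_sensitivity(lambda a: a[-1], x) > 0)          # True
```

Running a check like this on both the GraphMLP and the GraphXXX models would quickly show whether the graph models are actually reacting to their inputs.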

I apologise again that I'm not able to fully answer your question, but it is difficult to know exactly what's wrong with the available information and without being able to debug your code myself.