LarsBentsen / FFTransformer

Multi-Step Spatio-Temporal Forecasting: https://authors.elsevier.com/sd/article/S0306-2619(22)01822-0

The issue of LSTM having higher accuracy than Transformer models such as FFTransformer on my own data #12

Closed liufeng0612 closed 11 months ago

liufeng0612 commented 11 months ago

Hello, I'm a beginner and have two questions that I would like your guidance on. I used my own data to predict hydrological flow and got the following results in results_loss.txt. The first question is:

(1) LSTM results_loss.txt (test_LSTM_Wind_ftM_sl64_ll48_pl6_0):
mae_sc:5.597444 mse_sc:182.20319 rmse_sc:13.498266 mape_sc:1.3064296 mspe_sc:26.942993 nse_sc:0.7959154397249222
mae_un:0.2701739283875421 mse_un:0.42448595363290736 rmse_un:0.6515258656668262 mape_un:0.2690037580031805 mspe_un:0.17745132014937723 nse_un:0.7959154261343111

(2) FFTransformer results_loss.txt:
mae_sc:8.615773 mse_sc:271.36246 rmse_sc:16.473083 mape_sc:2.5213554 mspe_sc:113.46704 nse_sc:0.6960487365722656
mae_un:0.4158607560937834 mse_un:0.6322038262267812 rmse_un:0.7951124613705794 mape_un:0.6027701334935149 mspe_un:0.7837646949064655 nse_un:0.6960487211236938

Clearly, for both the real (unscaled) MAE and the standardized MAE, the LSTM is significantly more accurate than the FFTransformer. May I ask why this is? With your Wind sample data, this did not happen.

The second question concerns the large values of metrics such as mae_sc. I suspect they are caused by a problem in the data standardization.

So I changed the scaling code in data_loader.py as follows:

if self.scale:
            self.cols_meas = df_data.stack().columns
            df_dataa = self.scaler.fit_transform(df_data.values)
            df_standardized = pd.DataFrame(df_dataa, columns=df_data.columns, index=df_data.index)

            # if self.flag != 'train':
            #     self.scaler.fit_transform(train_data.values)
            #     #self.scaler.fit_transform(df_data.stack().values)
            #     # df_data = self.scaler1.transform(df_data.values)
            #     del train_data      # Free up memory as train_data is no longer needed.
            # else:
            #     df_dataa = self.scaler.fit_transform(df_data.values)
            #     df_standardized = pd.DataFrame(df_dataa, columns=df_data.columns, index=df_data.index)
                #self.scaler.transform(df_data.stack().values)
                # df_data = self.scaler1.transform(df_data.values)  #fit_transform

            # [Samples, meas, stations] (samples, measurements, stations)
            data = df_standardized.values.reshape(df_standardized.shape[0], df_standardized.columns.get_level_values(0).nunique(), -1) 

After this change, the values in the LSTM's results_loss.txt look normal and its accuracy has improved significantly. However, I am not sure whether this change is correct, and the LSTM is still more accurate than the FFTransformer.

(1) LSTM results_loss.txt:
mae_sc:0.1663143 mse_sc:0.1583283 rmse_sc:0.39790487 mape_sc:0.6587583 mspe_sc:21.135744 nse_sc:0.8387525230646133
mae_un:0.24206019158566544 mse_un:0.3353869053356169 rmse_un:0.5791259839927897 mape_un:0.2967014869452189 mspe_un:0.24158562295711283 nse_un:0.838752512923215

(2) FFTransformer results_loss.txt:
mae_sc:0.29207623 mse_sc:0.29233924 rmse_sc:0.54068404 mape_sc:1.2005281 mspe_sc:129.16577
mae_un:0.5228261250201187 mse_un:0.9367189466469674 rmse_un:0.967842418292858 mape_un:2.4155604584621964 mspe_un:245.15301528264163

My sample data is in wind_data.csv, attached below.

liufeng0612 commented 11 months ago

wind_data.csv

liufeng0612 commented 11 months ago

I would be very grateful for your guidance. Recently, I have been working on interpretable deep learning and would like to ask your advice. Based on your code, should I start with an interpretable GNN or an interpretable Transformer?

Another question: is the GNN model you are using a GCN?

I would appreciate your reply. I am a beginner, but if any research output comes from this and you don't mind, you would be welcome as a co-author.

LarsBentsen commented 11 months ago

Hi, sorry for the late response.

  1. Regarding your first question on why the LSTM performs better than the FFTransformer and Autoformer, it might not be a mistake. These models are not guaranteed to outperform an LSTM in every domain or application. That said, you could tune the hyperparameters (e.g. dimensionality, sequence lengths, learning rate, etc.) for your particular application to see if you can achieve better results.
  2. Regarding the data scaling, the scaled MSE losses do indeed seem very high. It is somewhat difficult to see whether the data scaling is correct, but one potential issue is that you seem to fit different scalers for the train/test/val data. This is not generally good practice: the test data should be treated as unknown, so you should not fit a scaler on it but instead use the scaler obtained from the training data. I would advise you to perform some sanity checks on the scalers to ensure that you are actually scaling the data per feature and can correctly retrieve the original (unscaled) data.
  3. In terms of interpretability, it is again a bit challenging to say whether GNNs or Transformers are best, as the question essentially asks whether it is more important to improve the interpretability of the time series learner or of the spatial learner. Both the vanilla Transformer and GNN/GAT are fairly easy to understand. A good first step to potentially improve interpretability is to reduce the dimensionality and the number of heads, and then analyse the latent attention weights to see which parts of the input the models learn to focus on (but these are just some thoughts off the top of my head).
  4. Finally, it seems like you only have three different spatial locations? If this is the case, it might not be very efficient or necessary to use a GNN. Instead, you could just use a sequence learner (like Autoformer/FFTransformer/LSTM in encoder or encoder-decoder settings). Here you would typically just concatenate the inputs from the different locations and feed them as a single input to the sequence learner. I would advise you to try this to see if you actually get any improvement from using a GNN.
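
As a rough illustration of points 2 and 4 (a minimal sketch, not code from this repo): the snippet below fits scaling statistics on the training split only, reuses them for the test split, and checks that the unscaled values can be recovered. It also flattens a [samples, meas, stations] array so that all stations are concatenated into one input per time step. The helper names (`fit_scaler`, `scale`, `unscale`, `flatten_stations`), the random data, and the 70/30 split are all hypothetical and chosen only for the example.

```python
import numpy as np


def fit_scaler(train):
    """Per-column mean and std, computed from the training split only."""
    mean = train.mean(axis=0)
    std = train.std(axis=0)
    std[std == 0] = 1.0  # avoid division by zero for constant columns
    return mean, std


def scale(x, mean, std):
    """Standardize x with statistics that were fitted elsewhere."""
    return (x - mean) / std


def unscale(x, mean, std):
    """Invert scale() to get back the original units."""
    return x * std + mean


def flatten_stations(data):
    """[samples, meas, stations] -> [samples, meas * stations]."""
    return data.reshape(data.shape[0], -1)


rng = np.random.default_rng(0)
# Hypothetical data: 100 time steps, 2 measurements at 3 stations.
raw = rng.normal(loc=50.0, scale=10.0, size=(100, 2, 3))

flat = flatten_stations(raw)         # one concatenated input per time step
train, test = flat[:70], flat[70:]   # hypothetical chronological split

mean, std = fit_scaler(train)        # fitted on the training data only
train_sc = scale(train, mean, std)
test_sc = scale(test, mean, std)     # test data reuses the training statistics

# Sanity checks: scaled training columns have zero mean and unit variance,
# and the original test values are recovered after unscaling.
assert np.allclose(train_sc.mean(axis=0), 0.0, atol=1e-8)
assert np.allclose(train_sc.std(axis=0), 1.0)
assert np.allclose(unscale(test_sc, mean, std), test)
```

If `self.scaler` is a scikit-learn `StandardScaler` (an assumption, since the snippet in the question does not show how it is created), the same pattern would be `fit` on the training values, `transform` on the val/test values, and `inverse_transform` to get back the original units.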

I hope this helps, but I'm sorry that I could not provide more specific answers as this requires more in-depth analysis. I'm very glad that you seem to have found the paper and repo interesting! :D