LarsBentsen / FFTransformer

Multi-Step Spatio-Temporal Forecasting: https://authors.elsevier.com/sd/article/S0306-2619(22)01822-0

About LSTM model #8

Closed tkjUser closed 1 year ago

tkjUser commented 1 year ago

Hello, it's me again! Thanks for the patient answer before. I have two questions to ask you.

  1. In the exp_main.py file, at lines 96 and 97, dec_zeros is built by repeating the last past value pred_len times, while the original Autoformer codebase uses zeros as the mask. Why use the repeated last value as the mask instead?
  2. In the forward() method of LSTM.py, according to the data split in exp_main.py, the input x_dec consists of the label_len past values followed by the last past value repeated pred_len times. That is, x_dec does not contain any future values. In the "mixed_teacher_forcing" case, the future values are needed as the "teacher" for supervised learning, but the corresponding teacher forcing line in your code is dec_inp = target[:, t, :], which means the repeated last past value is used as the supervised "teacher" instead of the true future value. Is this the right approach?
LarsBentsen commented 1 year ago

Hi! Thank you for your thorough investigation and for pointing this out!

  1. It is true that for the placeholders to the decoder we use the persistence (last recorded) values instead of zero-values. I believe that when experimenting we found this to give slightly better results, which makes sense as the persistence values are likely to lie closer to the true targets than zero. For different applications, you might experiment with different placeholders.
  2. This is indeed a mistake in the published code. For teacher forcing, the true targets should be available to the LSTM model, while in the code here it is only ever given the persistence values. When experimenting we definitely used the true targets, not the persistence values as in the current code. I believe this was overlooked when preparing the code for publication: the reported experiments only use the graph-based models (without decoders), and I must have forgotten to properly update all the code in exp_main for the non-spatial encoder-decoder models. I will update exp_main with the correct implementation as soon as I find the time. One further note: when validating (in def vali(...)), I think it makes more sense to never feed the true values (only the predicted/persistence values), so that validation accuracies remain comparable between models using different mixed teacher forcing ratios, or across epochs with different forcing ratios. Apologies again for not checking that this was fixed in the published version of the code! I will close this once I've updated the code.
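To make the two points above concrete, here is a minimal sketch of how the decoder placeholders can be built either from zeros (the original Autoformer style) or from the persistence (last recorded) value. The shapes and names (batch_y, label_len, pred_len) follow the conventions discussed in this thread, but the snippet is illustrative rather than the repository's exact code.

```python
import torch

# Hypothetical shapes: batch_y holds label_len known values plus pred_len targets.
label_len, pred_len = 48, 24
batch_y = torch.randn(4, label_len + pred_len, 7)  # (batch, seq, features)

# Zero placeholders, as in the original Autoformer codebase.
dec_zeros = torch.zeros(batch_y.shape[0], pred_len, batch_y.shape[2])

# Persistence placeholders: repeat the last observed value pred_len times.
dec_persist = batch_y[:, label_len - 1:label_len, :].repeat(1, pred_len, 1)

# Decoder input = label window followed by the placeholder block.
dec_inp = torch.cat([batch_y[:, :label_len, :], dec_persist], dim=1)
```

The persistence variant simply assumes the series stays flat over the horizon, which is usually a closer first guess than zeros, hence the slight accuracy gain mentioned above.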
LarsBentsen commented 1 year ago

Apologies for taking so long to get around to sorting this out. Only minor changes to exp_main.py and LSTM.py were required to resolve the issue. When transferring the code to this public repository I simply forgot to check the teacher forcing properly, as it was not used for the final graph-based models that do not have decoders. I had therefore dropped a key piece of my original code: feeding the true values batch_y to the LSTM instead of the placeholders. For validation, on lines 96 and 97 of exp_main.py, we still do not feed the true values, since we do not want teacher forcing during validation and testing. I will close this issue as I believe it is resolved now, but please re-open it if you find any other bugs.
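The corrected behaviour described above can be sketched as a step-by-step decoder loop with mixed teacher forcing. All names here (cell, proj, teacher_forcing_ratio) are illustrative placeholders, not the repository's actual API: during training the true target is fed with some probability, while during validation the model's own prediction is always fed back.

```python
import random
import torch
import torch.nn as nn

torch.manual_seed(0)
batch, features, hidden, pred_len = 4, 7, 32, 24

cell = nn.LSTMCell(features, hidden)   # one-step LSTM decoder cell
proj = nn.Linear(hidden, features)     # projects hidden state to a forecast

x_last = torch.randn(batch, features)            # last observed value
targets = torch.randn(batch, pred_len, features) # true future values (batch_y)
teacher_forcing_ratio = 0.5

h = torch.zeros(batch, hidden)
c = torch.zeros(batch, hidden)
dec_inp = x_last
outputs = []
for t in range(pred_len):
    h, c = cell(dec_inp, (h, c))
    pred = proj(h)
    outputs.append(pred)
    # Training: with probability teacher_forcing_ratio feed the TRUE target
    # (the fix discussed in this issue); otherwise feed the prediction.
    # Validation/testing: always take the `else` branch (no teacher forcing).
    if random.random() < teacher_forcing_ratio:
        dec_inp = targets[:, t, :]
    else:
        dec_inp = pred.detach()

preds = torch.stack(outputs, dim=1)  # (batch, pred_len, features)
```

Keeping validation free of teacher forcing, as noted above, makes validation losses comparable across runs with different forcing ratios.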