I have a model which uses spatial attention with GAT and temporal attention with Transformers. I am working on the PEMS-BAY dataset for my project.
During training, the decoder is one-shot, meaning all the timesteps are fed into the decoder at the same time. When I print the predictions, they are all near the mean value and do not capture the trends in the data.
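For reference, this is roughly how the one-shot decoding is done (a minimal sketch using a standard PyTorch nn.TransformerDecoder; the dimensions, horizon, and layer counts are placeholders, and the GAT spatial attention is omitted):

```python
import torch
import torch.nn as nn

d_model, nhead, horizon, batch = 64, 4, 12, 32

decoder_layer = nn.TransformerDecoderLayer(d_model, nhead)
decoder = nn.TransformerDecoder(decoder_layer, num_layers=2)

# Encoder output (memory) and the full target sequence, fed in one shot.
memory = torch.randn(horizon, batch, d_model)   # (S, N, E)
tgt = torch.randn(horizon, batch, d_model)      # all future steps at once

# Causal mask so step t cannot attend to steps > t.
tgt_mask = nn.Transformer.generate_square_subsequent_mask(horizon)

out = decoder(tgt, memory, tgt_mask=tgt_mask)   # (horizon, batch, d_model)
```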
Are there any obvious issues that you can see?
I have checked the positional encoding and the graph attention weights, normalised the data with Z-score normalisation, and applied inverse scaling before comparing predictions to the ground truth.
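For completeness, this is roughly how I normalise and invert (a minimal sketch; the arrays and variable names are just dummy stand-ins, and the mean/std are fitted on the training split only):

```python
import numpy as np

# Dummy data standing in for PEMS-BAY speed readings (illustrative only).
train = np.random.rand(1000, 325) * 70.0
test = np.random.rand(200, 325) * 70.0

# Z-score statistics come from the training split only.
mean, std = train.mean(), train.std()
scale = lambda x: (x - mean) / std
inverse_scale = lambda z: z * std + mean

# The model sees scaled inputs; predictions are inverse-scaled before metrics.
preds_scaled = scale(test) + 0.01 * np.random.randn(*test.shape)
preds = inverse_scale(preds_scaled)
mae = np.abs(preds - test).mean()
print(f"MAE on the original scale: {mae:.3f}")
```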
Hi,
My guess is that your model is overfitting the training set while underfitting the validation set. Maybe there is too much data to learn from, but at the same time the model does not really learn anything either.
All in all, too much noise!
Hello Jake