In my experiments, StandardScaler() worked better than MinMaxScaler().
MinMaxScaler() (range [0, 1]) was also tested with a Sigmoid() on the output, but something did not work correctly: the model predicted only zeros.
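The sketch below shows what the two scalers compute. It uses plain numpy and assumes the data is a (time, channels) array split chronologically. The random-walk data and the helpers fit_standard and fit_minmax are hypothetical. In practice, sklearn.preprocessing.StandardScaler and MinMaxScaler do the same job through fit / transform / inverse_transform. The sketch also shows how each scaling interacts with a Sigmoid() output. It does not confirm why the all-zero predictions happened.

```python
import numpy as np

def fit_standard(train):
    # Per-channel mean and population std (ddof=0), the statistics StandardScaler uses.
    mean = train.mean(axis=0)
    std = train.std(axis=0)
    std[std == 0] = 1.0  # leave constant channels unscaled instead of dividing by zero
    return mean, std

def fit_minmax(train):
    # Per-channel minimum and range, mapping the training span onto [0, 1].
    lo = train.min(axis=0)
    span = train.max(axis=0) - lo
    span[span == 0] = 1.0
    return lo, span

rng = np.random.default_rng(0)
series = np.cumsum(rng.normal(size=(200, 3)), axis=0)  # hypothetical 3-channel random walk
train, test = series[:150], series[150:]  # chronological split, no shuffling

# Statistics come from the training split only, then are applied to both splits.
mean, std = fit_standard(train)
train_std, test_std = (train - mean) / std, (test - mean) / std

lo, span = fit_minmax(train)
train_mm, test_mm = (train - lo) / span, (test - lo) / span

# Standardized values are centered on zero, so many are negative;
# a Sigmoid output (range (0, 1)) cannot produce them.
print("fraction of negative standardized train values:", (train_std < 0).mean())
# Min-max values span exactly [0, 1] on train, but a drifting test split can fall outside it.
print("min-max test range:", test_mm.min(), test_mm.max())
```

With MinMaxScaler, a Sigmoid() output matches the training range. Any test values that fall outside [0, 1] are still unreachable for the model.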
Unsurprisingly, the longer the input sequence fed to the LSTM, the better the results.
Sometimes the test-set predictions for different channels are very similar (see the picture, channels 7 and 8).
The picture shows the test set, predicted by the model trained with a sequence length of 16.
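One way to check whether two channels' predictions are near-duplicates is to compute their pairwise correlation. The sketch below is a minimal numpy version and assumes the predictions form a (time, channels) array. similar_channel_pairs and the 0.99 threshold are hypothetical choices. Running the same function on the true test targets gives a reference. If the targets of two channels are not strongly correlated but their predictions are, the similarity comes from the model rather than from the data.

```python
import numpy as np

def similar_channel_pairs(values, threshold=0.99):
    # Pairwise Pearson correlation between the channels (columns) of a (time, channels) array.
    corr = np.corrcoef(values, rowvar=False)
    n = corr.shape[0]
    # Keep each unordered pair once (i < j) when its correlation exceeds the threshold.
    return [(i, j, float(corr[i, j]))
            for i in range(n) for j in range(i + 1, n)
            if corr[i, j] > threshold]

# Hypothetical predictions: channel 1 is a scaled, shifted copy of channel 0; channel 2 differs.
t = np.linspace(0, 10, 200)
preds = np.column_stack([np.sin(t), 2 * np.sin(t) + 0.5, np.cos(t)])
print(similar_channel_pairs(preds))
```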
I tested two configurations: predicting the value that follows the provided sequence, and predicting the last value of the provided sequence. Both configurations produce very similar results, as presented above.
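The two target configurations can be sketched as a windowing function. It turns a (time, channels) array into LSTM inputs of shape (samples, seq_len, channels) plus one label per window. This is a minimal numpy sketch under that layout assumption. make_windows and its target argument are hypothetical names, not the code used for these experiments. The seq_len argument is the sequence length discussed above (16 for the pictured model).

```python
import numpy as np

def make_windows(data, seq_len, target="next"):
    # target="next": the label is the time step right after each window.
    # target="last": the label is the final time step inside each window.
    if target == "next":
        n = len(data) - seq_len  # the last window needs one extra step for its label
        X = np.stack([data[i:i + seq_len] for i in range(n)])
        y = data[seq_len:seq_len + n]
    elif target == "last":
        n = len(data) - seq_len + 1
        X = np.stack([data[i:i + seq_len] for i in range(n)])
        y = data[seq_len - 1:seq_len - 1 + n]
    else:
        raise ValueError("target must be 'next' or 'last'")
    return X, y

data = np.arange(20, dtype=float).reshape(10, 2)  # hypothetical 10 steps, 2 channels
X_next, y_next = make_windows(data, 4, "next")
X_last, y_last = make_windows(data, 4, "last")
print(X_next.shape, y_next.shape)  # (6, 4, 2) (6, 2)
print(X_last.shape, y_last.shape)  # (7, 4, 2) (7, 2)
```

With target="last", the label is already visible as the final input step. With target="next", the model has to look one step beyond the window.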
Each model in my configuration was trained for 100 epochs.
The first experiment shows that the model is able to overfit the training set, yet at the same time it performs poorly on the test data.
[Figure panels: Training, Testing]
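The train/test gap can be quantified by comparing per-channel errors on the two splits. This is a minimal numpy sketch, assuming targets and predictions are (samples, channels) arrays. The function names are hypothetical. If the data was standardized, undoing the scaling first (pred * std + mean) puts the errors back in each channel's original units.

```python
import numpy as np

def per_channel_rmse(y_true, y_pred):
    # Root-mean-square error computed separately for each channel (column).
    return np.sqrt(np.mean((y_true - y_pred) ** 2, axis=0))

def generalization_gap(train_true, train_pred, test_true, test_pred):
    # Returns train RMSE, test RMSE and their ratio per channel;
    # the floor avoids dividing by zero when the training fit is perfect.
    train_rmse = per_channel_rmse(train_true, train_pred)
    test_rmse = per_channel_rmse(test_true, test_pred)
    return train_rmse, test_rmse, test_rmse / np.maximum(train_rmse, 1e-12)
```

A ratio well above 1 on most channels means the model fits the training split much more closely than the test split.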