Normalization: use LayerNorm (there is also RMSNorm) before each layer. What about instance/group norms?
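A quick PyTorch comparison of the candidates (a sketch; note `nn.RMSNorm` only exists in recent PyTorch, 2.4+, and the instance/group norms expect channel-first inputs, which is why they are less common in front of plain linear layers):

```python
import torch
import torch.nn as nn

# a batch of flat feature vectors: (batch, features)
x = torch.randn(8, 64)

# LayerNorm normalizes over the trailing feature dimension
ln = nn.LayerNorm(64)

# RMSNorm (PyTorch >= 2.4): like LayerNorm, but rescales by the
# root-mean-square only, with no mean subtraction
rms = nn.RMSNorm(64)

print(ln(x).shape, rms(x).shape)  # both torch.Size([8, 64])

# Instance/GroupNorm normalize over channel groups of (N, C, ...)
# inputs, so they fit conv-style feature maps better than flat
# MLP features; e.g. for a (batch, channels, length) sequence:
seq = torch.randn(8, 64, 100)
gn = nn.GroupNorm(num_groups=8, num_channels=64)
inorm = nn.InstanceNorm1d(64)
print(gn(seq).shape, inorm(seq).shape)  # both torch.Size([8, 64, 100])
```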
Maybe: before computing the loss, pass both the output and the target through a normalization (maybe another LayerNorm) to keep gradient magnitudes normalized? Only apply this in train/val/test, not in predict.
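A minimal sketch of how that could look, assuming an affine-free LayerNorm so the standardization applied to output and target has no learned parameters (the `NormalizedLoss` name and the MSE choice are just illustrative, not an established recipe):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class NormalizedLoss(nn.Module):
    """Standardize prediction and target before the loss (train/val/test
    only); predictions served at inference skip this module entirely."""

    def __init__(self, num_features: int):
        super().__init__()
        # elementwise_affine=False -> pure standardization, no learned
        # scale/shift, so the target statistics are not trainable
        self.norm = nn.LayerNorm(num_features, elementwise_affine=False)

    def forward(self, output: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
        return F.mse_loss(self.norm(output), self.norm(target))

# usage inside a training/validation/test step:
criterion = NormalizedLoss(num_features=10)
loss = criterion(torch.randn(4, 10), torch.randn(4, 10))
```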
Activation functions: is Mish better than SiLU? Also, should the ordering be input -> [LayerNorm -> activation -> linear layer] -> ... -> last linear layer (i.e., the output) (-> optionally, for train/val/test: LayerNorm -> loss)?
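A minimal sketch of that ordering, reading the bracketed block literally; Mish is used here, and swapping in `nn.SiLU()` is the one-line change needed to compare the two:

```python
import torch
import torch.nn as nn

def block(in_dim: int, out_dim: int) -> nn.Sequential:
    # the [LayerNorm -> activation -> linear] ordering from the note
    return nn.Sequential(nn.LayerNorm(in_dim), nn.Mish(), nn.Linear(in_dim, out_dim))

model = nn.Sequential(
    block(32, 64),      # input -> first block
    block(64, 64),
    nn.Linear(64, 10),  # last linear layer, i.e. the raw output
)

print(model(torch.randn(8, 32)).shape)  # torch.Size([8, 10])
```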
Transformers might be a good alternative to LSTMs for time series. Mamba as well (not available in PyTorch yet: https://github.com/pytorch/pytorch/issues/120189). Other ideas: an LSTM with encoders/decoders (seq2seq); an LSTM with an attention head (sketched below).
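A rough sketch of that last idea, assuming a single-layer LSTM whose outputs are fed through a self-attention head before a linear forecasting layer (the class name and sizes are illustrative):

```python
import torch
import torch.nn as nn

class LSTMWithAttention(nn.Module):
    """Hypothetical sketch: LSTM encoder + self-attention over its
    outputs, predicting from the last attended time step."""

    def __init__(self, in_dim: int, hidden: int, out_dim: int, heads: int = 4):
        super().__init__()
        self.lstm = nn.LSTM(in_dim, hidden, batch_first=True)
        self.attn = nn.MultiheadAttention(hidden, heads, batch_first=True)
        self.head = nn.Linear(hidden, out_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        seq, _ = self.lstm(x)                   # (batch, time, hidden)
        attended, _ = self.attn(seq, seq, seq)  # self-attention over time
        return self.head(attended[:, -1])       # forecast from last step

model = LSTMWithAttention(in_dim=8, hidden=64, out_dim=1)
print(model(torch.randn(4, 50, 8)).shape)  # torch.Size([4, 1])
```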
Check out https://github.com/timeseriesAI/tsai for time-series datasets and models built on fastai; there are also pytorch-forecasting (built on PyTorch Lightning) and sktime.