Because the model is overfit; in other words, you may have trained for too many epochs. In the figure, we can see that, despite fluctuations, the validation loss first decreases (epochs 0-10) and then increases.

We set `train_stop_loss_thred=0.95` in `main.py` (line 32) for early stopping, which in our repeated experiments is roughly the point where the validation loss stops decreasing. I suspect you may have altered this setting while exploring. Indeed, due to its unique data properties, model training and selection can be difficult in the stock price forecasting task.
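For reference, a minimal sketch of this kind of threshold-based early stopping. Only the setting name `train_stop_loss_thred` comes from the repository; the helper, the loop, and the loss values below are illustrative assumptions, not the actual `main.py` logic.

```python
# Hedged sketch: stop training once the training loss drops below a fixed
# threshold, which (per the comment above) roughly coincides with the point
# where validation loss stops decreasing.

def should_stop(train_loss: float, train_stop_loss_thred: float = 0.95) -> bool:
    """Return True when the loss has fallen to/below the stopping threshold."""
    return train_loss <= train_stop_loss_thred

# Illustrative per-epoch training losses (not real experiment numbers).
losses = [1.30, 1.10, 0.97, 0.94, 0.90]
stop_epoch = next(i for i, loss in enumerate(losses) if should_stop(loss))
# Training would stop at epoch 3 here, before the later epochs overfit.
```

Raising the threshold (or removing the check) lets training run longer, which matches the overfitting pattern described above.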
Got it, thanks. On the other hand, I wonder: if we used masked self-attention (which is not used in your implementation), could the model perform better?
I suppose the suggestion is to apply masked self-attention for the intra-stock aggregation.

We didn't experiment with this structure, but masked self-attention does not seem to fulfill the intention of MASTER. In MASTER, the intra-stock and inter-stock aggregation together approximate the asymmetric stock correlation: both (u, t_1) → (v, t_2) and (u, t_2) → (v, t_1) are computed. The visualization section validates these asymmetric patterns.
However, such experiments would be of value; they may yield insights and improvements. 😄
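To make the point concrete, here is a small NumPy sketch of masked (causal) self-attention over one stock's time steps. It is not MASTER's actual intra-stock module (which, as noted, is unmasked); it only illustrates that a causal mask zeroes out attention from earlier to later steps, cutting off one direction of the (u, t_2) → (v, t_1) style interaction discussed above.

```python
import numpy as np

def masked_self_attention(x: np.ndarray, causal: bool = True):
    """Scaled dot-product self-attention over time steps.

    x: (T, d) per-day features for a single stock (illustrative shapes).
    With causal=True, step t may only attend to steps <= t, so the
    attention-weight matrix is lower-triangular.
    """
    T, d = x.shape
    scores = x @ x.T / np.sqrt(d)          # (T, T) pairwise similarities
    if causal:
        keep = np.tril(np.ones((T, T), dtype=bool))
        scores = np.where(keep, scores, -np.inf)   # mask future steps
    # Numerically stable softmax over each row.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ x, weights

x = np.random.default_rng(0).normal(size=(4, 8))
_, w = masked_self_attention(x, causal=True)
# Under the causal mask, the strict upper triangle of w is exactly zero.
```

Whether removing those attention paths helps or hurts on stock data is exactly the kind of question the suggested experiment would answer.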
As the figure shows, the validation loss is increasing.