Why the validate loss is increasing while training epochs?

SJTU-Quant / MASTER

This is the official code and supplementary materials for our AAAI-2024 paper: MASTER: Market-Guided Stock Transformer for Stock Price Forecasting. MASTER is a stock transformer for stock price forecasting, which models the momentary and cross-time stock correlation and guides feature selection with market information.

Why the validate loss is increasing while training epochs? #10

Closed eator closed 5 days ago

eator commented 1 week ago

As the figure shows, the validation loss is increasing.

LITONG99 commented 1 week ago

Because the model is overfitting. In other words, you may have trained for too many epochs.

In the figure, we can see that, despite the fluctuation, the validation loss first decreases (epochs 0-10) and then increases.

We set train_stop_loss_thred=0.95 in main.py (line 32) for early stopping, which is roughly the point at which the validation loss stops decreasing in our repeated experiments. I think you may have altered this setting to explore. Indeed, because of the task's unique data properties, model training and selection can be difficult in stock price forecasting.
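To make the early-stopping idea concrete, here is a minimal sketch, not the code from the MASTER repository. It assumes, based only on the parameter name, that train_stop_loss_thred is a threshold on the training loss. It also shows a common validation-based alternative (patience) for comparison. The function names and the loss values are hypothetical.

```python
def stop_epoch_by_train_loss(train_losses, threshold=0.95):
    """Return the first epoch whose training loss is <= threshold, else None.

    Assumption: this mirrors how a setting like train_stop_loss_thred
    might be used; the repository's actual logic may differ.
    """
    for epoch, loss in enumerate(train_losses):
        if loss <= threshold:
            return epoch
    return None


def stop_epoch_by_patience(val_losses, patience=3):
    """Return (stop_epoch, best_epoch) for patience-based early stopping.

    Training stops once the validation loss has not improved for
    `patience` consecutive epochs; the best epoch is the one to keep.
    """
    best_loss = float("inf")
    best_epoch = None
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
        elif best_epoch is not None and epoch - best_epoch >= patience:
            return epoch, best_epoch
    return len(val_losses) - 1, best_epoch


# Hypothetical loss curves, for illustration only.
train_losses = [1.10, 1.02, 0.97, 0.94, 0.90]
val_losses = [1.00, 0.98, 0.97, 0.99, 1.01, 1.03, 1.05]

print(stop_epoch_by_train_loss(train_losses))  # first epoch at or below 0.95
print(stop_epoch_by_patience(val_losses))      # (stop epoch, best epoch)
```

With a fixed training-loss threshold, lowering the value lets training run longer, which would be consistent with the rising validation loss in the figure. A patience rule reacts to the validation curve directly, at the cost of being sensitive to its fluctuation.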

eator commented 1 week ago

I got it, thanks. Separately, I wonder whether the model would perform better with masked self-attention, which is not used in your implementation.

LITONG99 commented 1 week ago

I suppose the suggestion is to apply masked self-attention for the intra-stock aggregation. We didn't experiment with this structure. It seems that masked self-attention does not fulfill the intention of MASTER. In MASTER, the intra-stock and inter-stock aggregation together approximate the asymmetric stock correlation; namely, both (u, t_1) to (v, t_2) and (u, t_2) to (v, t_1) are computed. In the visualization section, the asymmetric patterns are validated. However, such experiments would still be of value, as they may yield insights and improvements. 😄
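As a toy illustration of the point about masking (a sketch in plain Python, not MASTER's implementation), the snippet below computes attention weights over three time steps with and without a causal mask. The function name and the all-zero scores are hypothetical. With the mask, an earlier time step gives zero weight to every later one, so only one of the two directions between t_1 and t_2 is ever aggregated.

```python
import math


def attention_weights(scores, causal=False):
    """Row-wise softmax over a square score matrix.

    scores[i][j] is the raw score of query time step i attending to
    key time step j. With causal=True, positions j > i are masked out.
    """
    weights = []
    for i, row in enumerate(scores):
        masked = [
            s if (not causal or j <= i) else float("-inf")
            for j, s in enumerate(row)
        ]
        peak = max(masked)  # finite, because j == i is never masked
        exps = [math.exp(s - peak) for s in masked]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights


# Equal raw scores for three time steps, for illustration only.
scores = [[0.0, 0.0, 0.0] for _ in range(3)]

full = attention_weights(scores)                  # every step sees every step
masked = attention_weights(scores, causal=True)   # step i sees only steps <= i

for row in masked:
    print([round(w, 3) for w in row])
```

In the unmasked case, both the weight from step 0 to step 2 and the weight from step 2 to step 0 are nonzero. In the masked case, the former is exactly zero, which is why masking would remove one half of the asymmetric pair described above.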