Open JackLiu2025 opened 2 hours ago
I know part of the reason is that the validation set is too short, which may not be representative enough.
I have a question about the number of training epochs. Why did you choose 75 epochs? Is this an empirical value? The MASTER paper also used a similar mechanism to control the train loss, and models that use an early stopping strategy seem to perform relatively worse. How should one determine the number of training epochs in practice? The performance improvement was large enough that at one point I suspected future data was being used, LOL :) Congratulations to the authors for discovering such an effective method.
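To make the comparison concrete, here is a minimal sketch of the three stopping rules mentioned above: a fixed epoch budget, a train-loss threshold, and early stopping on validation loss. The function name `stop_epoch`, its parameters, and every value other than the 75-epoch default are hypothetical; none of this is taken from this repository or from the MASTER code.

```python
from typing import Optional, Sequence


def stop_epoch(
    train_losses: Sequence[float],
    val_losses: Sequence[float],
    max_epochs: int = 75,
    train_loss_threshold: Optional[float] = None,
    patience: Optional[int] = None,
) -> int:
    """Return the 1-based epoch at which training would stop.

    Hypothetical helper for comparing stopping rules on recorded
    per-epoch losses; it does not train anything itself.
    """
    best_val = float("inf")
    epochs_without_improvement = 0
    # Rule 1: never run past the fixed budget (or the recorded history).
    n = min(max_epochs, len(train_losses), len(val_losses))

    for epoch in range(1, n + 1):
        train_loss = train_losses[epoch - 1]
        val_loss = val_losses[epoch - 1]

        # Rule 2: stop once the train loss reaches the threshold.
        if train_loss_threshold is not None and train_loss <= train_loss_threshold:
            return epoch

        # Rule 3: stop after `patience` epochs without a new best val loss.
        if patience is not None:
            if val_loss < best_val:
                best_val = val_loss
                epochs_without_improvement = 0
            else:
                epochs_without_improvement += 1
                if epochs_without_improvement >= patience:
                    return epoch

    return n
```

With a short or noisy validation set, the third rule can fire on noise rather than on real overfitting, which is one way an early-stopped model could end up weaker than one trained for a fixed budget.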
Have you reproduced the results in the paper?