Ma-Xinzhi / LightConvNet

MIT License
20 stars 4 forks

Training after Early Stopping #3

Open ExploringCodes opened 3 months ago

ExploringCodes commented 3 months ago

Is training after early stopping necessary to replicate the results reported in the paper? Also, from the code it seems the model continues to train for up to 200 epochs after early stopping and does not track the validation dataset, since the training and validation datasets are combined during the training phase after early stopping. Can you please clarify?

Ma-Xinzhi commented 3 months ago

Training after early stopping is necessary for the results reported in the paper. The original training dataset is split into a training dataset and a validation dataset for early stopping. The validation dataset is used to monitor model performance during the first stage of training. In the second stage, the training and validation datasets are merged back into the original training dataset, and we continue to train the model on it for a maximum of 200 epochs. Since the previous validation dataset is now part of the training data and the model's accuracy on it is close to 100%, we no longer use it to monitor performance in the second stage. The early stopping strategy can be viewed as a way to find proper initialization parameters for the model, so that it converges better on the original training dataset.
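In code, the two-stage schedule described above looks roughly like the sketch below. This is a minimal PyTorch-style outline, assuming hypothetical helpers `train_one_epoch` and `evaluate_loss` and a `patience` value; it is not the repository's actual training script.

```python
import copy
import torch

def two_stage_training(model, train_set, valid_set,
                       max_epochs=1500, second_stage_epochs=200, patience=200):
    # Stage 1: train on the split training set and monitor the validation
    # set, keeping the best checkpoint and stopping early on no improvement.
    best_loss, best_state, wait = float("inf"), None, 0
    for _ in range(max_epochs):
        train_one_epoch(model, train_set)           # hypothetical helper
        val_loss = evaluate_loss(model, valid_set)  # hypothetical helper
        if val_loss < best_loss:
            best_loss, wait = val_loss, 0
            best_state = copy.deepcopy(model.state_dict())
        else:
            wait += 1
            if wait >= patience:
                break

    # Stage 2: restore the best checkpoint, merge the validation data back
    # into the training data, and continue for up to 200 epochs without any
    # validation monitoring, as explained above.
    model.load_state_dict(best_state)
    full_set = torch.utils.data.ConcatDataset([train_set, valid_set])
    for _ in range(second_stage_epochs):
        train_one_epoch(model, full_set)
    return model
```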

ExploringCodes commented 2 months ago

Or have I missed any standardization steps? I think no standardization was done here. After epoching, the data is fed through the filter bank, which transforms it from [288, 22, 1000] to [288, 9, 22, 1000].
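For context, a filter-bank step of that shape can be sketched as follows; the nine 4 Hz-wide sub-bands spanning 4-40 Hz and the 250 Hz sampling rate are assumptions based on the usual BCI Competition IV-2a setup, not values read from this repository.

```python
import numpy as np
from scipy.signal import butter, filtfilt

def filter_bank(x, fs=250, bands=None):
    """x: epoched EEG of shape (n_trials, n_channels, n_samples)."""
    if bands is None:
        # Assumed: nine 4 Hz-wide sub-bands covering 4-40 Hz.
        bands = [(4 * i, 4 * (i + 1)) for i in range(1, 10)]
    out = np.zeros((x.shape[0], len(bands)) + x.shape[1:])
    for k, (lo, hi) in enumerate(bands):
        b, a = butter(4, [lo, hi], btype="bandpass", fs=fs)
        # Zero-phase band-pass filtering along the time axis.
        out[:, k] = filtfilt(b, a, x, axis=-1)
    return out

epochs = np.random.randn(288, 22, 1000)
print(filter_bank(epochs).shape)  # (288, 9, 22, 1000)
```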

ExploringCodes commented 2 months ago


Thanks for your reply. I have tried to replicate the results: I trained the model after early stopping as the paper describes and followed the training procedure given in the code, though I organized train_step, valid_step, etc. in a procedural coding style. Aside from that, I used the hyperparameters given in the .yaml files. However, for the session-independent setting, the accuracy I obtained is 73.71%, which is lower than the paper's average accuracy of 79.48%. My average accuracies for the 9 subjects (each averaged over ten-fold cross-validation) are [82.88%, 52.60%, 87.84%, 71.87%, 63.68%, 54.3055%, 86.97%, 79.79%, 83.50%], which average to 73.71%. I did not fix any random seeds, though I do not think that is the issue here. I used a P100 GPU on Kaggle for training. Could you please suggest what I might be missing?
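For reference, the quoted average does follow from the per-subject numbers:

```python
import numpy as np
acc = [82.88, 52.60, 87.84, 71.87, 63.68, 54.3055, 86.97, 79.79, 83.50]
print(np.mean(acc))  # 73.7150..., i.e. the 73.71 % quoted above
```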

Ma-Xinzhi commented 2 months ago

I cannot figure out where the problem is. Maybe you can try different seeds and see what happens. The standardization you mentioned is not necessary for data processing and probably contributes to slightly worse results, since the model uses a variance layer to extract features.
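A common recipe for the seed experiment suggested above is sketched here; these flags are a general PyTorch reproducibility recipe, not settings taken from this repository:

```python
import random
import numpy as np
import torch

def set_seed(seed: int):
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    # Trade some speed for deterministic cuDNN convolutions.
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False

for seed in (0, 1, 2, 3, 4):
    set_seed(seed)
    # rebuild the model and rerun training here, then compare accuracies
```

On the variance-layer point: if the features are (log-)variances of the band-passed signals, per-channel standardization mainly rescales those variances, which is consistent with the slightly worse results mentioned. A minimal sketch of such a layer, in the style of FBCNet-like models (an assumption, not copied from this repository):

```python
class LogVarLayer(torch.nn.Module):
    # Log-variance pooling over the time dimension.
    def forward(self, x):
        return torch.log(torch.clamp(x.var(dim=-1, keepdim=True), 1e-6, 1e6))
```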