jafarinia opened 1 year ago
I am facing the same problem. I tried to use the code on a 4-class classification problem. I get the train and test loss and the AUC for every class. However, on my test data, all samples are identified as one class (POLE). Have you solved the problem?
I incorporated the training/testing into the same pipeline in the latest commit. I also incorporated an orthogonal weights initialization, which helps make the training more stable. You can set --eval_scheme=5-fold-cv-standalone-test, which will perform a train/valid/test split like this:
A standalone test set consisting of 20% of the samples is reserved; the remaining 80% of the samples are used to construct a 5-fold cross-validation. For each fold, the best model and its corresponding threshold are saved. After the 5-fold cross-validation, the 5 best models along with their optimal thresholds are obtained and used to perform inference on the reserved test set. The final prediction for a test sample is the majority vote of the 5 models. For binary classification, accuracy and balanced accuracy scores are computed. For multi-label classification, hamming loss (smaller is better) and subset accuracy are computed.
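The majority-vote step described above can be sketched as follows. This is a minimal illustration, not the repository's actual code; the function name and the assumption that each model outputs a per-sample probability are mine:

```python
import numpy as np

def majority_vote_predict(scores_per_model, thresholds):
    """Majority vote over k models for binary classification.

    scores_per_model: (k, n) array of per-model probabilities for n test samples.
    thresholds: length-k sequence of per-model optimal thresholds.
    """
    scores = np.asarray(scores_per_model, dtype=float)
    thr = np.asarray(thresholds, dtype=float).reshape(-1, 1)
    votes = (scores >= thr).astype(int)  # (k, n) binary votes
    # A sample is predicted positive if more than half the models vote positive.
    return (votes.sum(axis=0) > scores.shape[0] / 2).astype(int)

# Example: 5 models, 3 test samples
scores = [[0.90, 0.2, 0.60],
          [0.80, 0.1, 0.40],
          [0.70, 0.3, 0.55],
          [0.40, 0.2, 0.50],
          [0.85, 0.6, 0.30]]
thresholds = [0.5, 0.5, 0.5, 0.5, 0.5]
print(majority_vote_predict(scores, thresholds))  # [1 0 1]
```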
You can also simply run a 5-fold CV with --eval_scheme=5-fold-cv
There were some issues with the testing script when loading pretrained weights (i.e., sometimes the weights are not fully loaded or some weights are missing; setting strict=False can reveal the problems). The purpose of the testing script is to generate the heatmap; you should now read the performance directly from the training script. I will fix the issues in a couple of days.
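For reference, here is how strict=False can surface such weight-loading problems: PyTorch's load_state_dict then returns the lists of missing and unexpected keys instead of silently failing or raising. The toy model below is hypothetical, just to demonstrate the mechanism:

```python
import torch
import torch.nn as nn

# Toy two-layer model standing in for the actual network (names hypothetical).
model = nn.Sequential(nn.Linear(8, 4), nn.Linear(4, 2))

# Simulate a problematic checkpoint: one key missing, one extra key present.
state = model.state_dict()
state.pop("0.bias")                     # missing from the checkpoint
state["extra.weight"] = torch.zeros(1)  # not present in the model

# With strict=False, the mismatches are reported rather than raising an error.
result = model.load_state_dict(state, strict=False)
print("missing:", result.missing_keys)        # ['0.bias']
print("unexpected:", result.unexpected_keys)  # ['extra.weight']
```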
Hi, thank you for your great work. I have a problem with training your model. These are the curves for the score and the test (validation) loss for several hyperparameter settings on TCGA data:

- 21.pth -> lr = 2e-4, wd = 5e-3
- 1.pth -> lr = 2e-4, wd = 5e-4
- 2.pth -> lr = 2e-4, wd = 5e-4 (1 and 2 have the same hyperparameters, just different random initializations)
- 5.pth -> lr = 2e-4, wd = 1e-4
- 18.pth -> lr = 2e-4, wd = 1e-7
- 19.pth -> lr = 2e-5, wd = 5e-7

From what I see, the training seems unstable: the AUC scores get worse (or at least behave relatively unstably) over time, while the test (validation) loss keeps decreasing, meaning that overfitting is not the case. Can you please explain what is happening here? Also, why did you set 200 epochs for training, when I have not seen even one model among 21 different hyperparameter settings get updated after epoch 7? Another very strange thing is that 19.pth has the best AUC, yet its attention weights are very bad (all of them are 0), which is very odd. I think this is still a continuation of https://github.com/binli123/dsmil-wsi/issues/61#issue-1434606738. Thank you