binli123 / dsmil-wsi

DSMIL: Dual-stream multiple instance learning networks for tumor detection in Whole Slide Image
MIT License

Encountering problems while training models on the Camelyon16 dataset #93

Open Ysc-shark opened 6 months ago

Ysc-shark commented 6 months ago

Hi, thanks for sharing your great work; it has really helped me a lot.

I am trying to replicate the results of DSMIL and make some modifications based on it. However, I encountered some problems when training on the Camelyon16 dataset. I would greatly appreciate any guidance and advice, as my experience with training deep learning models and analyzing pathological images is limited.

Ysc-shark commented 6 months ago

[Images: 4k_TKAMIL_Camelyon_ss_Simclr_fold4, 4k_TKAMIL_Camelyon_ss_Simclr_fold0]

Sometimes the training process seems reasonable.

binli123 commented 6 months ago

I have incorporated training and testing into the same pipeline in the latest commit. You can set --eval_scheme=5-fold-cv-standalone-test, which performs a train/validation/test procedure as follows:

A standalone test set consisting of 20% of the samples is reserved, and the remaining 80% of the samples are used for 5-fold cross-validation. For each fold, the best model and its corresponding threshold are saved. After the 5-fold cross-validation, this gives 5 best models along with their optimal thresholds, which are then used to run inference on the reserved test set. The final prediction for a test sample is the majority vote of the 5 models. For binary classification, accuracy and balanced accuracy are computed. For multi-label classification, Hamming loss (lower is better) and subset accuracy are computed.
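The test-set part of this scheme can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the repository's code: all function names are hypothetical, each trained fold model is represented only by its scores on the test set and its saved per-class thresholds, and the 5-fold split of the 80% pool is omitted. With an odd number of voters (5 models), a binary vote can never tie.

```python
import random


def split_standalone_test(n_samples, test_frac=0.2, seed=0):
    """Shuffle sample indices and reserve test_frac of them as a standalone test set."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    n_test = round(n_samples * test_frac)
    return idx[n_test:], idx[:n_test]  # (pool for 5-fold CV, held-out test set)


def majority_vote(probs_per_model, thresholds):
    """Binarize each model's scores with that model's own per-class thresholds,
    then take a per-class majority vote across models.

    probs_per_model[m][s][c]: score of model m for sample s and class c (hypothetical layout).
    thresholds[m][c]: optimal threshold saved for model m and class c.
    """
    n_models = len(probs_per_model)
    preds = []
    for s in range(len(probs_per_model[0])):
        row = []
        for c in range(len(probs_per_model[0][s])):
            votes = sum(probs_per_model[m][s][c] >= thresholds[m][c] for m in range(n_models))
            row.append(1 if votes > n_models / 2 else 0)
        preds.append(row)
    return preds


def accuracy(y_true, y_pred):
    """Fraction of binary predictions equal to the label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Mean per-class recall for 0/1 labels (a class absent from y_true is skipped)."""
    recalls = []
    for cls in (0, 1):
        members = [i for i, t in enumerate(y_true) if t == cls]
        if members:
            recalls.append(sum(y_pred[i] == cls for i in members) / len(members))
    return sum(recalls) / len(recalls)


def hamming_loss(Y_true, Y_pred):
    """Fraction of individual (sample, label) entries predicted wrongly; lower is better."""
    pairs = [(t, p) for rt, rp in zip(Y_true, Y_pred) for t, p in zip(rt, rp)]
    return sum(t != p for t, p in pairs) / len(pairs)


def subset_accuracy(Y_true, Y_pred):
    """Fraction of samples whose entire label vector is predicted exactly."""
    return sum(list(rt) == list(rp) for rt, rp in zip(Y_true, Y_pred)) / len(Y_true)
```

For a binary task, each model's scores would be one-element lists (`[[p], ...]`) and the final label of a sample is `row[0]` of the voted output. Balanced accuracy is reported alongside plain accuracy because it is not inflated when one class dominates the test set.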

You can also simply run a plain 5-fold cross-validation with --eval_scheme=5-fold-cv.
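In plain 5-fold cross-validation, every sample is used for validation exactly once and no separate test set is held out. A minimal stdlib sketch of how such fold indices could be generated, assuming a shuffled, non-stratified split; `kfold_splits` is a hypothetical helper, and the repository may assign folds differently (for example, stratified by label).

```python
import random


def kfold_splits(n_samples, k=5, seed=0):
    """Yield (train_idx, valid_idx) pairs for k-fold cross-validation.

    Indices are shuffled once, then dealt round-robin into k disjoint,
    near-equal folds; each fold serves as the validation set exactly once.
    """
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, folds[i]
```

With class-imbalanced slide labels, a stratified split (such as scikit-learn's `StratifiedKFold`) keeps the tumor/normal ratio similar across folds, which makes per-fold validation scores easier to compare.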

There were some issues with the testing script when loading pretrained weights (e.g., sometimes the weights are not fully loaded or some weights are missing; setting strict=False can reveal these problems). I will fix this in a couple of days.
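In PyTorch, `model.load_state_dict(state_dict, strict=False)` does not raise on missing or unexpected keys. Instead it returns a named tuple whose `missing_keys` and `unexpected_keys` fields list them, and this is how silently unloaded weights can be spotted. Tensor size mismatches still raise an error even with `strict=False`. The framework-independent sketch below (hypothetical helper `diff_state_dicts`, not part of the repository) performs the same key comparison on two plain mappings and also reports shape mismatches, so it could be run on `model.state_dict()` and a loaded checkpoint dictionary before loading.

```python
def diff_state_dicts(expected, loaded):
    """Compare two name -> tensor mappings by key and by shape.

    Works on PyTorch state dicts (tensors expose .shape) without importing torch;
    values lacking a .shape attribute are compared by key only.
    Returns (missing_keys, unexpected_keys, {key: (expected_shape, loaded_shape)}).
    """
    missing = sorted(k for k in expected if k not in loaded)
    unexpected = sorted(k for k in loaded if k not in expected)
    shape_mismatch = {}
    for k in expected.keys() & loaded.keys():
        a = getattr(expected[k], "shape", None)
        b = getattr(loaded[k], "shape", None)
        if a is not None and b is not None and tuple(a) != tuple(b):
            shape_mismatch[k] = (tuple(a), tuple(b))
    return missing, unexpected, shape_mismatch
```

If all three results are empty, a strict load should succeed; any non-empty result points to weights that would otherwise be skipped or would cause the load to fail.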