binli123 / dsmil-wsi

DSMIL: Dual-stream multiple instance learning networks for tumor detection in Whole Slide Image
MIT License

Wrong testing code or weights #86

Closed · Jerry-Master closed this issue 8 months ago

Jerry-Master commented 11 months ago

When inspecting the state dict of milnet in testing_tcga.py, it has the keys b_classifier.q.0.bias and b_classifier.q.2.bias, but aggregator.pth has the key b_classifier.q.bias. This suggests that you trained the aggregator with nonlinear=False but provided the test code with nonlinear=True. The same applies to passing_v: the weights seem to have been trained with passing_v=True, but the code uses passing_v=False. Since you are loading with strict=False you won't notice this, but it could mean you are reporting misleading results based on the default (untrained) weights for the aggregator. I would suggest you review that.
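A quick way to surface this kind of mismatch is to compare the checkpoint keys against the model's keys before loading, since strict=False will otherwise skip them silently. A minimal sketch (the helper name and checkpoint path are illustrative, not part of the repo):

```python
import torch

def check_weight_compatibility(model: torch.nn.Module, ckpt_path: str) -> None:
    """Report keys that load_state_dict(strict=False) would silently drop."""
    ckpt = torch.load(ckpt_path, map_location="cpu")
    model_keys = set(model.state_dict().keys())
    ckpt_keys = set(ckpt.keys())
    print("In checkpoint but not in model:", sorted(ckpt_keys - model_keys))
    print("In model but not in checkpoint:", sorted(model_keys - ckpt_keys))

# e.g. check_weight_compatibility(milnet, "aggregator.pth")
# Seeing 'b_classifier.q.bias' on one side and 'b_classifier.q.0.bias' /
# 'b_classifier.q.2.bias' on the other indicates the nonlinear/passing_v flags
# used at training time differ from the model instantiated at test time.
```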

binli123 commented 8 months ago

I incorporated the training/testing into the same pipeline in the latest commit. You can set --eval_scheme=5-fold-cv-standalone-test, which will perform a train/valid/test split like this:

A standalone test set consisting of 20% of the samples is reserved; the remaining 80% of the samples are used to construct a 5-fold cross-validation. For each fold, the best model and the corresponding threshold are saved. After the 5-fold cross-validation, the 5 best models along with their optimal thresholds are used to perform inference on the reserved test set. The final prediction for a test sample is the majority vote of the 5 models. For binary classification, accuracy and balanced accuracy scores are computed. For multi-label classification, Hamming loss (smaller is better) and subset accuracy are computed.
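For reference, a minimal sketch of how such a scheme could be wired up with scikit-learn utilities. Here train_one_fold and predict are hypothetical placeholders for the DSMIL training loop and inference step, not the repo's actual API, and labels is assumed to be a NumPy array (1-D for binary, 2-D for multi-label):

```python
import numpy as np
from sklearn.model_selection import train_test_split, KFold
from sklearn.metrics import accuracy_score, balanced_accuracy_score, hamming_loss

def five_fold_cv_standalone_test(bags, labels, train_one_fold, predict):
    """Sketch of the 5-fold-cv-standalone-test evaluation scheme."""
    # Reserve a standalone test set of 20% of the samples.
    idx = np.arange(len(bags))
    trainval_idx, test_idx = train_test_split(idx, test_size=0.2, random_state=0)

    # 5-fold cross-validation on the remaining 80%; keep the best model and
    # the corresponding threshold from every fold.
    models, thresholds = [], []
    for tr, va in KFold(n_splits=5, shuffle=True, random_state=0).split(trainval_idx):
        tr_idx, va_idx = trainval_idx[tr], trainval_idx[va]
        model, thr = train_one_fold([bags[i] for i in tr_idx], labels[tr_idx],
                                    [bags[i] for i in va_idx], labels[va_idx])
        models.append(model)
        thresholds.append(thr)

    # Each of the 5 models votes on every test sample; the majority vote wins.
    votes = np.stack([predict(m, [bags[i] for i in test_idx]) >= t
                      for m, t in zip(models, thresholds)]).astype(int)
    majority = (votes.sum(axis=0) > len(models) // 2).astype(int)

    y_true = labels[test_idx]
    if y_true.ndim == 1:  # binary classification
        print("accuracy:", accuracy_score(y_true, majority))
        print("balanced accuracy:", balanced_accuracy_score(y_true, majority))
    else:                 # multi-label classification
        print("hamming loss:", hamming_loss(y_true, majority))
        print("subset accuracy:", accuracy_score(y_true, majority))
```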

You can also simply run a 5-fold cross-validation with --eval_scheme=5-fold-cv.

There were some issues with the testing script when loading pretrained weights (i.e., sometimes the weights were not fully loaded or there were missing weights, and loading with strict=False masks these problems). The purpose of the testing script is to generate the heatmap; you should now read the performance directly from the training script. I will fix the issues in a couple of days.

binli123 commented 8 months ago

This should be fixed in the latest commit. I uploaded compatible aggregator weights. The setup is still hardcoded and needs some more work.