I incorporated the training/testing into the same pipeline in the latest commit. You can set `--eval_scheme=5-fold-cv-standalone-test`, which will perform a train/valid/test split like this:
A standalone test set consisting of 20% of the samples is reserved; the remaining 80% of the samples are used for 5-fold cross-validation. For each fold, the best model and the corresponding threshold are saved. After the 5-fold cross-validation, the 5 best models along with their optimal thresholds are used to perform inference on the reserved test set. The final prediction for a test sample is the majority vote of the 5 models. For binary classification, accuracy and balanced accuracy are computed. For multi-label classification, Hamming loss (lower is better) and subset accuracy are computed.
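For illustration, a minimal sketch of the majority-vote step on the reserved test set might look like the following (the function name, loader, and model output order are assumptions, not the actual code in the repository):

```python
# Sketch of the majority-vote ensemble over the 5 fold-best models.
# `models`, `thresholds`, and `test_loader` are assumed to come from the
# 5-fold CV step; the model output order is illustrative only.
import numpy as np
import torch
from sklearn.metrics import accuracy_score, balanced_accuracy_score

def ensemble_predict(models, thresholds, test_loader, device="cuda"):
    """Each fold-best model votes with its own saved threshold; the final
    label per test bag is the majority vote of the models."""
    votes = []
    for model, thres in zip(models, thresholds):
        model.eval()
        fold_preds = []
        with torch.no_grad():
            for bag, _ in test_loader:          # one bag (one WSI) per batch
                bag = bag.to(device)
                _, bag_logit = model(bag)[:2]   # assumed output order
                prob = torch.sigmoid(bag_logit).squeeze().item()
                fold_preds.append(int(prob >= thres))
        votes.append(fold_preds)
    votes = np.array(votes)                      # shape (n_models, n_test_bags)
    # majority vote: a bag is positive if more than half of the models agree
    return (votes.sum(axis=0) > len(models) // 2).astype(int)

# final_preds = ensemble_predict(models, thresholds, test_loader)
# print(accuracy_score(y_true, final_preds),
#       balanced_accuracy_score(y_true, final_preds))
```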
You can also simply run a plain 5-fold CV with `--eval_scheme=5-fold-cv`.
There were some issues with the testing script when loading pretrained weights (i.e., sometimes the weights are not fully loaded or there are missing weights; loading with `strict=False` silently ignores these mismatches, so the problems go unnoticed). The purpose of the testing script is to generate the heatmaps; you should now read the performance directly from the training script. I will fix the issues in a couple of days.
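One way to surface partial loads is to inspect the value returned by `load_state_dict` (PyTorch reports the missing and unexpected keys) or to load with `strict=True`, which raises on any mismatch. A small sketch, assuming `milnet` and `aggregator.pth` as in the repo:

```python
# Surface partially loaded weights instead of silently ignoring them.
import torch

state_dict = torch.load("aggregator.pth", map_location="cpu")
result = milnet.load_state_dict(state_dict, strict=False)
print("missing keys:   ", result.missing_keys)     # params left at random init
print("unexpected keys:", result.unexpected_keys)  # checkpoint params not used

# or fail loudly on any mismatch:
# milnet.load_state_dict(state_dict, strict=True)
```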
This should be fixed in the latest commit. I uploaded compatible aggregator weights; they are hardcoded and still need some more work.
When inspecting the state dict of milnet in `testing_tcga.py`, it has keys `b_classifier.q.0.bias` and `b_classifier.q.2.bias`, but `aggregator.pth` has the key `b_classifier.q.bias`. This suggests that you trained the aggregator with `nonlinear=False` but then provided the test code for `nonlinear=True`. Same with `passing_v`: the weights seem to have `passing_v=True` but the code has `passing_v=False`. Since you are using `strict=False` you won't notice that, but it could be making you report misleading results by using default (randomly initialized) weights for the aggregator. I would suggest you review that.
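To confirm this kind of mismatch, one can simply diff the checkpoint keys against the keys the instantiated model expects. A minimal sketch, assuming `milnet` is built as in `testing_tcga.py`:

```python
# Diff checkpoint keys vs. model keys to spot nonlinear/passing_v mismatches.
import torch

ckpt_keys = set(torch.load("aggregator.pth", map_location="cpu").keys())
model_keys = set(milnet.state_dict().keys())

print("in checkpoint but not in model:", sorted(ckpt_keys - model_keys))
# e.g. ['b_classifier.q.bias', ...]  -> weights saved with nonlinear=False
print("in model but not in checkpoint:", sorted(model_keys - ckpt_keys))
# e.g. ['b_classifier.q.0.bias', 'b_classifier.q.2.bias', ...] -> model built with nonlinear=True
```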