Open Henry0394 opened 3 months ago
It would be helpful to first reproduce the results of classic ABMIL
to check your environment and training scripts.
> It would be helpful to first reproduce the results of classic ABMIL to check your environment and training scripts.
Thanks, I did, but the result for ABMIL is still under 80%, which is really strange.
That may require double-checking the dataset, the experimental environment, and the training code.
Thanks, but I'm not the only one unable to reproduce the results. As far as I know, several others I'm familiar with have also encountered similar issues. Are there any tricks that aren't mentioned in the code, or could there be bugs within the codebase?
Although I would like to share some secret tips with you, the Docker and code I provided are exactly what I used. The code in this repository has been refactored, so it may not reproduce the results of the paper with 100% accuracy, and there might be a deviation of ±1-2%. However, if the deviation exceeds 10%, or if AB-MIL fails to produce valid results, I would find it very strange. If you are using the data, Docker, and code I provided and still see a deviation of over 10%, could you please share the training logs? I will do my best to help resolve the issue.
Hello, I've encountered the same issue. Three months ago, I reproduced this article, and at that time, the evaluation metrics on ABmil were roughly the same as those in the paper. However, when I run the program again now, the metrics for ABmil-MHIM on the C16 dataset are also unable to reach an 80% accuracy rate. During this period, I haven't changed any code. Could you please help me understand what might have caused this issue?
This issue is also the same when running ABmil-MHIM on the TCGA dataset.
> Hello, I've encountered the same issue. Three months ago, I reproduced this article, and at that time, the evaluation metrics on ABmil were roughly the same as those in the paper. However, when I run the program again now, the metrics for ABmil-MHIM on the C16 dataset are also unable to reach an 80% accuracy rate. During this period, I haven't changed any code. Could you please help me understand what might have caused this issue?
If you are using the data, Docker, and code I provided and still see a deviation of over 10%, could you please share the training command, logs and initialization weights for the teacher model? I will do my best to help resolve the issue.
> This issue is also the same when running ABmil-MHIM on the TCGA dataset.
Is the AUC of ABMIL on the TCGA also under the 80%?
Yes, the performance on the TCGA dataset is also poor. The results in the first screenshot are from running the code today, and the one below shows the results I got three months ago when running ABmil-MHIM on TCGA. For the teacher model I still used the baseline trained three months ago, fold_2_model_best_auc.pt. The exact command I ran for ABmil-MHIM was:

```shell
python3 main.py \
  --project=tcga_lung_mhim2 \
  --dataset_root=./datasets/tcga \
  --model_path=./output \
  --cv_fold=4 \
  --val_ratio=0.13 \
  --teacher_init=./output/tcga_lung_baseline/abmil/fold_2_model_best_auc.pt \
  --title="abmil_105_mr70l20h2-0_mmcos_is" \
  --baseline=attn \
  --num_workers=0 \
  --cl_alpha=0.5 \
  --mask_ratio_h=0.02 \
  --mrh_sche \
  --mm_sche \
  --init_stu_type=fc \
  --mask_ratio=0.7 \
  --mask_ratio_l=0.2 \
  --seed=2021 \
  --datasets=tcga
```
…test again under that teacher_init. Moreover, since this is multi-fold cross-validation, you cannot directly use a single fold's checkpoint (fold_2) to initialize the models for all folds, as that leads to test-set leakage.
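A minimal sketch of the leakage point above. This is a hypothetical illustration, not the repository's actual API: the checkpoint path layout and the 4-fold split follow the command in this thread, and `teacher_ckpt_for`/`leaks_test_set` are made-up helper names.

```python
# Why reusing one fold's teacher checkpoint leaks the test set in k-fold CV:
# fold i's held-out test split is part of every OTHER fold's training split,
# so a teacher trained on fold 2 has already seen the test slides of
# folds 0, 1, and 3.

K = 4  # matches --cv_fold=4 in the command above

def teacher_ckpt_for(fold: int) -> str:
    """Return the fold-matched teacher checkpoint path (assumed layout)."""
    return f"./output/tcga_lung_baseline/abmil/fold_{fold}_model_best_auc.pt"

def leaks_test_set(student_fold: int, teacher_fold: int) -> bool:
    # A teacher from a different fold was trained on data that includes
    # the student fold's held-out test split.
    return student_fold != teacher_fold

# Initializing every fold from fold 2's teacher leaks for 3 of the 4 folds:
leaky_folds = [f for f in range(K) if leaks_test_set(f, teacher_fold=2)]
```

The fix implied by the reply is to pair each student fold with a teacher trained only on that fold's own training split, i.e. `teacher_ckpt_for(f)` for fold `f`, rather than a single fixed checkpoint.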
I'm attempting to reproduce the results on the Camelyon16 dataset, but I'm obtaining an accuracy of around 78%-80%, roughly 10 percentage points below the accuracy reported in the paper. Could there be some critical details I've overlooked, or could there be a bug in the code?