binli123 / dsmil-wsi

DSMIL: Dual-stream multiple instance learning networks for tumor detection in Whole Slide Image
MIT License

Same attention score and the pre-trained aggregators. #59

Closed. HHHedo closed this issue 1 year ago.

HHHedo commented 1 year ago

Dear Bin, thank you for your great work!

  • When I reproduce the results on Camelyon16 (C16) and TCGA, I follow the provided README: 1) use the pre-computed features from "Download feature vectors for MIL network" (python download.py --dataset=tcga or --dataset=c16); 2) train the model with all hyperparameters at their defaults (python train_tcga.py --dataset=TCGA-lung-default, or python train_tcga.py --dataset=Camelyon16 --num_classes=1). For C16 I see only a mild degradation, about 91% accuracy, unlike issue #54 (Problem of reproduce Camelyon16 result), which reports only 60%. However, I do observe that every patch produces the same attention score, as in #54. On TCGA the same attention score also appears, yet the results are quite promising (e.g., train loss: 0.3307, test loss: 0.3239, average score: 0.9000, AUC: class-0 0.9715089374829871, class-1 0.9658833136738953). On C16 the identical-attention-score problem can sometimes be fixed by restarting training with init.pth loaded, but on TCGA it never goes away. How should I deal with it?
  • When I apply the provided pre-trained aggregators (test/weights/aggregator.pth or test-c16/weights/aggregator.pth) to the test split of the pre-computed features from "Download feature vectors for MIL network" (python download.py --dataset=tcga or --dataset=c16), I get reasonable results on C16 (average score: 0.9125, AUC: class-0 0.9546666666666667) but unreasonable ones on TCGA (average score: 0.6857, AUC: class-0 0.8621722166772525, class-1 0.8949278649850286). Can these pre-trained aggregators only work with the provided embedders (test/weights/embedder.pth or test-c16/weights/embedder.pth) rather than with the pre-computed features? In other words, were the pre-computed features not generated by these pre-trained embedders?

Looking forward to your help! Best, Tiancheng Lin
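Editorial note on the second bullet above (running a released aggregator on the downloaded features): the sketch below shows the general evaluation flow. It is only a sketch; build_milnet() is a hypothetical stand-in for the model construction in train_tcga.py, the glob path and CSV layout are assumptions about the downloaded dataset, and the forward-pass return values should be adjusted to match the repo's dsmil.py.

```python
import glob
import pandas as pd
import torch

milnet = build_milnet()  # hypothetical helper: construct the aggregator exactly as train_tcga.py does
state_dict = torch.load('test/weights/aggregator.pth', map_location='cpu')
milnet.load_state_dict(state_dict, strict=True)  # strict loading: fail on any key mismatch
milnet.eval()

# Assumed layout: one CSV of patch feature vectors per test slide.
for csv_path in glob.glob('datasets/tcga-dataset/test/*.csv'):
    feats = torch.tensor(pd.read_csv(csv_path).values, dtype=torch.float32)
    with torch.no_grad():
        ins_pred, bag_pred, *_ = milnet(feats)  # instance-level and bag-level predictions
    print(csv_path, torch.sigmoid(bag_pred).squeeze().tolist())
```

If the scores look unreasonable here but improve once the features are recomputed with the matching embedder.pth, the downloaded features were most likely produced by a different embedder.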

HHHedo commented 1 year ago

Hi Bin, I solved the problem of identical attention scores by removing the dimension normalization, and the performance is comparable. However, I am still confused about the pre-trained models and the pre-computed features.
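For readers following the thread: a minimal sketch of the change described above, assuming the bag-level classifier scales the attention logits by the square root of the feature dimension before the softmax (as the repo's dsmil.py appears to do); tensor names are illustrative.

```python
import torch
import torch.nn.functional as F

def attention_scaled(Q, q_max):
    # Q: per-patch queries, shape (num_patches, d); q_max: queries of the critical patches, shape (num_classes, d)
    logits = torch.mm(Q, q_max.transpose(0, 1))      # inner products, (num_patches, num_classes)
    d = torch.tensor(Q.shape[1], dtype=torch.float32)
    return F.softmax(logits / torch.sqrt(d), dim=0)  # with dimension normalization

def attention_unscaled(Q, q_max):
    logits = torch.mm(Q, q_max.transpose(0, 1))
    return F.softmax(logits, dim=0)                  # normalization removed, as described above
```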

binli123 commented 1 year ago

Hi, please make sure that the weights are indeed fully loaded into your model without any mismatch; you can call load_state_dict() with strict=True (note that strict is an argument of load_state_dict(), not torch.load(), and is True by default) so that any missing or unexpected key raises an error instead of being silently ignored. There are multiple embedder.pth files available, and the downloaded features were computed using one of them (possibly not the same one included in the download, because I updated them once afterward). But you can always use that embedder to recompute new features and then test with that aggregator. You can find all the embedders I trained for the two datasets under Camelyon16 and TCGA.
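A minimal sketch of the strict-loading check, with build_milnet() as a hypothetical stand-in for the model construction in train_tcga.py:

```python
import torch

milnet = build_milnet()  # hypothetical helper: build the aggregator with the same constructor arguments as training

state_dict = torch.load('test/weights/aggregator.pth', map_location='cpu')

# Inspect mismatches first (strict=False returns the incompatible keys without raising) ...
missing, unexpected = milnet.load_state_dict(state_dict, strict=False)
print('missing keys:', missing)
print('unexpected keys:', unexpected)

# ... then enforce an exact match; strict=True (the default) raises on any mismatch.
milnet.load_state_dict(state_dict, strict=True)
```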

HHHedo commented 1 year ago

Hi, thank you for your quick help! Could you release more aggregators?

HHHedo commented 1 year ago

One more question about init.pth: as mentioned in #26, it is trained for a few iterations on the Camelyon16 dataset following the original training/testing split. I would appreciate it if you could share the detailed settings you used for it. Thank you very much!

xiaozhu0816 commented 1 year ago

Hi @HHHedo and @binli123, I have the same question as @HHHedo. I am focusing on the TCGA part now and followed the instructions:

  1. Use the pre-computed features from "Download feature vectors for MIL network": $ python download.py --dataset=tcga
  2. Train the model with all hyperparameters at their defaults: $ python train_tcga.py --dataset=TCGA-lung-default

For TCGA, I get the same attention score as @HHHedo, and I don't understand why the score is already so high at the first epoch. You can see my screenshots.

[Screenshot: training log, 2022-11-03, 8:40 PM]

... and after the 3rd epoch, no better model is ever saved, which confuses me a lot.

[Screenshot: training log, 2022-11-03, 8:40 PM]

Could you tell me why this happens and how to fix it? Thank you very much.
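An editorial aside: a quick way to check whether the attention has collapsed to a (near-)uniform distribution is to look at the spread of the attention weights within a bag. The sketch below assumes you can get the attention tensor out of the model's forward pass (it appears to be returned alongside the bag prediction in train_tcga.py; adjust to your local code); names are illustrative.

```python
import torch

def attention_spread(A: torch.Tensor) -> dict:
    """Summarize a bag's attention weights A, shape (num_patches, num_classes)."""
    A = A.detach().float()
    return {
        'min': A.min().item(),
        'max': A.max().item(),
        'std': A.std().item(),
        'uniform_value': 1.0 / A.shape[0],  # a perfectly uniform softmax gives every patch 1/N
    }

# If std is ~0 and min == max == 1/N, every patch received the same attention score.
A = torch.full((100, 1), 0.01)  # simulated collapsed attention over 100 patches
print(attention_spread(A))
```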

binli123 commented 1 year ago

My experience is that the model sometimes converges very quickly on the TCGA dataset. I also found that initialization matters.
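Editorial note: one concrete way to act on this, mentioned earlier in the thread, is to restart training from the released init.pth instead of random weights. A minimal sketch, assuming the checkpoint keys match the aggregator built in train_tcga.py (build_milnet() is again a hypothetical stand-in for the script's own model construction):

```python
import torch

milnet = build_milnet()  # hypothetical helper: same aggregator construction as train_tcga.py

# Start from the released initialization rather than random weights.
init_state = torch.load('init.pth', map_location='cpu')
milnet.load_state_dict(init_state, strict=True)  # fail loudly on any key mismatch

# ... then run the usual training loop from this starting point.
```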

binli123 commented 1 year ago

Regarding init.pth: the settings are the default values. I found that training sometimes converges quickly and sometimes does not; this is especially the case when a positive bag contains only a few positive instances. With a standard weight-initialization method designed for faster convergence, you can likely get a faster convergence rate.
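For illustration, a minimal sketch of applying one such standard initialization scheme (Kaiming normal is used here as an example; the thread does not say which scheme was actually used) to the aggregator before training:

```python
import torch.nn as nn

def init_weights(module: nn.Module) -> None:
    """Kaiming-normal initialization for linear layers, with zero bias."""
    if isinstance(module, nn.Linear):
        nn.init.kaiming_normal_(module.weight, nonlinearity='relu')
        if module.bias is not None:
            nn.init.zeros_(module.bias)

# Usage: milnet.apply(init_weights) before starting the training loop.
```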