binli123 / dsmil-wsi

DSMIL: Dual-stream multiple instance learning networks for tumor detection in Whole Slide Image
MIT License

SimCLR - CAMELYON16 #10

Closed Amcky closed 1 year ago

Amcky commented 3 years ago

Could you please post the weights from SimCLR training on the CAMELYON16 dataset?

binli123 commented 3 years ago

https://drive.google.com/drive/folders/1jd5qbpZ0fdqJdH3FYQaTimxYHkzNa_bW?usp=sharing This is a ResNet18 with instance normalization, trained on the 20x patches (at level 1; level 0 is 40x). I will also upload precomputed features for Camelyon16 soon.
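
For reference, a minimal sketch (the checkpoint filename and key layout are assumptions) of loading such an embedder for feature extraction:

```python
import torch
import torch.nn as nn
import torchvision.models as models

# ResNet18 with instance normalization in place of batch norm; the
# classifier head is removed so the network emits 512-d patch features.
embedder = models.resnet18(norm_layer=nn.InstanceNorm2d)
embedder.fc = nn.Identity()

# The checkpoint filename is hypothetical; SimCLR checkpoints may carry
# extra projection-head keys, hence strict=False.
state = torch.load('embedder-camelyon16.pth', map_location='cpu')
embedder.load_state_dict(state, strict=False)
embedder.eval()
```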

windstormer commented 3 years ago

With the provided pretrained SimCLR weights, I tried to compute the embeddings and applied them to train_tcga.py (with dataset adaptation). However, the AUC result is about 0.66, which is far from the one presented in the paper. Is there any setting modification in the MIL part for CAMELYON16 versus the TCGA dataset? [image]

binli123 commented 3 years ago

> With the provided pretrained SimCLR weights, I tried to compute the embeddings and applied them to train_tcga.py (with dataset adaptation). However, the AUC result is about 0.66, which is far from the one presented in the paper. Is there any setting modification in the MIL part for CAMELYON16 versus the TCGA dataset?

Are you using the latest train_tcga.py? Your output doesn't look right to me. You will need to set --num_classes=1. I have updated the readme file.

Regarding the labels and how the data folders should be organized for the binary and multi-class cases: for a binary classifier, the negative class folder should be the [CATEGORY_NAME] that sits at index 0 when sorted alphabetically.

For a binary classifier, use 1 for positive bags and 0 for negative bags, and use --num_classes=1 at training.
For a multi-class classifier (N positive classes and one optional negative class), use 0 to N-1 for the positive classes. If you have a negative class (bags not belonging to any of the positive classes), use N for its label. Use --num_classes=N (N equals the number of positive classes) at training. A short sketch of the label assignment follows below.
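
As a minimal illustration of the alphabetical-ordering rule (the folder names here are hypothetical):

```python
# Hypothetical class folder names: when sorted alphabetically, the
# negative class must come first so that it receives label 0.
class_folders = ['1-tumor', '0-normal']
classes = sorted(class_folders)
label_map = {name: idx for idx, name in enumerate(classes)}
print(label_map)  # {'0-normal': 0, '1-tumor': 1} -> negative bags get 0
```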

There seems to be another bug too; I am looking into it.

Please use the pretrained embedder here: https://uwmadison.box.com/shared/static/qs717clgaux5hx2mf5qnwmlsoz2elci2.zip and update compute_feats.py.

windstormer commented 3 years ago

Currently, I found that your code assigns the label 1 to "normal" and 0 to "tumor" on the CAMELYON16 dataset, because the class list never actually ends up in alphabetical order. You could replace "sorted(num_classes)" in line 231 of compute_feats.py with "num_classes.sort()". Just a short suggestion.

binli123 commented 3 years ago

> Currently, I found that your code assigns the label 1 to "normal" and 0 to "tumor" on the CAMELYON16 dataset, because the class list never actually ends up in alphabetical order. You could replace "sorted(num_classes)" in line 231 of compute_feats.py with "num_classes.sort()". Just a short suggestion.

https://discuss.codecademy.com/t/what-is-the-difference-between-sort-and-sorted/349679

I need to assign the returned value.
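
For context, sorted() returns a new list while list.sort() sorts in place, so either fix works; a minimal illustration:

```python
num_classes = ['tumor', 'normal']

sorted(num_classes)    # bug: the sorted copy is discarded,
                       # num_classes itself is unchanged

num_classes = sorted(num_classes)  # fix 1: assign the returned list
# or, equivalently:
num_classes.sort()                 # fix 2: sort the list in place
```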

linzhenyuyuchen commented 3 years ago

> With the provided pretrained SimCLR weights, I tried to compute the embeddings and applied them to train_tcga.py (with dataset adaptation). However, the AUC result is about 0.66, which is far from the one presented in the paper. Is there any setting modification in the MIL part for CAMELYON16 versus the TCGA dataset?

I also found that the AUC result is about 0.66. Is there any solution? Thank you!

windstormer commented 3 years ago

> > With the provided pretrained SimCLR weights, I tried to compute the embeddings and applied them to train_tcga.py (with dataset adaptation). However, the AUC result is about 0.66, which is far from the one presented in the paper. Is there any setting modification in the MIL part for CAMELYON16 versus the TCGA dataset?
>
> I also found that the AUC result is about 0.66. Is there any solution? Thank you!

As the author @binli123 mentioned, you'll have to check that the negative bags are labeled 0 and the positive ones 1. After fixing this problem, I get an AUC above 0.9 with the provided weights.
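
One quick sanity check (the CSV path and column name here are assumptions; adapt them to your own bag list) is to count the bag labels before training:

```python
import pandas as pd

# Hypothetical dataset CSV path and 'label' column name -- adjust to
# match your setup. Negative bags should be 0, positive bags 1.
bags = pd.read_csv('datasets/Camelyon16/Camelyon16.csv')
print(bags['label'].value_counts())
```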

binli123 commented 3 years ago

Trained models have now been uploaded here: https://drive.google.com/drive/folders/14pSKk2rnPJiJsGK2CQJXctP7fhRJZiyn?usp=sharing. These models were trained with different settings; a shorter training time and a smaller batch size yield representations that need a longer convergence time in the downstream MIL networks.