binli123 / dsmil-wsi

DSMIL: Dual-stream multiple instance learning networks for tumor detection in Whole Slide Images

Pretrained embedders #41

Open yangsenwxy opened 2 years ago

yangsenwxy commented 2 years ago

I have a question: does your SimCLR pretraining include all of the Camelyon16 data (both the training set and the test set)? If so, the feature extractor leaks test-set information. When I pretrained only on the training set, I could not reach such high results. I think you should check this problem carefully, because it would mean the reported results are inflated.

binli123 commented 2 years ago

https://drive.google.com/drive/folders/1_mumfTU3GJRtjfcJK_M0fWm048sYYFqi There are several model weights here that were trained using only the training data. I also tested using both the training set and the test set for SimCLR, and the difference in the results is minor. What batch size did you use? Please make sure the batch size is at least 512 and train for enough iterations to get a genuinely useful embedder from SimCLR, as pointed out in their paper. A bigger batch size and a longer training time produce a better embedder, and both have a large impact on downstream performance. The best embedder we obtained was trained for 2 months because of the large number of patches.
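For context on why the batch size matters so much: SimCLR's NT-Xent loss treats the other augmented views in the batch as negatives, so each positive pair in a batch of N patches only sees 2N - 2 negatives. A minimal sketch of the standard loss follows (this is the generic SimCLR objective, not the exact training code used for these weights):

```python
import torch
import torch.nn.functional as F

def nt_xent_loss(z1: torch.Tensor, z2: torch.Tensor, temperature: float = 0.5) -> torch.Tensor:
    """SimCLR's NT-Xent loss for two augmented views of the same N patches.

    Each row of the 2N x 2N similarity matrix has exactly one positive
    (the other view of the same patch) and 2N - 2 negatives, so small
    batches starve the loss of negatives; hence the batch size >= 512 advice.
    """
    n = z1.shape[0]
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)   # [2N, D], unit norm
    sim = z @ z.t() / temperature                        # scaled cosine similarities
    sim.fill_diagonal_(float("-inf"))                    # exclude self-similarity
    # Row i (< n) pairs with row i + n, and vice versa.
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)])
    return F.cross_entropy(sim, targets)
```

With `z1` and `z2` coming from the projection head applied to two augmentations of the same patch batch, the loss pulls matching views together and pushes every other patch in the batch away.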

Also, we are not the only ones who have had success with self-supervised learning on Camelyon16; see https://arxiv.org/pdf/2012.03583.pdf, which likewise reports very high results.

yangsenwxy commented 2 years ago

Thank you very much. I found that with the features you extracted, training directly with the CLAM method reaches only 0.86.

raycaohmu commented 1 year ago

Hi, are those weights trained using TCGA data?

GeorgeBatch commented 1 year ago

@raycaohmu

Camelyon16 weights: https://drive.google.com/drive/folders/1_mumfTU3GJRtjfcJK_M0fWm048sYYFqi

TCGA-lung weights: https://drive.google.com/drive/folders/1Rn_VpgM82VEfnjiVjDbObbBFHvs0V1OE
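For anyone trying these checkpoints, a minimal loading sketch. It assumes the weights are plain PyTorch state_dicts for a torchvision ResNet18 backbone (which I believe is what the repo's compute_feats.py expects); the key prefixes stripped below vary between SimCLR implementations, so verify against the actual files:

```python
import torch
from torchvision.models import resnet18

CKPT_PATH = "weights/20x/model.pth"  # hypothetical path: point this at the file you downloaded

model = resnet18(weights=None)
model.fc = torch.nn.Identity()  # keep the 512-d pooled features, drop the classifier head

state = torch.load(CKPT_PATH, map_location="cpu")
state = state.get("state_dict", state)  # some checkpoints nest the weights
# Strip common key prefixes left over from the SimCLR training wrapper.
state = {k.replace("module.", "").replace("features.", ""): v for k, v in state.items()}
missing, unexpected = model.load_state_dict(state, strict=False)
print("missing:", missing, "\nunexpected:", unexpected)  # sanity-check the key match

model.eval()
with torch.no_grad():
    feats = model(torch.randn(1, 3, 224, 224))  # one dummy patch
print(feats.shape)  # expect torch.Size([1, 512])
```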

PiumiDS commented 1 year ago

Camelyon16 weights: https://drive.google.com/drive/folders/1_mumfTU3GJRtjfcJK_M0fWm048sYYFqi

  • see folder names for magnifications

TCGA-lung weights: https://drive.google.com/drive/folders/1Rn_VpgM82VEfnjiVjDbObbBFHvs0V1OE

  • magnification: low=2.5x, high=10x
  • pre-training: v0 for 3 days, v1 for 2 weeks (better results)

Hi @GeorgeBatch,

I have seen the previous discussion on the magnification change for TCGA-lung patches. Could I please verify that, when the above pre-trained model is specified as

  • magnification: low=2.5x, high=10x

it applies only to the 20x slides in the dataset? (i.e., the pre-trained model was trained on 20x/5x patches from the 40x images and 10x/2.5x patches from the 20x images?)
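To make the arithmetic behind this question explicit: the effective patch magnification is the slide's base objective power divided by the downsample factor, so a fixed downsample pair yields different magnifications on 40x and 20x scans. A small illustration (the function is mine, not the repo's):

```python
def effective_magnification(base_power: float, downsample: float) -> float:
    """Effective magnification of patches extracted at a given downsample."""
    return base_power / downsample

# A fixed downsample pair (2 for "high", 8 for "low") gives 20x/5x patches
# on a 40x scan but 10x/2.5x patches on a 20x scan, which is exactly the
# split described in the question above.
for base_power in (40.0, 20.0):
    high = effective_magnification(base_power, 2)
    low = effective_magnification(base_power, 8)
    print(f"{base_power}x scan -> high={high}x, low={low}x")
```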

Many thanks in advance. Piumi.

GeorgeBatch commented 1 year ago

Hi @PiumiDS,

it applies only to the 20x slides in the dataset? (i.e., the pre-trained model was trained on 20x/5x patches from the 40x images and 10x/2.5x patches from the 20x images?)

I am afraid I do not know the answer to your question myself, so we will both need to wait for @binli123's answer.

George