binli123 / dsmil-wsi

DSMIL: Dual-stream multiple instance learning networks for tumor detection in Whole Slide Image
MIT License
373 stars · 88 forks

Questions about Camelyon16 #32

Closed pzSuen closed 2 years ago

pzSuen commented 2 years ago

Hello, in the paper you said you get "3.2 million patches at 20× magnification". But after checking the Camelyon16 data, I found that some slides are at 20× magnification while others are at 40× (as in Fig. 1).


I tried to extract patches from the RUMC slides at level 0 and the UMCU slides at level 1, but got about 10 million patches.

So my questions are twofold:

  1. How do you handle the different magnifications?
  2. What is your background-filtering method?
binli123 commented 2 years ago

Please check issues #13 and #20.

pzSuen commented 2 years ago

Hello @binli123, now I have another question. You said, "The dataset consists of 271 training images and 129 testing images, which yield roughly 3.2 million patches at 20× magnification and 0.25 million patches at 5× magnification with on average about 8,000 and 625 patches per bag." But after I analyzed the features you provide in download.py, the average number of patches is about 12,247, which is much bigger than 8,000. Can you explain the difference? I'm looking forward to hearing from you!

binli123 commented 2 years ago

> Hello, @binli123 now I have another question. [...] After I analyzed the features you gave in download.py, the average number of patches is about 12,247, which is much bigger than 8,000.

I think for the paper the patches were filtered with a higher threshold, so more low-content patches were excluded.

pzSuen commented 2 years ago

> I think for the paper the patches were filtered with a higher threshold, so more low-content patches were excluded.

Thanks for your reply. I have run several methods on the C16 data you provided in download.py. The results are very high; even max-pooling achieves 0.975 AUC. I used the 270/129 split with 5-fold cross-validation, as in #29. Can you provide detailed information about the data you released? Thank you again!

binli123 commented 2 years ago

> Thanks for your reply. I have run several methods on the C16 data you provided in download.py. The results are very high; even max-pooling achieves 0.975 AUC. I used the 270/129 split with 5-fold cross-validation, as in #29.

How did you use a 270/129 (0.68 : 0.32) split while also doing 5-fold (0.8 : 0.2) cross-validation? Could you elaborate?

pzSuen commented 2 years ago

> How did you use a 270/129 (0.68 : 0.32) split while also doing 5-fold (0.8 : 0.2) cross-validation? Could you elaborate?

I randomly split the official training data (270 slides) 9:1 into training and validation sets, and used the official test data (129 slides) as the test set. I uploaded the split file to Google Drive.
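For reference, a minimal stdlib sketch of such a 9:1 slide-level split (the slide IDs here are placeholders, not the actual Camelyon16 file names):

```python
import random

def split_slides(slide_ids, valid_frac=0.1, seed=0):
    """Randomly split a list of slide IDs into train/valid sets (9:1 by default)."""
    ids = list(slide_ids)
    random.Random(seed).shuffle(ids)  # fixed seed for a reproducible split
    n_valid = max(1, round(len(ids) * valid_frac))
    return ids[n_valid:], ids[:n_valid]  # (train, valid)

# 270 official training slides -> 243 train / 27 valid;
# the 129 official test slides stay untouched as the test set.
train, valid = split_slides([f"slide_{i:03d}" for i in range(270)])
```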

I still have two questions. First, the patch count: your patch count is similar to TransMIL's (~8,800), but what I get with CLAM is around 15,000, which matches the data you provide via download.py but is much bigger than the 8,000 stated in the paper. I have checked my cropped patches, and most low-content areas have been filtered out, so the gap shouldn't be this big. Second, about the C16 data you provide in download.py: how were they produced such that they yield these very strong results?

Thank you for your patient reply again!

binli123 commented 2 years ago

> I randomly split the official training data (270 slides) 9:1 into training and validation sets, and used the official test data (129 slides) as the test set. [...] I still have two questions: first, the patch count, which is much bigger than the 8,000 stated in the paper; second, how the C16 data provided in download.py were produced such that they yield these very strong results.

Hi, if you don't mind, you can try one of these embedder weights to recompute the features: https://drive.google.com/drive/folders/1_mumfTU3GJRtjfcJK_M0fWm048sYYFqi?usp=sharing

For detecting metastases in lymph nodes, it doesn't hurt to filter more aggressively: you will find that a higher threshold mainly removes connective-tissue debris and small peripheral blood vessels (which are mostly stained by the lighter eosin), and these components do not include lymph node tissue, which consists of dense groups of cells stained by hematoxylin (darker purple).
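To illustrate the kind of thresholding being discussed, here is a minimal pure-Python sketch of a saturation-based content filter (the function names and threshold values are illustrative, not the repository's actual preprocessing): background glass is near-white (low saturation), while stained tissue is colorful, so raising either threshold filters more aggressively.

```python
def saturation(rgb):
    """HSV-style saturation of one RGB pixel (channel values in 0..255)."""
    r, g, b = rgb
    mx, mn = max(r, g, b), min(r, g, b)
    return 0.0 if mx == 0 else (mx - mn) / mx

def is_tissue(patch_pixels, sat_thresh=0.1, min_tissue_frac=0.25):
    """Keep a patch only if enough pixels are 'colorful' (stained tissue).
    Higher sat_thresh / min_tissue_frac values exclude more low-content
    patches, which lowers the per-slide patch count."""
    tissue = sum(1 for px in patch_pixels if saturation(px) > sat_thresh)
    return tissue / len(patch_pixels) >= min_tissue_frac

# A hematoxylin-purple patch passes; a near-white background patch does not.
assert is_tissue([(128, 0, 128)] * 16)
assert not is_tissue([(250, 250, 250)] * 16)
```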

pzSuen commented 2 years ago

Ok, thank you for your quick and patient reply. I am trying what you said.

pzSuen commented 2 years ago

I visualized the patches I extracted; patches at the same level look similar even though the magnifications differ. Is pixel size more important than magnification? Is it better to extract patches at similar pixel sizes rather than at the same magnification?


binli123 commented 2 years ago

> I visualized the patches I extracted; patches at the same level look similar even though the magnifications differ. Is pixel size more important than magnification? Is it better to extract patches at similar pixel sizes rather than at the same magnification?

Yes, I only realized this afterward. It is better to match microns-per-pixel instead of magnification, because the scanners follow different standards.
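Matching microns-per-pixel can be sketched as picking, per slide, the pyramid level whose effective µm/px is closest to a common target (a hedged example, not code from this repo; with OpenSlide, `base_mpp` would come from the `openslide.mpp-x` property and `level_downsamples` from `slide.level_downsamples`):

```python
def best_level_for_mpp(base_mpp, level_downsamples, target_mpp=0.5):
    """Pick the pyramid level whose effective microns-per-pixel is closest
    to target_mpp. base_mpp is the level-0 resolution in um/px; each level's
    resolution is base_mpp * downsample."""
    return min(range(len(level_downsamples)),
               key=lambda lv: abs(base_mpp * level_downsamples[lv] - target_mpp))

# A 40x scanner (~0.25 um/px at level 0) should read level 1 to match
# a 20x scanner (~0.5 um/px) reading level 0:
assert best_level_for_mpp(0.25, [1, 2, 4, 8]) == 1
assert best_level_for_mpp(0.50, [1, 2, 4, 8]) == 0
```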

pzSuen commented 2 years ago

Ok, thank you for your patient reply!

HHHedo commented 1 year ago

> I visualized the patches I extracted; patches at the same level look similar even though the magnifications differ. Is pixel size more important than magnification? Is it better to extract patches at similar pixel sizes rather than at the same magnification?

Hi, could you please tell me how to find the magnification of the C16 WSIs? I open the WSIs with the openslide library, but I did not find the magnification information in the metadata.
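Not every WSI carries an explicit objective-power tag, so a common workaround (a sketch under that assumption, not an answer from the thread) is to check OpenSlide's standard property, then vendor keys, then fall back to estimating from microns-per-pixel (roughly 0.25 µm/px ≈ 40×, 0.5 µm/px ≈ 20×). The helper below takes a properties mapping, e.g. `openslide.OpenSlide(path).properties`:

```python
def objective_power(props):
    """Infer scan magnification from an OpenSlide-style properties mapping.
    Tries the standard key, then a vendor key, then estimates from um/px."""
    for key in ("openslide.objective-power", "aperio.AppMag"):
        if key in props:
            return float(props[key])
    mpp = float(props["openslide.mpp-x"])  # raises KeyError if also absent
    return round(10.0 / mpp)               # 0.25 um/px -> 40, 0.5 um/px -> 20

# usage with a real slide:
#   slide = openslide.OpenSlide("tumor_001.tif")
#   print(objective_power(slide.properties))
```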