Please check issues #13 and #20.
Hello @binli123, now I have another question.
You said, "The dataset consists of 271 training images and 129 testing images, which yield roughly 3.2 million patches at 20× magnification and 0.25 million patches at 5× magnification with on average about 8,000 and 625 patches per bag."
But after analyzing the features you provide via download.py, I find the average number of patches is about 12,247, which is much larger than 8,000.
Can you answer my question?
I'm looking forward to hearing from you!
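For reference, the average bag size can be counted along these lines; this is a minimal sketch, and the one-CSV-per-bag layout under `datasets/Camelyon16/` is an assumption about how the downloaded features are stored:

```python
import glob

import pandas as pd

# Assumed layout: one CSV of instance features per bag (slide).
feature_files = glob.glob("datasets/Camelyon16/*/*.csv")

# Each CSV row is one patch embedding, so the row count is the bag size.
bag_sizes = [len(pd.read_csv(f)) for f in feature_files]
print(f"bags: {len(bag_sizes)}, "
      f"mean patches per bag: {sum(bag_sizes) / len(bag_sizes):.0f}")
```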
I think in the paper the patches were filtered with a higher threshold, so more low-content patches were excluded.
Thanks for your reply.
I have run several methods on the C16 data you provide via download.py. The results are very high; even max-pooling can achieve 0.975 AUC. And I used the 270/129 split with 5-fold cross-validation as in #29.
Can you provide detailed information about the data you provide?
Thank you for your reply again!
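For context, the max-pooling baseline referred to here can operate on precomputed instance features along these lines (a minimal sketch; the feature dimension and class count are illustrative, not the repository's code):

```python
import torch.nn as nn

class MaxPoolMIL(nn.Module):
    """Max-pooling MIL baseline: score each instance, keep the max."""
    def __init__(self, feat_dim=512, num_classes=1):
        super().__init__()
        self.classifier = nn.Linear(feat_dim, num_classes)

    def forward(self, bag):                      # bag: (num_patches, feat_dim)
        instance_scores = self.classifier(bag)   # (num_patches, num_classes)
        bag_score, _ = instance_scores.max(dim=0)  # max over instances
        return bag_score                         # train with BCEWithLogitsLoss
```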
How did you use a 270/129 (0.68:0.32) split while doing 5-fold (0.8:0.2) cross-validation? Could you elaborate?
I randomly split the official training data (270 slides) 9:1 into training and validation sets, and used the official test data (129 slides) as the test set. I have uploaded the split file to Google Drive.
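Such a split can be reproduced along these lines (a sketch; the slide IDs, label counts, and stratification choice are illustrative assumptions, not the uploaded split file):

```python
from sklearn.model_selection import train_test_split

# Illustrative inputs: IDs and labels of the 270 official training slides.
official_train_ids = [f"slide_{i:03d}" for i in range(270)]
official_train_labels = [0] * 159 + [1] * 111   # label counts are illustrative

train_ids, valid_ids = train_test_split(
    official_train_ids,
    test_size=0.1,                  # the 9:1 train/valid split described above
    stratify=official_train_labels,
    random_state=42,
)
# The 129 official test slides stay untouched as the test set.
```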
I still have two questions:
First, the patch count. The number in your paper is similar to TransMIL's (8,800), but what I got with CLAM is around 15,000, which is close to the features you provide via download.py yet much larger than the 8,000 stated in the paper. I have checked my cropped patches, and most low-content areas have been filtered out, so the gap shouldn't be this big.
Second, the C16 data you provide via download.py: how were they produced such that they yield such strong results?
Thank you for your patient reply again!
Hi, if you don't mind, you can try using one of these embedder weights to recompute the features: https://drive.google.com/drive/folders/1_mumfTU3GJRtjfcJK_M0fWm048sYYFqi?usp=sharing
For detection of metastasis in lymph nodes, it wouldn't hurt to filter more aggressively: you will find that a higher threshold mainly removes connective-tissue debris and small peripheral blood vessels (which are mainly stained by the lighter-colored eosin), and these components do not include lymph nodes, which are groups of dense cells stained by hematoxylin (a darker purple color).
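As an illustration, content-based patch filtering can look like the sketch below; the saturation threshold and tissue-fraction cutoff are assumptions to be tuned, not values from the paper:

```python
import numpy as np
from PIL import Image

def is_tissue(patch_path, sat_thresh=30, min_tissue_frac=0.25):
    """Keep a patch only if enough of its area is saturated (stained) tissue.

    Raising either threshold filters more aggressively; lightly
    eosin-stained debris and small vessels are dropped first.
    """
    hsv = np.asarray(Image.open(patch_path).convert("HSV"))
    tissue_frac = (hsv[..., 1] > sat_thresh).mean()  # saturation channel
    return tissue_frac >= min_tissue_frac
```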
Ok, thank you for your quick and patient reply. I am trying what you said.
I visualized the patches I extracted; patches at the same level look similar even though the magnifications differ. Is pixel size more important than magnification? Is it better to extract patches at similar pixel sizes rather than at the same magnification?
Yes, I only realized this afterward. It is better to match microns per pixel instead of magnification, because scanners have different standards.
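A sketch of matching on microns per pixel (MPP) with OpenSlide instead of on nominal magnification; the target MPP of 0.5 (roughly 20×) is an assumption:

```python
import openslide

def best_level_for_mpp(slide_path, target_mpp=0.5):
    """Pick the pyramid level whose effective MPP is closest to the target."""
    slide = openslide.OpenSlide(slide_path)
    base_mpp = float(slide.properties[openslide.PROPERTY_NAME_MPP_X])
    # Effective MPP at a level = base MPP times that level's downsample factor.
    mpps = [base_mpp * d for d in slide.level_downsamples]
    return min(range(len(mpps)), key=lambda i: abs(mpps[i] - target_mpp))
```

When a scanner records it, the nominal magnification is exposed under the `openslide.objective-power` property, but the MPP keys tend to be the more reliable basis for matching across scanners.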
Ok, thank you for your patient reply!
Hi, could you please tell me how to find the magnification of the C16 WSIs? I opened the WSIs with the OpenSlide library, but I did not find the magnification information in the metadata.
Hello, in the paper you say you get "3.2 million patches at 20× magnification". But after checking the Camelyon16 data, I found that some slides are at 20× magnification and others at 40× (as in fig 1).
I tried extracting patches from the RUMC slides at level 0 and the UMCU slides at level 1, but I only got about 10 million patches.
So my questions are twofold: