Bontempogianpaolo1 opened 2 years ago
Could you check out the CSV files containing the features and labels?
The CSV seems correct... Here are some screenshots of embeddings extracted using your pretrained model model_v2.pth, found at https://drive.google.com/drive/folders/1_mumfTU3GJRtjfcJK_M0fWm048sYYFqi, on patches extracted using 19 as the background threshold:
camelyon.csv
normal143.csv
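For reference, here is a quick way to sanity-check a feature CSV instead of relying on screenshots. This is a minimal sketch assuming plain pandas, with `normal143.csv` standing in for any of the files; drop `header=None` if the file has a header row.

```python
# Minimal sanity check of a feature CSV: shape and value range.
import pandas as pd

feats = pd.read_csv("normal143.csv", header=None)
print(feats.shape)                             # rows = patches, cols = feature dims
print(feats.values.min(), feats.values.max())  # values far above ~10 are suspicious
```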
However, comparing your features with mine, the number of rows is different... So is it possible that the number of patches is influencing the results? Here are the patch counts using different background thresholds for 5 slides (a small counting sketch follows the table):
| Slide name | th=19 | th=25 | your features |
|---|---|---|---|
| tumor_108 | 29905 | 402 | 23263 |
| test_124 | 6693 | 3001 | 2402 |
| tumor_095 | 39960 | 1002 | 31791 |
| normal_137 | 33396 | 505 | 23443 |
| tumor_076 | 61670 | 42057 | 19708 |
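To illustrate why the threshold changes the counts so much, here is a common tissue-filtering rule. This is only a sketch: the saturation-based rule and the patch directory are assumptions, not necessarily what deepzoom_tiler.py actually implements.

```python
# Illustrative tissue filter: keep a tile only if enough pixels are saturated.
# A higher threshold demands more saturation, so fewer patches survive.
import glob
import numpy as np
from PIL import Image

def is_tissue(tile, threshold, min_fraction=0.25):
    # H&E background is bright and unsaturated: compute the fraction of
    # pixels whose saturation exceeds the threshold.
    hsv = np.asarray(tile.convert("HSV"))
    return (hsv[..., 1] > threshold).mean() > min_fraction

tiles = [Image.open(p) for p in glob.glob("patches/tumor_108/*.jpeg")]  # hypothetical dir
for th in (19, 25):
    kept = sum(is_tissue(t, th) for t in tiles)
    print(f"threshold={th}: kept {kept}/{len(tiles)} patches")
```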
Maybe the image quality is not correct for your embedder? Here is an example of a patch extracted at level=0, magnification=20.
With this configuration, the MIL training AUC remains under 0.7. Thanks in advance for your reply.
The feature values look strange. There are some abnormal values > 10. Did you use BatchNorm or InstanceNorm consistently in the training and feature computation?
I took your embedder directly, without training, and passed it to the compute_feats script with InstanceNorm2d, since it is the default parameter.
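As a side note, here is a sketch of why the norm layer has to match between training and feature extraction. It assumes a torchvision ResNet-18 backbone, which may differ from the actual embedder, and that model_v2.pth stores a plain state dict:

```python
# Sketch: loading a checkpoint into a backbone whose norm layer may not
# match the one used at training time. A mismatch silently drifts the
# feature scale, so always inspect the load_state_dict report.
import torch
import torch.nn as nn
from torchvision.models import resnet18

def build_backbone(norm="instance"):
    # torchvision's norm_layer argument swaps every BatchNorm2d at build time
    norm_layer = nn.InstanceNorm2d if norm == "instance" else nn.BatchNorm2d
    model = resnet18(norm_layer=norm_layer)
    model.fc = nn.Identity()  # keep the 512-d pooled features
    return model

model = build_backbone(norm="instance")
state = torch.load("model_v2.pth", map_location="cpu")  # assumed plain state dict
# strict=False hides mismatches, so print what was dropped:
missing, unexpected = model.load_state_dict(state, strict=False)
print("missing:", missing, "unexpected:", unexpected)
model.eval()  # always eval() before extracting features
```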
model_v2.pth
Have you tried model_v0.pth and model_v1.pth? Did they also not work?
Not yet... I considered the v2 model to be the best one.
screenshot features using model-v0
screenshot features using model-v1
Those are very different from mine. There should not be values > 10; they should all be on roughly the same scale. If you are using a newer GPU card, please make sure CUDA >= 11.0, not 10.2.
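A quick way to check which CUDA toolkit your PyTorch build uses; this is just a diagnostic sketch, not part of the DSMIL scripts:

```python
# Report the PyTorch build, its CUDA toolkit version, and the detected GPU.
import torch

print(torch.__version__)              # e.g. 1.10.0+cu113
print(torch.version.cuda)             # should be >= 11.0 on newer cards
print(torch.cuda.is_available())
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))
```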
Sorry... Excel made some errors in the visualization... the real screenshots are these:
model-v0
model-v1
So all the numbers seem to be on the same scale...
Does your normal_141_42_54.jpg look like this?
My feature CSV using v2:
normal_141.csv
I don't have it... What parameters did you use for the deepzoom_tiler.py script in the case of Camelyon?
This is my normal_141 48_112.jpeg
This is my tumor_047 101_546.jpeg
Mine seems to be at a higher magnification, maybe?
It turns out that Camelyon16 consists of mixed magnifications, so after some experimenting, the correct configuration is: `python deepzoom_tiler.py -m 1 -b 20 -d Camelyon16-pilot -v tif`
In this way the magnification becomes 10x, right? Is your embedder trained at this magnification? Since it is inside the folder called x20, I didn't expect that.
I think it is still 20x, because the base magnification has ~0.25 micron/pixel, which corresponds to 40x for the Aperio scanner (the FDA standard). A 20x magnification corresponds to ~0.5 micron/pixel. Camelyon16 uses a mixture of magnifications with different micron/pixel values.
Notice how their 20x and 40x scanners have almost the same micron/pixel? You would call the "20x" RUMC image a "40x" image by UMCU's convention. So it is better to just use the FDA standard.
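Here is a sketch of reading the microns-per-pixel from a slide and mapping it to the FDA/Aperio convention (~0.25 µm/px → 40x, ~0.5 µm/px → 20x). It assumes OpenSlide and a hypothetical slide path; some scanners don't expose the mpp property at all.

```python
# Map a slide's microns-per-pixel to an effective magnification,
# using 40x at 0.25 um/px as the reference point.
import openslide

slide = openslide.OpenSlide("tumor_108.tif")  # hypothetical path
mpp_x = float(slide.properties.get(openslide.PROPERTY_NAME_MPP_X, 0) or 0)

def effective_magnification(mpp):
    # Halve the magnification each time the pixel size doubles.
    return 40 * 0.25 / mpp

if mpp_x:
    print(f"{mpp_x:.3f} um/px ~ {effective_magnification(mpp_x):.0f}x")
else:
    print("Slide does not expose mpp-x; inspect slide.properties manually.")
```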
Ok! I'm trying it now, and inside the "temp" folder the patches are stored in a "10" folder (I imagine it refers to the magnification). Anyway, thank you very much for your replies! I'll run the entire pipeline again with these new patches and report the results as soon as possible.
It worked!! But I still have problems :(... I'm opening a new issue for that, since it is not related to the dataset but to the embedder.
Hi @binli123 ,
I'm trying to replicate your results on Camelyon16 without success. I set the number of classes to 1 and also tried your published weights for computing the features on both the training and test sets. Even with that, I still obtain only ~0.7 AUC... So I started thinking about how I organized the data differently from you. I downloaded the data from https://ftp.cngb.org/pub/gigadb/pub/10.5524/100001_101000/100439/CAMELYON16/; the data is divided into training and test. I used 25 as the threshold for filtering out background, and I used only the training set for training the self-supervised model. After that, even with the model you published on Drive, I extracted features with the compute_feats script for both training and test (especially with the fusion option). Finally, I modified train_tcga to use them as sources for the training set and the test set (270/130 bags).
If instead I use the features you precomputed, the MIL model works. So the problem could be how I split the data or how I extract embeddings. What am I missing?
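For concreteness, here is a sketch of how the 270/130 bag split could be assembled, assuming one feature CSV per slide. The directory layout, the output CSV columns, and the reference.csv parsing are all hypothetical; the exact list format train_tcga.py expects may differ.

```python
# Build per-bag CSV lists (path + label) for a Camelyon16-style split.
import os
import pandas as pd

def make_bag_list(feats_dir, label_fn, out_csv):
    rows = []
    for fname in sorted(os.listdir(feats_dir)):
        if fname.endswith(".csv"):
            slide_id = fname[:-4]                       # e.g. "tumor_108"
            rows.append({"path": os.path.join(feats_dir, fname),
                         "label": label_fn(slide_id)})  # 1 = tumor, 0 = normal
    pd.DataFrame(rows).to_csv(out_csv, index=False)
    return len(rows)

# Training labels are encoded in the filenames (tumor_XXX / normal_XXX):
n_train = make_bag_list("feats/train", lambda s: int(s.startswith("tumor")),
                        "train_bags.csv")
# Test labels are not in the filenames; they come from the official
# Camelyon16 reference file (column layout assumed here):
ref = pd.read_csv("reference.csv", index_col=0)
n_test = make_bag_list("feats/test",
                       lambda s: int(str(ref.loc[s].iloc[0]).lower() == "tumor"),
                       "test_bags.csv")
print(n_train, n_test)  # should be 270 and 130 for the official split
```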