EIDOSLAB / torchstain

Stain normalization tools for histological analysis and computational pathology
MIT License

Whole Slide Images #42

Closed nabilapuspit closed 1 year ago

nabilapuspit commented 1 year ago

Hi, I'm new to this field and I need this method to normalize my data because the variation is too big. I work with WSI data but have a limited device. I've tried several stain normalization packages and all of them were killed due to memory limitations. The method works well on patch-based data, but unfortunately I need it on the whole slide. Any input will mean a lot to me. Thank you!

andreped commented 1 year ago

Hello, @nabilapuspit !

Normalizing the entire WSI is rarely practical, and rarely what you actually want, since many of the extracted patches are not suitable for the downstream analysis.

I assume you wish to train some patch-wise classifier or similar? Are you using TensorFlow or PyTorch? Are you using a specific framework or setting up most of it yourself? Also, which format are your WSIs stored in?

nabilapuspit commented 1 year ago

Hi, I really appreciate your fast response. I'm currently doing MIL-based classification in PyTorch, using the CLAM framework with .svs data. I tried to do the preprocessing in the patch extraction step but got a memory error because, as I said, I tried to normalize the entire WSI. Do you have any suggestions for me?

andreped commented 1 year ago

I am using CLAM framework

CLAM is great! I have not started using it yet, but I have heard a lot of great things.

After going through the source code, I was surprised to see that CLAM does not do any stain normalization or data augmentation. The only patch preprocessing I could find is here. I might be wrong, though. Perhaps you have more experience with CLAM than I do.

Regardless, if that's the case, then yeah, I would perform stain normalization on the patches generated by the create_patches_fp.py script documented here.

The simplest way is to just iterate over every single patch in the resulting patch dataset folder, apply stain normalization, and then either overwrite the original patches or create a new patch directory tree (identical to the original) with the updated patches.

I tried to do the pre processing in the patch extraction step but got the memory error because as I said

I guess here you are talking about running the create_patches_fp.py script? If so, you have not yet run stain normalization, so torchstain is not the issue. Or are you trying to run stain normalization on the generated patches?

nabilapuspit commented 1 year ago

Yes, the CLAM framework is great, even though it doesn't implement any data augmentation. That's why I intend to apply torchstain for the preprocessing.

I guess here you are talking about running the create_patches_fp.py script?

Yes, I'm talking about create_patches_fp.py, but as far as I understand, that step only extracts the contour coordinates for each patch, not the patch itself. It segments the foreground and then saves the patch coordinates to .h5 files. In the extract_features_fp.py script, the WSI is opened again using OpenSlide (you can see it here) and the .h5 files created in the previous step are read. I might be misunderstanding, though, since I'm relatively new to all of this.

If you see here, I previously tried to apply torchstain after those lines. It would be applied to the whole slide, right? That's why my computer killed the process.

The only notion of any patch preprocessing, I could see here.

As far as I understand, that part is only for heatmap visualization purposes and has no direct relationship to the overall AI pipeline.

andreped commented 1 year ago

yes I'm talking about create_patches_fp.py, but as far as I understand, the process in that step is only for taking the contour coords for patch and not the patch itself.

Then I believe the best alternative is to perform stain normalization in the Dataset generator. The easiest is likely to add it as a step here. Just remember to construct the normalizer class and call fit() somewhere else, perhaps in the __init__() of the class, as this only needs to be done once. Then you can apply the normalizer to the individual streamed patches.

If you see here, previously I tried to apply torchstain after those lines. It will be applied to the whole slide right? That's why my computer killed the process

That method only performs tissue segmentation. I believe the main goal here is to use stain normalization as a preprocessing step for the classifier, no?

Note that you should perform stain normalization as a preprocessing step both for training and inference of the WSI classifier. This can be done by normalizing the patches on the fly as they are being read from the WSI.

For inference, you likely need to apply the normalization here. Just remember to convert the PIL Image to the appropriate format and then back again, if necessary. You can use the numpy backend if GPU memory constraints are an issue.

nabilapuspit commented 1 year ago

Thank you for your advice. Although I'm still trying to understand it, I'll work through it slowly. I really appreciate your willingness to discuss this with me.

andreped commented 1 year ago

thank you for your advice. although I'm still trying to understand your advice, I'll try to do it slowly. I really appreciate your kindness to want to discuss with me

No rush! Take your time. You learn the most by trial and error.

If you are still having issues tomorrow, please let me know. If of interest, we could also arrange a quick Zoom meeting where you show me what you have tried and I make some suggestions. The easiest is likely to contact me on LinkedIn about this. You could also consider opening an issue in the CLAM repository instead.

As I believe this is not an issue with torchstain, but rather with the integration into CLAM, I am closing this issue. Feel free to reopen it or make a new one if you experience any other issues using this tool.