EIDOSLAB / torchstain

Stain normalization tools for histological analysis and computational pathology
MIT License

Feature requests #3

Open andreped opened 3 years ago

andreped commented 3 years ago

I have some feature-request ideas which could greatly improve the usefulness of your tool.

  1. Enable GPU mode (seems to only run on CPU as of now)
  2. Enable batch mode (seems to assume that the input is a single image)
  3. Add an augmentation option, similar to what is done in StainTools

I would be more than happy to help or assist in any way.
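Regarding item 3, a minimal sketch of what StainTools-style augmentation could look like (the function name and default strengths here are my own assumptions; the idea is to perturb the stain concentration matrix channel-wise):

```python
import torch

def augment_concentrations(C, sigma1=0.2, sigma2=0.05):
    # C: (n_stains, n_pixels) stain concentration matrix.
    # StainTools-style perturbation: scale each stain channel by
    # alpha ~ U(1 - sigma1, 1 + sigma1) and shift it by beta ~ U(-sigma2, sigma2).
    alpha = torch.empty(C.shape[0], 1).uniform_(1 - sigma1, 1 + sigma1)
    beta = torch.empty(C.shape[0], 1).uniform_(-sigma2, sigma2)
    return C * alpha + beta
```

The perturbed concentrations would then be recombined with the stain matrix to produce the augmented image.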

andreped commented 3 years ago

I believe I have added GPU support now. See my fork for my modifications.

I can make a new pull request after my current one has been merged, where we can attempt to address these ideas.

Regarding GPU compute: I did not see any real improvement. In fact, for the image size I was testing, I found a degradation in runtime. I believe the benefit of using a GPU would be greater if there were more operations that could be run in parallel.

If a batch of images were used, each image could be processed in parallel using the GPU. This would also be very useful for training CNNs, where one could normalize/augment the patches on-the-fly.

I could also try to improve batch runtime using multiprocessing, but in this scenario the usage of GPU would be less relevant.
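To illustrate why batching helps on GPU: most per-image steps, such as the RGB-to-optical-density conversion at the start of the Macenko method, are element-wise and vectorize trivially over a batch dimension. A minimal sketch (the shapes and the `Io` default are assumptions, not torchstain's actual API):

```python
import torch

def batch_rgb_to_od(batch, Io=240):
    # batch: (B, H, W, 3) tensor of RGB intensities; runs on whichever
    # device the tensor lives on (CPU or GPU), one kernel for all images.
    batch = batch.float()
    return -torch.log((batch + 1) / Io)
```

A whole batch of patches is converted in a single call, which is where the GPU can actually win over per-image processing.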

Surayuth commented 3 years ago

Is the code working correctly? I want to use it in my code.

andreped commented 3 years ago

@Surayuth : I have only added GPU support, as of now. If you wish to use that, simply install by:

```
pip install git+https://github.com/andreped/torchstain
```

But in the current state, there is no benefit to using the GPU. We need to modify the code to support batches first. I would like to add these modifications to the main repository, but I am waiting for the owners/developers to respond.

I am currently working on another project, where I wish to build upon torchstain (or any fast and accurate color normalization approach) to make a simple CLI. For my CLI, see here. Currently, you can perform normalization in parallel using multiprocessing, and I see some improvement in runtime.

Simply install and use the CLI by:

```
pip install git+https://github.com/andreped/fast-stain-normalization.git
faststainnorm --ref full-path-to-reference-image --img path-to-images-to-convert --out path-to-store-output
```

Surayuth commented 3 years ago

@andreped Thanks for the quick reply and your improvements. I want to use this method in my project too, but I also need the whole process to be differentiable, which I don't know will work in the current state (I will look into the code later if I have time). I'm glad that you are working on this topic too, as I really need this normalization method in my work.

andreped commented 3 years ago

Why do you need it to be differentiable? I am assuming you are training neural networks. Do you wish to use this as a layer during training?

Surayuth commented 3 years ago

Yes. I would like to use this to extract a saliency map instead of using an existing CNN (e.g., U-Net), because I don't have many images for training my model.

andreped commented 3 years ago

I am not sure I follow. How do you wish to use a stain normalization method to produce a saliency map? And are you training a patch-wise model using an ordinary neural network with pixels as input, or a patch-wise CNN classifier?

For both cases I believe there are better ways to produce "saliency maps". Do you want to produce some type of "unsupervised" segmentation, or perhaps to explain which regions the network used to perform a task?

Or by "saliency map", do you mean that you just wish to segment the image unsupervised?

Surayuth commented 3 years ago

Thanks for pointing that out. I just want to extract the stained area unsupervised, but I don't have enough images to train the model. The stain variation is so large that a model trained to extract the stained area is not robust. So I am trying to find a simple solution to do the job. Any suggestions are appreciated. Thanks!

andreped commented 3 years ago

Do you mean that you wish to segment the image by saying which pixels contain either haematoxylin (H), eosin (E), or background/glass?

Or do you wish to segment any tissue? Such that the final result is a binary segmentation, 1 for tissue and 0 for glass/background?

Surayuth commented 3 years ago

I wish to segment the concentration of the H region. The fact that I want this process to be differentiable is because I want to use this H region in the loss function of my networks.

andreped commented 3 years ago

OK, but this method should work fine for that. Just remember to pick a representative and suitable reference image; that seems to be key for these methods to work well. Depending on your images, you might need to change the luminance threshold as well, or do some preprocessing on your images.

But still, you have some images which you wish to use to perform some task. What stops you from just generating the H image in preprocessing, passing it together with the original image (if of interest) to the network during training, and then passing the H image to the loss?

There is no need for the Macenko method itself to be differentiable. The method is completely static. Even if you were to add it as a layer at the start of your neural network, that operation would not need to be differentiable. The weights in the NN would still be updated; gradients simply would not be backpropagated further (through the Macenko step), but there is no need for that either, since there are no weights to update there.
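A minimal sketch of that setup (names are hypothetical; the target H image is assumed to be precomputed with the Macenko method outside the graph):

```python
import torch
import torch.nn.functional as F

def h_channel_loss(pred_h, target_h):
    # target_h was produced by the (non-differentiable) Macenko step in
    # preprocessing; detach() makes explicit that no gradient flows into it.
    return F.mse_loss(pred_h, target_h.detach())
```

Gradients then flow from the loss into the network's prediction only, never through the Macenko computation.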

Surayuth commented 3 years ago

Thanks for the idea. I understand that the method is static. What I was worried about at first is that I want to use this method in the loss function, and if I remember correctly, calling backward() will distribute the gradient along the computational graph. Some operations do not allow the gradient to flow through them, e.g., https://github.com/pytorch/pytorch/issues/27036.

andreped commented 3 years ago

What are you trying to do? Are you using the HE image as input to some neural network, which does something and then you are trying to use the H channel in the loss?

You would still backpropagate from the output through the neural network, until you hit the Macenko-layer. So the network would still update.

And yes, there are likely some operations in the Macenko implementation which are non-differentiable. Were you trying to use Macenko inside the loss function, perhaps? Then I can understand your concern, because that will not work!

The trick is to do this outside the loss computation: just provide the already-computed H image to the loss computation. It really depends on how you have implemented the whole thing.

Surayuth commented 3 years ago

> What are you trying to do? Are you using the HE image as input to some neural network, which does something and then you are trying to use the H channel in the loss?

Yes, the network will output the H channel (result 1) and I want to compare it with the H channel segmented by the Macenko method (result 2).

> And yes, there are likely some operations in the Macenko implementation which are non-differentiable. Were you trying to use Macenko inside the loss function, perhaps? Then I can understand your concern, because that will not work!

I see. I will look into the code tomorrow to see what I can do with it. Thanks for this helpful discussion today, it helps a lot.

andreped commented 3 years ago

I have a lot of experience with this from TensorFlow, which really is annoying at times... This is perfectly possible to do without having the Macenko algorithm be differentiable, because extraction of the H image can actually happen outside the graph (at least outside the graph of interest).
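For instance, a sketch of keeping the extraction out of the autograd graph entirely (this assumes a normalizer object whose normalize() method returns a tuple like (Inorm, H, E), as torchstain's Macenko normalizer does; treat the names as illustrative):

```python
import torch

def extract_target_h(image, normalizer):
    # Wrapping the call in no_grad keeps the H-image extraction
    # entirely outside the autograd graph; the result carries no grad_fn.
    with torch.no_grad():
        _, H, _ = normalizer.normalize(I=image)
    return H
```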

No worries! Happy to help :) Let me know how it went! If you'd like, I could help you set this up, if you are willing to share the (private) repository with me.

Surayuth commented 3 years ago

Thanks! I would love to work with you. And I agree with you that it's possible. Just need some modifications to the code. I will share the repository with you soon. Thanks again!

carloalbertobarbano commented 3 years ago

Thanks for the very interesting discussion. I agree that having GPU support would be very nice, and that it should be done with a batch-based approach. @andreped feel free to draft a PR

carloalbertobarbano commented 3 years ago

Btw, very nice work with https://github.com/andreped/fast-stain-normalization @andreped

andreped commented 3 years ago

@carloalbertobarbano I can make an attempt on the batch-based approach next week.

carloalbertobarbano commented 3 years ago

Sounds good

andreped commented 2 years ago

@carloalbertobarbano shall we make an attempt at a new release where we wrap up the TF support thingy and make it stable and working? We could probably set up some workflows for doing all that using GitHub Actions. I could make an attempt, if you'd like?

carloalbertobarbano commented 2 years ago

Yes absolutely, feel free to sketch it up. I have been very busy lately.

andreped commented 2 years ago

I remember there were issues with installing both TensorFlow and PyTorch, so a good idea could be to install torchstain with a specific backend (a custom option in pip). I am not sure how you would do that, but I could make an attempt.

I will let you know when I have a PR ready for you to test.

carloalbertobarbano commented 2 years ago

Yes, that should be avoided. As you already said (I think), this should be handled using the setup options, for example `pip install torchstain[tf]` or something like that. If no option is provided, I'd fall back to PyTorch.
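A sketch of how that could look with setuptools extras (the package metadata here is illustrative, not the actual torchstain configuration):

```python
from setuptools import setup

setup(
    name="torchstain",
    install_requires=["torch"],             # default backend: PyTorch
    extras_require={
        "tf": ["tensorflow"],               # opt-in: pip install torchstain[tf]
    },
)
```

With this layout, a plain `pip install torchstain` pulls in PyTorch only, and TensorFlow is installed only when the `[tf]` extra is requested.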