TissueImageAnalytics / tiatoolbox

Computational Pathology Toolbox developed by TIA Centre, University of Warwick.
https://warwick.ac.uk/tia

Vahadane stain-norm fails #382

Open · mostafajahanifar opened this issue 2 years ago

mostafajahanifar commented 2 years ago

Description

We have been investigating the Vahadane stain normalization algorithm and realized that some outputs are consistently wrong (notice the cyan-ish background and the incorrect colours in the transformed image in the centre): source

So, we investigated various things to find the root of the problem, including testing with the original staintools, checking the background estimation, rounding errors, etc. Our final guess was that something is wrong with the dictionary learning. So, I tried an older version of TIAToolbox (v0.8.0) and scikit-learn (v0.23.2), and that seems to fix the problem:

pip install scikit-learn==0.23.2 Click==7.0 tiatoolbox==0.8.0

Then, if I apply the same Vahadane algorithm with the same target image, I get the following result, which looks good: source2

So, most probably something changed in the newer versions of scikit-learn that does not fit with the current implementation, and this requires more investigation.

John-P commented 2 years ago

@mostafajahanifar, it's worth checking the default parameters (kwargs) for dictionary learning between scikit-learn versions. This has caused things to break in the past. In future, to avoid this we should explicitly set all parameters.
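
For illustration, a minimal sketch of what explicitly pinning the dictionary learning parameters might look like; the values below are placeholders, not the toolbox's actual settings:

import numpy as np
from sklearn.decomposition import DictionaryLearning

# Hypothetical example: fix every parameter we rely on so that a change
# in scikit-learn's defaults cannot silently alter behaviour.
dl = DictionaryLearning(
    n_components=2,            # two stains (haematoxylin and eosin)
    alpha=0.1,                 # sparsity penalty (placeholder value)
    fit_algorithm="cd",
    transform_algorithm="lasso_cd",
    positive_dict=True,
    positive_code=True,
    max_iter=1000,
    random_state=0,
)
dictionary = dl.fit(np.random.rand(100, 3)).components_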

mostafajahanifar commented 2 years ago

Thanks @John-P for the suggestion. Yes, this was the first thing that occurred to me as well. I checked for it and unfortunately there are no changes in the default parameters. Even if there were, we should have been fine, because we are explicitly setting the parameters in this case.

Having said that, I thought a good test would be to change the parameters, but as Dang says: "That wouldn't be the Vahadane method anymore"! Still, it might be a good test to check whether the problem persists.

John-P commented 1 year ago

@mostafajahanifar Looks like there was a change to dictionary learning in 0.23.0: https://scikit-learn.org/stable/whats_new/v0.23.html#sklearn-decomposition

John-P commented 1 year ago

Looks like it is listed several times in the change log.

John-P commented 1 year ago

Looking at the diff between 0.21 and 0.23, it looks like quite a few changes were made to dictionary learning; in some places most of it has been re-written. I have made copies of each version, black-formatted the old one, and removed some of the long docstrings which were adding a lot of noise to the diff, in case you want to compare them in a diff tool.

John-P commented 1 year ago

Here is a zip of the file which changed: dict_learning_changes.zip

afilt commented 1 year ago

Hi @John-P! Do you plan on fixing this issue any time soon? We are facing the same problem and we would like to work with a more recent version of scikit-learn (> 0.23.2). Thanks :)

shaneahmed commented 1 year ago

This issue seems to have been caused by this PR: https://github.com/scikit-learn/scikit-learn/pull/17433. We need to create an issue on https://github.com/scikit-learn/scikit-learn/

John-P commented 1 year ago

Hi @afilt, I don't have much time to work on the toolbox right now, but we would like to get this issue resolved. We are considering how we can fix this to work with the new scikit-learn, or potentially moving to SPAMS instead.

John-P commented 1 year ago

This is proving very difficult to resolve. Some options that we have discussed are:

  1. Use SPAMS instead of scikit-learn. However, there is no direct Windows support. There is a third-party pip package for Windows though.
  2. Try to fix the scikit-learn dictionary learning code (again).
    • Maybe check for overflows / data type issues (see the sketch after this list)?
  3. Use the BSD-3 licensed code from the old version of scikit-learn in the toolbox.
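
On option 2, a hypothetical diagnostic for the overflow / data type angle might look like the following; the function name and checks are illustrative, not part of the toolbox:

import numpy as np

def check_dictionary_input(od):
    # Sanity-check the optical density matrix fed into dictionary
    # learning; dtype or overflow problems would surface here as an
    # unexpected dtype or non-finite values.
    assert od.dtype == np.float64, f"unexpected dtype: {od.dtype}"
    assert np.isfinite(od).all(), "NaN/Inf in optical density input"
    assert (od >= 0).all(), "negative optical density values"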

mostafajahanifar commented 2 months ago

[UPDATE] SPAMS is under the GPLv3 license, which makes it an inappropriate solution!

So, I have revisited this after a while. In my opinion, solution 3 is not feasible: the code for dictionary learning in scikit-learn is not contained in a single module that we could take, and there are other local imports related to it which make it impractical to port over. Not to mention we would have to write tests for those modules 👎 Also, we have tried solution 2 a few times with no resolution, so there is no point in pursuing that with the small resources that we have.

So, I decided to fiddle around with SPAMS and see if it works on Windows. For this, I used the unofficial SPAMS binaries: pip install spams-bin, which installed smoothly in a working conda environment with all the recent TIAToolbox requirements in it. Then, I checked whether the Vahadane algorithm from "Staintools" works with my SPAMS installation and, to my surprise, it WORKED! So, I took the same source and target images and ran the Vahadane stain normalizer from Staintools 100 times, with none of the runs failing. I also enlarged the images to high resolution and it succeeded there too.

import staintools
import matplotlib.pyplot as plt
import cv2

# Read data
target = staintools.read_image("D:/target.png")
to_transform = staintools.read_image("D:/source.png")

# Standardize brightness (optional, can improve the tissue mask calculation)
target = staintools.LuminosityStandardizer.standardize(target)
to_transform = staintools.LuminosityStandardizer.standardize(to_transform)

for i in range(100):
    # Stain normalize: re-fit a fresh normalizer on every run to check
    # that the SPAMS-based dictionary learning never fails.
    normalizer = staintools.StainNormalizer(method='vahadane')
    normalizer.fit(target)
    transformed = normalizer.transform(to_transform)
    # staintools works in RGB; flip channels to BGR for cv2.imwrite.
    cv2.imwrite(f"D:/temp/{i}.png", transformed[..., ::-1])
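
As a follow-up, here is a minimal sketch of what estimating the Vahadane stain matrix directly with SPAMS could look like, mirroring the trainDL call Staintools makes internally; the background masking and parameter values are assumptions, not a finished toolbox implementation:

import numpy as np
import spams  # provided by the unofficial spams-bin wheel on Windows

def vahadane_stain_matrix(rgb, regularizer=0.1, od_threshold=0.15):
    # Convert RGB to optical density and drop near-white background
    # pixels (a simplification of Staintools' tissue masking, for
    # illustration only).
    od = -np.log(np.maximum(rgb.astype(np.float64), 1) / 255.0)
    pixels = od.reshape(-1, 3)
    pixels = pixels[(pixels > od_threshold).any(axis=1)]
    # Sparse, positive dictionary learning (the SNMF step of Vahadane
    # et al.); SPAMS expects a Fortran-ordered (features x samples) array.
    dictionary = spams.trainDL(
        X=np.asfortranarray(pixels.T),
        K=2,                  # two stains: haematoxylin and eosin
        lambda1=regularizer,  # sparsity penalty
        mode=2,
        modeD=0,
        posAlpha=True,        # non-negative codes
        posD=True,            # non-negative stain vectors
        verbose=False,
    ).T
    # Put haematoxylin first and normalize each stain vector to unit length.
    if dictionary[0, 0] < dictionary[1, 0]:
        dictionary = dictionary[[1, 0], :]
    return dictionary / np.linalg.norm(dictionary, axis=1)[:, None]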

So, maybe we can use the SPAMS binaries as they are in the toolbox and rewrite the toolbox to use SPAMS for the Vahadane algorithm, along the lines of the sketch above (just like Staintools does). However, before doing that we need more testing: