EIDOSLAB / torchstain

Stain normalization tools for histological analysis and computational pathology
MIT License

Fails with `kthvalue(): Expected reduction dim 0 to have non-zero size.` #1

Closed vahvero closed 3 years ago

vahvero commented 3 years ago

An almost-empty image from a slide crop fails to normalize and crashes the program.

Expected behavior:

Program does not crash on valid RGB images.

Steps to reproduce

# %% Import

import torch
from torchvision import transforms
import torchstain
import cv2
# %% Try transformation

target = cv2.cvtColor(cv2.imread("temp.png"), cv2.COLOR_BGR2RGB)

T = transforms.Compose([
    transforms.ToTensor(),
    transforms.Lambda(lambda x: x*255)
])

torch_normalizer = torchstain.MacenkoNormalizer(backend='torch')
torch_normalizer.fit(T(target))

Causes: kthvalue(): Expected reduction dim 0 to have non-zero size.

The behavior is replicated in the numpy implementation.

"temp.png" used. temp

Environment: Ubuntu 18.04

>>> python --version
Python 3.9.6
>>> pip freeze
numpy==1.21.2
opencv-python==4.5.3.56
Pillow==8.3.1
torch==1.9.0
torchstain==1.1.0
torchvision==0.10.0
typing-extensions==3.10.0.0
carloalbertobarbano commented 3 years ago

Hi, I suspect this is due to the empty image. IMO you should filter out empty images when loading your data; I am not sure that an empty image should be considered "valid" input for stain normalization.
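
For instance, something like this heuristic could filter mostly-background patches before they reach the normalizer (a sketch; the brightness threshold and background fraction are assumptions to tune per dataset):

import numpy as np

def is_mostly_background(patch: np.ndarray, frac: float = 0.9) -> bool:
    # Heuristic: count near-white pixels as background and skip the patch
    # if they dominate; `patch` is an RGB uint8 array of shape (H, W, 3).
    background = patch.mean(axis=-1) > 220
    return background.mean() > frac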

nabilapuspit commented 1 year ago

Can you tell me how to overcome this kind of issue? I get the same error even though I checked that the input is not an empty image.

andreped commented 1 year ago

@nabilapuspit I would think empty patches are not of interest to the training and inference pipeline, and I would assume this is an IndexError?

If that's the case, I would try to catch the error and skip the image patches it occurs on, for instance like so:

for patch in patches:
    try:
        normalized, _, _ = torch_normalizer.normalize(T(patch), stains=False)
    except IndexError:
        continue

    # ...do some processing on the successfully normalized patch here...

Alternatively, if you are streaming patches from a WSI during training, where some mostly empty patches can occur, you could just pass through the original image when the exception occurs.
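
A minimal sketch of that fallback, reusing the hypothetical loop above:

for patch in patches:
    try:
        out, _, _ = torch_normalizer.normalize(T(patch), stains=False)
    except IndexError:
        out = patch  # fall back to the original, un-normalized patch

    # ...process `out` as usual here...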

carloalbertobarbano commented 1 year ago

Hi @nabilapuspit, can you provide an example of a crashing image?

nabilapuspit commented 1 year ago

Thank you for your fast response. @andreped I do think the problem is an IndexError. I actually tried your suggestion, but the program does not run as it should: it runs, but I don't get any output and it takes a long time. I realized this problem happens when the program moves on to the next batch.

processing RESULTS_DIRECTORY/WSI/patches/TCGA-B5-A11H-01.h5: total of 93 batches batch 0/93, 0 files processed

Traceback (most recent call last):
  File "extract_features_fp.py", line 125, in <module>
    custom_downsample=args.custom_downsample, target_patch_size=args.target_patch_size)
  File "extract_features_fp.py", line 50, in compute_w_loader
    for count, (batch, coords) in enumerate(loader):
  File "/home/chingwei/anaconda3/envs/clam_n/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 628, in __next__
    data = self._next_data()
  File "/home/chingwei/anaconda3/envs/clam_n/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 1333, in _next_data
    return self._process_data(data)
  File "/home/chingwei/anaconda3/envs/clam_n/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 1359, in _process_data
    data.reraise()
  File "/home/chingwei/anaconda3/envs/clam_n/lib/python3.7/site-packages/torch/_utils.py", line 543, in reraise
    raise exception
IndexError: Caught IndexError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/home/chingwei/anaconda3/envs/clam_n/lib/python3.7/site-packages/torch/utils/data/_utils/worker.py", line 302, in _worker_loop
    data = fetcher.fetch(index)
  File "/home/chingwei/anaconda3/envs/clam_n/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 58, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/chingwei/anaconda3/envs/clam_n/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 58, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/media/chingwei/bf764154-3611-460c-9bfd-4efed447a219/chingwei/Nabila/CLAM/CLAM-mod/datasets/dataset_h5.py", line 210, in __getitem__
    img3,h,e = torch_normalizer.normalize(I=t_to_transform, stains=True)
  File "/home/chingwei/anaconda3/envs/clam_n/lib/python3.7/site-packages/torchstain/torch/normalizers/macenko.py", line 104, in normalize
    HE, C, maxC = self.compute_matrices(I, Io, alpha, beta)
  File "/home/chingwei/anaconda3/envs/clam_n/lib/python3.7/site-packages/torchstain/torch/normalizers/macenko.py", line 67, in compute_matrices
    HE = self.find_HE(ODhat, eigvecs, alpha)
  File "/home/chingwei/anaconda3/envs/clam_n/lib/python3.7/site-packages/torchstain/torch/normalizers/macenko.py", line 38, in find_HE
    minPhi = percentile(phi, alpha)
  File "/home/chingwei/anaconda3/envs/clam_n/lib/python3.7/site-packages/torchstain/torch/utils/percentile.py", line 24, in percentile
    return t.view(-1).kthvalue(k).values
IndexError: kthvalue(): Expected reduction dim 0 to have non-zero size.

whereas if the program runs successfully, it shows something like:

processing RESULTS_DIRECTORY/WSI/patches/TCGA-B5-A11H-01.h5: total of 93 batches
batch 0/93, 0 files processed
batch 20/93, 10240 files processed
batch 40/93, 20480 files processed
batch 60/93, 30720 files processed
batch 80/93, 40960 files processed

computing features for FEATURES_DIRECTORY/WSI/h5_files/TCGA-B5-A11H-01.h5 took 145.10374426841736 s
features size: (47382, 1024)
coordinates size: (47382, 2)

(If it is a small image, or only about 20 batches are needed, it completes fine.)

@carloalbertobarbano, I don't think it is an image problem, because the image is not empty, as attached below (1-5140).

andreped commented 1 year ago

I was unable to reproduce this issue. See gist.

I ran two tests: 1) using the original target image from the example in the README with the image you provided as source, and 2) using the image you provided as both target and source. Both worked fine.

Because of this, I believe your issue is likely caused by providing inappropriate input to the algorithm. Note that the input has to be in the range [0, 255], as per this transform:

T = transforms.Compose([
    transforms.ToTensor(),
    transforms.Lambda(lambda x: x*255)
])

Please verify that the data provided is indeed a tensor and that the intensities are scaled appropriately. Note that you have to do this both for fit() and normalize().
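
For instance, a quick heuristic sanity check along these lines, where `img` stands for one of your source patches:

import torch

x = T(img)  # T is the compose shown above
assert torch.is_tensor(x), "input must be a tensor"
# heuristic: a correctly scaled H&E patch should exceed 1.0 somewhere
assert x.max() > 1.0, "intensities look like [0, 1]; did the x*255 scaling run?"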


Also, one thing that I found confusing in your original issue above is that it looked like you were trying to run fit() on a blank image, which makes no sense. The fit() method is only to be run once, on a suitable target image you choose. You update the concentration matrix according to a single target image (the reference), and then all future patches' colours are normalized against that same reference's colours. Hence, for all future patches you use normalize() only. Perhaps this was unclear and could be the cause of this issue as well?
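
In other words, the intended usage is along these lines (a sketch, where `target` is your chosen reference image and `patches` is a hypothetical iterable of source patches):

normalizer = torchstain.normalizers.MacenkoNormalizer(backend='torch')
normalizer.fit(T(target))  # run once, on the reference image only

for patch in patches:
    norm, H, E = normalizer.normalize(I=T(patch), stains=True)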

nabilapuspit commented 1 year ago

I know the picture I attached will work fine because, as I said, the program runs very well on the first 20 batches (0-19), but then the issue starts on the second 20 batches and continues until the end. I think I wrote the code like the example and the gist you provided: I run fit() only on the template and normalize() on the patches, but I still get the issue and am still trying to figure out why it happens.

andreped commented 1 year ago

But that does not make much sense to me. This implementation does not support batch mode. The method is to be applied to individual patches, which I believe you did. Using this on 20 or 2000 batches should not matter.

I believe that after running more than 20 batches (which means that more patches are included), you have a higher chance of running into a patch that fails. Can you provide a patch where you observe this? I believe this works fine on the patch you provided above, so with a failing patch I could see whether I can reproduce the issue.

Also, by second 20 batches, are you talking about the second epoch? Or are you referring to a batch size of 20? Not sure if I understand.

nabilapuspit commented 1 year ago

I still think there is nothing wrong with the images; I suspect the problem is that I have not integrated the algorithm correctly yet. I'm using TCGA endometrium whole-slide data in the CLAM framework. I tried to apply the algorithm after this line by creating a StainNormalization class and using the target image from this repository. As I said, the algorithm works fine on small images (not more than 20 batches); the issue happens when I'm using big images.

Also, by second 20 batches, are you talking about the second epoch? Or are you referring to a batch size of 20? Not sure if I understand.

Based on the CLAM framework's extract_features_fp.py, I believe it refers to the batch size used for computing features in batches.

carloalbertobarbano commented 1 year ago

Could you provide a sample code of how you are applying torchstain?

nabilapuspit commented 1 year ago

sure

tem_path = '/media/chingwei/bf764154-3611-460c-9bfd-4efed447a219/chingwei/Nabila/CLAM/CLAM-mod/normalization_template.jpg'
tem = cv2.imread(tem_path)
template = cv2.cvtColor(tem, cv2.COLOR_BGR2RGB)

class StainNormalizer:
    def __init__(self):
        self.T = transforms.Compose([
            transforms.ToTensor(),
            transforms.Lambda(lambda x:x*255)
        ])
        self.torch_normalizer = torchstain.normalizers.MacenkoNormalizer(backend='torch')

    def fit_template(self,target):
        target_tensor = self.T(target)
        self.torch_normalizer.fit(target_tensor)

    def normalize_image(self,img):
        img_tensor = self.T(img)
        img_normalized,H,E = self.torch_normalizer.normalize(I=img_tensor, stains = True)
        return img_normalized,H,E

Then, below this line, I apply torchstain as follows:


img1 = np.array(img)

self.StainNormal.fit_template(template)
img3,h,e = self.StainNormal.normalize_image(img1)

Oh, by the way, I instantiate the class using self.StainNormal = StainNormalizer()

andreped commented 1 year ago

As far as I can see, you have not performed the same compose for the target as for the source images:

normalizer.fit(T(target))

Might that be why? I am not sure how this could be related to batch size, but it would make sense that normalization could fail, as the concentration matrix would be corrupt and not suitable for normalization.

Also, I would think that you do not need to extract the H and E components, thus making the normalization step faster, like so:

def normalize_image(self, img):
    img_tensor = self.T(img)
    return self.torch_normalizer.normalize(I=img_tensor, stains = False)[0]  # in this case H and E are just None, disregard them
carloalbertobarbano commented 1 year ago

Thanks @nabilapuspit. Can I also see how you use the StainNormal in your training loop, and how you load the actual training images?

nabilapuspit commented 1 year ago

@andreped Sorry, since you keep telling me about this:

As far as I can see, you have not performed the same compose for the target as for the source image

is it different from the code in my class?

def fit_template(self, target):
    target_tensor = self.T(target)
    self.torch_normalizer.fit(target_tensor)

Because I had already used the code exactly as this repository provides it, but since that didn't work, I tried changing it into a class. I did it like this:

T = transforms.Compose([
    transforms.ToTensor(),
    transforms.Lambda(lambda x: x*255)
])
torch_normalizer = torchstain.normalizers.MacenkoNormalizer(backend='torch')
torch_normalizer.fit(T(template))
t_to_transform = T(img1)
img3, h, e = torch_normalizer.normalize(I=t_to_transform, stains=True)

Also, I would think that you do not need to extract the H and E components.

By the way, thank you for this info, it'll help reduce the running time.

@carloalbertobarbano, sure. As I said, I applied it in the CLAM framework at this line, like this:

def __getitem__(self, idx):
    with h5py.File(self.file_path, 'r') as hdf5_file:
        coord = hdf5_file['coords'][idx]
    img = self.wsi.read_region(coord, self.patch_level, (self.patch_size, self.patch_size)).convert('RGB')
    img1 = np.array(img)
    self.StainNormal.fit_template(template)
    img3, h, e = self.StainNormal.normalize_image(img1)

    if self.target_patch_size is not None:
        img = img.resize(self.target_patch_size)

    img = self.roi_transforms(img).unsqueeze(0)

    return img, coord
carloalbertobarbano commented 1 year ago

I am not sure why this is happening. Are you sure that you are applying the same transformation to the template? Where do you load it? Also, there is no need to call fit(template) every time; you can do it just once in the __init__ function and save time.
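
For instance, roughly like this (a sketch based on your class above; `template` is the reference image you already load at module level):

class StainNormalizer:
    def __init__(self, target):
        self.T = transforms.Compose([
            transforms.ToTensor(),
            transforms.Lambda(lambda x: x * 255)
        ])
        self.torch_normalizer = torchstain.normalizers.MacenkoNormalizer(backend='torch')
        self.torch_normalizer.fit(self.T(target))  # fit once, at construction time

    def normalize_image(self, img):
        # H and E are not needed downstream, so skip extracting them
        return self.torch_normalizer.normalize(I=self.T(img), stains=False)[0]

# created once, e.g. in the dataset's __init__:
# self.StainNormal = StainNormalizer(template)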

nabilapuspit commented 1 year ago

Where do you load it?

I load it at the beginning of the dataset_h5.py file.

Also, there is no need to call fit(template) every time; you can do it just once in the __init__ function and save time

Oh, thank you, it definitely saves time.

I wonder whether the return img in my code above might be the cause of this issue.

carloalbertobarbano commented 1 year ago

Okay. The code seems fine

I know the picture I attached will work fine because, as I said, the program runs very well on the first 20 batches (0-19), but then the issue starts on the second 20 batches and continues until the end.

Could you provide an image that does not work?

andreped commented 1 year ago

Could you provide an image that does not work?

This would be very helpful for debugging this, @nabilapuspit!

Also, are you sure it did not fail during fit() rather than normalize()? It would make sense that it might struggle to run fit on some images. Still, why this would be related to batch size is beyond me.

Or wait... Maybe it is related to you attempting to run normalization multiple times in parallel. If you do the same with fit(), it will likely not work, as you are trying to update the same concentration matrix, and then you may run into deadlocks and threading issues from accessing the same resource.

Hence, if that was the issue, it might have been resolved by only doing fit once. Still, note that accessing the same normalization object between processes remains a challenge. I am not sure if you handled that in a manner robust enough to work correctly between processes (note that PyTorch does multiprocessing and not multithreading; hence, you cannot necessarily access the same resource between processes as you can between threads).

Lastly, if you are using multiple workers in your DataLoader (or similar), which performs multiprocessing, could you try disabling them to see if it resolves the issue?
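
For example, something like this (a sketch; `dataset` and the batch size are placeholders from your pipeline):

from torch.utils.data import DataLoader

# num_workers=0 loads data in the main process, ruling out
# multiprocessing as the cause
loader = DataLoader(dataset, batch_size=512, num_workers=0)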

nabilapuspit commented 1 year ago

Or wait... Maybe it is related to you attempting to run normalization multiple times in parallel.

That seems to make sense to me; I'm still struggling with this issue.

Lastly, if you are using multiple workers in your DataLoader (or similar), which performs multiprocessing, could you try disabling them to see if it resolves the issue?

I've tried this, but I still face the same problem. I've also tried stain normalization from TIAToolbox and it works fine, except for the time cost, and I prefer the result of torchstain. Still not sure what to do.

carloalbertobarbano commented 1 year ago

Maybe we should check which images it actually fails on, @nabilapuspit, to ensure that all the images passed to torchstain are actually non-empty. Can you provide a sample image on which it fails?

undercutspiky commented 10 months ago

@carloalbertobarbano @andreped I use this as the target image: https://github.com/FarhadZanjani/Histopathology-Stain-Color-Normalization/blob/master/Images/Template.png and the attached images as the source images. This results in the error, and I think for 2 of my images torch.linalg.lstsq will throw an error. I understand that these images fail because there is no H component in them, but the library should be able to handle such cases, maybe by setting the H component to 0 and just normalizing the E component (if that is possible). Currently I'm just removing all such images from my dataset, but that's not ideal, as they are still tissue images.

Attached failing patches:
patch_patient_099_node_4_x_32850_y_60390
patch_patient_099_node_4_x_42660_y_59670
patch_patient_099_node_4_x_48330_y_53820
patch_patient_099_node_4_x_48420_y_43830
patch_patient_099_node_4_x_48510_y_54180
patch_patient_099_node_4_x_48690_y_54360
patch_patient_099_node_4_x_50310_y_56160
patch_patient_099_node_4_x_50580_y_58500
patch_patient_099_node_4_x_51120_y_57420
patch_patient_099_node_4_x_52470_y_59400

andreped commented 10 months ago

@undercutspiky I think what you are observing is an underlying problem with Macenko. I think it is better if we let the user catch the error when it occurs; then they can choose what to do.

But maybe we could add a way to handle that in torchstain as well. What do you think, @carloalbertobarbano?

Personally, I have started using Reinhard instead due to these really unfortunate numerical stability issues...
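
For reference, swapping in Reinhard would look roughly like this (a sketch; it assumes your torchstain version ships torchstain.normalizers.ReinhardNormalizer, and note that Reinhard does not separate H and E stains):

normalizer = torchstain.normalizers.ReinhardNormalizer(backend='torch')
normalizer.fit(T(template))
norm = normalizer.normalize(T(patch))  # returns only the normalized image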

carloalbertobarbano commented 10 months ago

I think that hiding issues with images not suited for Macenko (e.g. not throwing errors) could lead to downstream issues that can be very difficult to catch. Though perhaps we could print a warning...