CielAl / torch-staintools

GPU-accelerated PyTorch adaptation of StainTools' stain normalization algorithms
https://cielal.github.io/torch-staintools/
MIT License

Nan values in output #23

Closed: YanisKouadri closed this issue 4 months ago

YanisKouadri commented 4 months ago

Hello,

First of all, thank you for your work. I'm writing because I'm having a hard time isolating a problem: at some point during my training, the output of the on-the-fly normalization contains NaN values. Can this happen when the normalization reference (the image used to fit) is contained in the batch of images being normalized? It doesn't always happen, which is very strange to me. This may be totally unrelated to my problem, but I need to rule this lead out.

Have a good day

CielAl commented 4 months ago

Hi,

thank you for reaching out. May I ask which normalization algorithm you used, and whether you can narrow down what these images look like by fixing the seeds?

From my experience this sometimes occurs when the input images are ill-conditioned (e.g., mostly slide background with too little tissue carrying stain information), which is why luminosity-based thresholding is provided both in this package and in the original SPAMS-based StainTools (the latter explicitly throws an error if the resulting tissue mask contains too little tissue area).

I suggest the following to narrow it down:

1. Catch the exception and save or display the batches that lead to NaN. What do they look like? You don't need the full training procedure; a loop that traverses the data, applies the on-the-fly normalization, and checks for NaN is enough (see the sketch below).
2. Manually apply normalization to these batches: does it give you NaN always or only occasionally? Does it lead to NaN regardless of which random seed you fix?
3. What do the luminosity tissue masks look like (`from torch_staintools.functional.tissue_mask import get_tissue_mask`)?
4. What happens with other luminosity thresholds (the default is 0.8)?
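
A minimal sketch of steps 1 and 3, assuming the `NormalizerBuilder` entry point from the package README, BxCxHxW float tensors in [0, 1], and that `get_tissue_mask` takes a `luminosity_threshold` keyword and accepts a batch (apply it per image otherwise); the template and loader below are stand-ins:

```python
import torch
from torch_staintools.normalizer import NormalizerBuilder
from torch_staintools.functional.tissue_mask import get_tissue_mask

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Stand-ins: replace with your real reference tile and DataLoader.
template = torch.rand(1, 3, 256, 256, device=device)
loader = [torch.rand(8, 3, 256, 256) for _ in range(4)]

normalizer = NormalizerBuilder.build('vahadane').to(device)
normalizer.fit(template)

for i, batch in enumerate(loader):  # data traversal only, no training step
    batch = batch.to(device)
    out = normalizer(batch)
    if torch.isnan(out).any():
        # step 1: persist the offending batch for visual inspection
        torch.save(batch.cpu(), f'nan_batch_{i}.pt')
        # step 3: inspect the luminosity tissue masks; keyword name assumed
        mask = get_tissue_mask(batch, luminosity_threshold=0.8)
        print(i, mask.float().mean(dim=(-2, -1)))  # tissue ratio per image
```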

If it is indeed related to the percentage of tissue area and/or the luminosity tissue masks, I recommend preprocessing the images so that those containing too little tissue are discarded or skipped, e.g. with a filter like the one sketched below.
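
One hypothetical way to implement such a filter, reusing `get_tissue_mask` (same `luminosity_threshold` keyword assumption as above; the `min_ratio` cutoff is illustrative, not a package default):

```python
import torch
from torch_staintools.functional.tissue_mask import get_tissue_mask

def has_enough_tissue(patch: torch.Tensor, min_ratio: float = 0.1) -> bool:
    """Keep a 1xCxHxW patch only if enough pixels pass the luminosity mask."""
    mask = get_tissue_mask(patch, luminosity_threshold=0.8)
    return mask.float().mean().item() >= min_ratio
```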

Best regards,

YanisKouadri commented 4 months ago

Thank you for your detailed answer !

I can now reproduce the NaN at will. From my new tests, the problem seems to come from patches with a low tissue ratio, as you said, but only with the Vahadane normalizer, and only about 1 out of 3 times. I will simply filter more patches out of my dataset. (Do you think these problematic patches can cause problems for the other normalizers even if nothing seems out of the ordinary?)

CielAl commented 4 months ago

From my experience it may be problematic for Macenko as well. For instance, if the pixels are all slide background with nearly the same intensity, the covariance matrix of the optical density (OD) can be close to singular, yielding near-zero eigenvalues, noise-driven eigenvectors, and hence artifacts (a small illustration follows). So even without numeric errors, it is still good practice to sanitize the data beforehand (e.g., you probably don't want patches of glass background to be your downstream model inputs in the first place).
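
A small self-contained illustration of that failure mode in plain PyTorch (independent of this package; the constants are illustrative):

```python
import torch

# A nearly uniform "glass background" patch: N pixels x 3 channels around
# intensity 240/255, with a little sensor noise.
pixels = (240.0 + torch.randn(10_000, 3) * 0.5).clamp(1.0, 255.0)

# Optical density transform used by Macenko-style stain estimators
# (exact constants vary by implementation).
od = -torch.log(pixels / 255.0)

cov = torch.cov(od.T)                 # 3x3 covariance of the OD vectors
eigvals = torch.linalg.eigvalsh(cov)  # ascending eigenvalues
print(eigvals)  # all near zero: the covariance is close to singular, so the
                # leading eigenvectors reflect noise rather than stain
```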

Best,

YanisKouadri commented 4 months ago

Thanks again for your answer. Sorry to re-open this, but I have one last question, unrelated to the previous issue:

I see that fitting expects a batch. I thought I was supposed to fit based on a single image. Am I supposed to use a large set of targets, or is a single image enough?

CielAl commented 4 months ago

You're actually welcome to open a new issue for a case like this, lol. But to stick to your question:

When you fit a normalizer, it really just fetches the stain matrix/concentrations (for Macenko/Vahadane) or the intensity distribution (mean/std for Reinhard) of the target domain, and uses them to normalize your source domain images to the target domain.
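
In code, under the same `NormalizerBuilder` assumption as in the earlier sketch, the whole flow is short; the tensors below are stand-ins:

```python
import torch
from torch_staintools.normalizer import NormalizerBuilder

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Stand-ins: use a real target-domain ROI and source batch in practice.
template = torch.rand(1, 3, 512, 512, device=device)  # target-domain tile
source = torch.rand(8, 3, 224, 224, device=device)    # images to normalize

normalizer = NormalizerBuilder.build('macenko').to(device)
normalizer.fit(template)         # estimate target stain matrix/concentration
normalized = normalizer(source)  # map source images into the target domain
```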

Usually you can choose a region of interest (not necessarily the same size as your normalization input; it can be much larger) that is representative enough of the target dataset as your template/reference image to fit, and that might just suffice. But if you are concerned that a single template image introduces bias, then without changing the implementation of the normalizers, you can sample multiple templates, fit multiple normalizers, and during normalization simply pick one normalizer at random each time to normalize your random input patches (sketched below).
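
A sketch of that multi-template strategy, under the same API assumptions as above; `templates` stands in for a handful of representative target-domain tiles:

```python
import random
import torch
from torch_staintools.normalizer import NormalizerBuilder

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Stand-ins for several representative target-domain tiles (1xCxHxW each).
templates = [torch.rand(1, 3, 512, 512, device=device) for _ in range(4)]

# Fit one normalizer per template, once, up front.
normalizers = []
for t in templates:
    n = NormalizerBuilder.build('vahadane').to(device)
    n.fit(t)
    normalizers.append(n)

def normalize_batch(batch: torch.Tensor) -> torch.Tensor:
    """Normalize a batch with one of the fitted normalizers, picked at random."""
    return random.choice(normalizers)(batch.to(device))
```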

Nonetheless, downstream empirical evaluation is needed to justify which approach is best for your project. Hope that helps.