RuntimeError: torch.linalg.cholesky: For batch 8: U(1,1) is zero, singular U.

gaozhitong commented 3 years ago

Hi, thanks for your great contributions! I am trying to reimplement SSN on the LIDC dataset in PyTorch, using this repo and your TensorFlow implementation as references. However, I run into a problem with td.LowRankMultivariateNormal(loc=mean, cov_factor=cov_factor, cov_diag=cov_diag) during training. Specifically, I get the following error after several iterations. I don't know the cause. I have tried a smaller learning rate (1e-4), but the error still occurs.

File "/root/Anacondas/anaconda3/lib/python3.6/site-packages/torch/distributions/lowrank_multivariate_normal.py", line 108, in __init__
    self._capacitance_tril = _batch_capacitance_tril(cov_factor, cov_diag)
File "/root/Anacondas/anaconda3/lib/python3.6/site-packages/torch/distributions/lowrank_multivariate_normal.py", line 19, in _batch_capacitance_tril
    return torch.linalg.cholesky(K)
RuntimeError: torch.linalg.cholesky: For batch 8: U(1,1) is zero, singular U.

Thanks very much for your time! Looking forward to your reply.
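For anyone debugging the same traceback: the failing call factorizes the capacitance matrix K = I + Wᵀ D⁻¹ W, where W is cov_factor and D is cov_diag. The sketch below rebuilds K by hand so the offending batch elements can be inspected before the distribution is constructed. The helper name `capacitance_report` and the shapes (B, N, R) for cov_factor and (B, N) for cov_diag are assumptions made for illustration; they are not part of PyTorch or this repo.

```python
import torch


def capacitance_report(cov_factor, cov_diag):
    """Hypothetical diagnostic helper (not a PyTorch API).

    cov_factor: (B, N, R) low-rank factor W.
    cov_diag:   (B, N)    diagonal D.
    Rebuilds K = I + W^T D^{-1} W per batch element and summarizes it.
    """
    rank = cov_factor.shape[-1]
    # W^T D^{-1}: divide each column of W^T by the matching diagonal entry.
    wt_dinv = cov_factor.transpose(-1, -2) / cov_diag.unsqueeze(-2)  # (B, R, N)
    eye = torch.eye(rank, dtype=cov_factor.dtype, device=cov_factor.device)
    k = wt_dinv @ cov_factor + eye  # (B, R, R)
    return {
        # False where K contains inf or NaN, which makes Cholesky fail.
        "finite": torch.isfinite(k).flatten(start_dim=1).all(dim=1),
        # Very large entries hint at precision loss in float32.
        "max_abs_K": k.abs().flatten(start_dim=1).max(dim=1).values,
        # Tiny or zero diagonal entries blow up D^{-1}.
        "min_cov_diag": cov_diag.min(dim=1).values,
    }
```

Calling this on the tensors passed to td.LowRankMultivariateNormal and printing the entry for the batch index named in the error (8 in the traceback above) shows whether the diagonal collapsed to zero or the factor grew very large.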

MiguelMonteiro commented 3 years ago

Hi,

This usually happens when the covariance values become too large. This can happen when there are large uniform regions in the image, which will inevitably be 100% correlated. What we did in our paper was to use ROI masks (also implemented in the code) to ignore these regions during training.
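A minimal sketch of the masking idea, not the repository's actual code: the mean, covariance factor, and diagonal are zeroed outside the ROI, and a small floor keeps the diagonal strictly positive. The function name `build_masked_low_rank_normal`, the `eps` value, the tensor shapes, and the fallback to independent normals when the Cholesky step still fails are all assumptions made for illustration.

```python
import torch
import torch.distributions as td


def build_masked_low_rank_normal(mean, cov_factor, cov_diag, roi_mask, eps=1e-5):
    """Hypothetical helper; shapes assumed to be
    mean, cov_diag, roi_mask: (B, N) and cov_factor: (B, N, R).
    roi_mask is 1 inside the region of interest and 0 elsewhere.
    """
    roi_mask = roi_mask.to(mean.dtype)
    # Drop everything outside the ROI so uniform regions do not contribute
    # perfectly correlated entries to the covariance.
    mean = mean * roi_mask
    cov_factor = cov_factor * roi_mask.unsqueeze(-1)
    # The floor keeps the diagonal positive where the mask zeroed it.
    cov_diag = cov_diag * roi_mask + eps
    try:
        return td.LowRankMultivariateNormal(
            loc=mean, cov_factor=cov_factor, cov_diag=cov_diag
        )
    except RuntimeError:
        # Assumed fallback: ignore correlations for this batch rather than
        # abort training when the capacitance matrix cannot be factorized.
        return td.Independent(td.Normal(loc=mean, scale=cov_diag.sqrt()), 1)
```

The fallback changes what is being modeled for that batch, so it is worth logging whenever it triggers. If it fires often, the mask or the inputs probably need another look.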

gaozhitong commented 3 years ago

Thanks for your reply! I have solved the problem. The cause was that I mask out the background in the LIDC dataset, which contains blank annotations.

MiguelMonteiro commented 3 years ago

Glad it's solved,

Best,

Miguel

biomedia-mira / stochastic_segmentation_networks, issue #4