cvg / pixloc

Back to the Feature: Learning Robust Camera Localization from Pixels to Pose (CVPR 2021)
Apache License 2.0

RuntimeError: torch.linalg.cholesky: The factorization could not be completed because the input is not positive-definite (the leading minor of order 1 is not positive-definite) #11

Open patelajaychh opened 2 years ago

patelajaychh commented 2 years ago

Command run: $ python -m pixloc.run_Aachen
Python version: 3.8
Torch version: 1.10.0

[11/08/2021 20:32:50 pixloc.localization.model3d INFO] Reading COLMAP model /home/ajay/pixloc/outputs/hloc/Aachen/sfm_superpoint+superglue.
[11/08/2021 20:33:03 pixloc.utils.io INFO] Imported 824 images from day_time_queries_with_intrinsics.txt
[11/08/2021 20:33:03 pixloc.utils.io INFO] Imported 98 images from night_time_queries_with_intrinsics.txt
[11/08/2021 20:33:03 pixloc.pixlib.utils.experiments INFO] Loading checkpoint checkpoint_best.tar
[11/08/2021 20:33:07 pixloc.localization.localizer INFO] Starting the localization process...
  4%|███▉ | 40/922 [01:30<33:06, 2.25s/it]
Traceback (most recent call last):
  File "/usr/lib/python3.8/runpy.py", line 193, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.8/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/home/ajay/pixloc/pixloc/run_Aachen.py", line 76, in <module>
    main()
  File "/home/ajay/pixloc/pixloc/run_Aachen.py", line 68, in main
    poses, logs = localizer.run_batched(skip=args.skip)
  File "/home/ajay/pixloc/pixloc/localization/localizer.py", line 82, in run_batched
    ret = self.run_query(name, camera)
  File "/home/ajay/pixloc/pixloc/localization/localizer.py", line 131, in run_query
    ret = self.refiner.refine(name, camera, dbs, loc=loc)
  File "/home/ajay/pixloc/pixloc/localization/refiners.py", line 109, in refine
    ret = self.refine_query_pose(qname, qcamera, T_init, p3did_to_dbids,
  File "/home/ajay/pixloc/pixloc/localization/base_refiner.py", line 179, in refine_query_pose
    ret = self.refine_pose_using_features(features_query, scales_query,
  File "/home/ajay/pixloc/pixloc/localization/base_refiner.py", line 117, in refine_pose_using_features
    T_opt, fail = opt.run(p3d, F_ref, F_q, T_i.to(F_q),
  File "/home/ajay/pixloc/pixloc/utils/tools.py", line 48, in wrapped
    rets = func(*args_converted, **kwargs)
  File "/home/ajay/pixloc/pixloc/pixlib/models/base_optimizer.py", line 101, in run
    return self._run(*args, **kwargs)
  File "/home/ajay/pixloc/pixloc/pixlib/models/learned_optimizer.py", line 78, in _run
    delta = optimizer_step(g, H, lambda_, mask=~failed)
  File "/home/ajay/pixloc/pixloc/pixlib/geometry/optimization.py", line 29, in optimizer_step
    U = torch.linalg.cholesky(H, upper=True)
RuntimeError: torch.linalg.cholesky: The factorization could not be completed because the input is not positive-definite (the leading minor of order 1 is not positive-definite).
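What the message means: a Cholesky factorization H = UᵀU processes the matrix column by column, and the k-th pivot is positive exactly when the leading k×k minor is positive-definite (given that the earlier pivots were). "Order 1" therefore means the very first pivot, H[0][0], was already not positive: zero, negative or NaN. The pure-Python sketch below (hypothetical names, not PyTorch's implementation) reports the failing order in the same way.

```python
import math


class NotPositiveDefinite(ValueError):
    """Raised when a pivot is <= 0 or NaN; `order` is the 1-based leading minor."""

    def __init__(self, order):
        super().__init__(
            f"the leading minor of order {order} is not positive-definite")
        self.order = order


def cholesky_upper(H):
    """Return upper-triangular U with H = U^T U for a symmetric H (list of lists)."""
    n = len(H)
    U = [[0.0] * n for _ in range(n)]
    for j in range(n):
        # Pivot: diagonal entry minus the squares of the entries already above it.
        pivot = H[j][j] - sum(U[k][j] ** 2 for k in range(j))
        # `not pivot > 0` is also True for NaN, so NaN fails here as well.
        if not pivot > 0:
            raise NotPositiveDefinite(j + 1)
        U[j][j] = math.sqrt(pivot)
        for i in range(j + 1, n):
            U[j][i] = (H[j][i] - sum(U[k][j] * U[k][i] for k in range(j))) / U[j][j]
    return U
```

Applied to the all-ones 2×2 matrix used later in this thread, the sketch stops at order 2; a zero, negative or NaN H[0][0] stops it at order 1, which is the case shown in the traceback.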

sjg02122 commented 2 years ago

I have the same issue.

patelajaychh commented 2 years ago

> I have the same issue.

@sjg02122 I resolved this by downgrading to torch=1.7.

sarlinpe commented 2 years ago

I have indeed only tested PyTorch 1.7, assuming that later versions would be fine. I am currently unable to install >=1.8 because of my CUDA setup. Can you check whether this is due to NaNs in the input matrix? If not, you could try increasing the eps value in https://github.com/cvg/pixloc/blob/90f7e968398252e8557b284803ee774cb8d80cd0/pixloc/pixlib/geometry/optimization.py#L7
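Why eps can matter: if the step damps H Levenberg-Marquardt style by adding λ·diag(H) to its diagonal, that term vanishes wherever the diagonal is zero, so a floor eps is what keeps the damped matrix positive-definite. The sketch below assumes the added term is max(λ·H[i][i], eps); this is an assumption about the linked line, not a quote of it, and damp and is_pd_2x2 are hypothetical names. Positive-definiteness of a 2×2 matrix is checked with Sylvester's criterion (both leading minors positive), which matches the "leading minor" wording of the error.

```python
def damp(H, lambda_, eps):
    """Copy of H with max(lambda_ * H[i][i], eps) added to each diagonal entry (assumed scheme)."""
    out = [list(row) for row in H]
    for i in range(len(H)):
        out[i][i] += max(lambda_ * H[i][i], eps)
    return out


def is_pd_2x2(H):
    """Sylvester's criterion for a symmetric 2x2 matrix: both leading minors > 0."""
    a, b, d = H[0][0], H[0][1], H[1][1]
    return a > 0 and a * d - b * b > 0
```

With an all-zero H the relative term λ·H[i][i] is zero for any λ, and only the eps floor makes the damped matrix pass. Raising eps enlarges that floor; in float32, an eps that is tiny next to the other entries of H may also be partly lost to rounding, which is one plausible reason a larger value helps.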

sarlinpe commented 2 years ago

I cannot manage to reproduce this error on GPU with torch=1.10.0 and CUDA=10.2. Are you running this on CPU? Could you check if there is any NaN in the input?
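One way to answer the NaN question is to validate H (and the gradient g) right before the factorization and log where any non-finite value sits. In PyTorch this would typically be torch.isfinite(H).all(); the pure-Python sketch below, with the hypothetical names nonfinite_entries and check_system, shows the same check on nested lists. Note that a NaN on the diagonal also fails the positivity test of the very first pivot (nan > 0 is false), so a NaN in H could be consistent with the "order 1" message.

```python
import math


def nonfinite_entries(M):
    """Return (row, col) of every NaN or infinite entry in a 2-D list."""
    return [(i, j)
            for i, row in enumerate(M)
            for j, v in enumerate(row)
            if not math.isfinite(v)]


def check_system(H, g):
    """Raise ValueError listing the non-finite entries of H or g before attempting a solve."""
    bad_H = nonfinite_entries(H)
    if bad_H:
        raise ValueError(f"H has non-finite entries at {bad_H}")
    bad_g = [i for i, v in enumerate(g) if not math.isfinite(v)]
    if bad_g:
        raise ValueError(f"g has non-finite entries at {bad_g}")
```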

patelajaychh commented 2 years ago

> I cannot manage to reproduce this error on GPU with torch=1.10.0 and CUDA=10.2. Are you running this on CPU? Could you check if there is any NaN in the input?

I'm running this on GPU only. As for NaNs in the input, I'm not sure: I only ran the python -m pixloc.run_Aachen command, so I don't expect the input to contain any NaN.

Anyway, the code worked with torch=1.7, so I didn't investigate any further.

georg-bn commented 2 years ago

I have this issue as well. There is a check in the code that seems related to the problem: https://github.com/cvg/pixloc/blob/8f253dcd1f1d7cbe1bc4f1cb4ee628b1bd344fcb/pixloc/pixlib/geometry/optimization.py#L34-L45 However, the error message raised when the Cholesky factorization fails differs between torch=1.10 and torch=1.7. Running

python -c "import torch; H = torch.ones([2,2]); U = torch.cholesky(H)"

with torch=1.7.0 raises "RuntimeError: cholesky_cpu: U(2,2) is zero, singular U.", while torch=1.10.1 raises "RuntimeError: cholesky: The factorization could not be completed because the input is not positive-definite (the leading minor of order 2 is not positive-definite)."
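If the linked check recognises the failure by the text of the exception (an assumption based on the comment above, not verified against the code), the new wording would slip past it and the error would propagate instead of being handled. A hypothetical helper that accepts both messages quoted above:

```python
OLD_MARKER = "singular U"             # torch 1.7 wording, quoted above
NEW_MARKER = "not positive-definite"  # torch 1.10 wording, quoted above


def is_cholesky_failure(message):
    """Hypothetical check: True for either wording of a failed Cholesky factorization."""
    return OLD_MARKER in message or NEW_MARKER in message
```

Matching on message text stays brittle across releases. Since PyTorch 1.9, torch.linalg.cholesky_ex returns the factor together with an info tensor (non-zero where the factorization failed) instead of raising, which may be a sturdier way to detect the failure.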

kyrie-23 commented 2 years ago

I have this issue as well.
Python version: 3.8
Torch version: 1.11.0

Because my GPU is an A100, I cannot downgrade torch. How can I solve this problem?
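Without downgrading, one option is to stop depending on the Cholesky factorization succeeding: when it fails, fall back to a general solver such as torch.linalg.solve (LU-based), or skip the update for that query. The pure-Python sketch below illustrates the fallback idea with Gaussian elimination and partial pivoting; it is not pixloc code, and solve_general is a hypothetical name.

```python
def solve_general(H, g, tol=1e-12):
    """Solve H x = -g by Gaussian elimination with partial pivoting.

    Returns None if H is (numerically) singular.
    """
    n = len(H)
    # Augmented matrix [H | -g], copied so the inputs are left untouched.
    A = [[float(v) for v in H[i]] + [-float(g[i])] for i in range(n)]
    for col in range(n):
        # Partial pivoting: move the row with the largest |entry| in this column up.
        pivot_row = max(range(col, n), key=lambda r: abs(A[r][col]))
        # `not ... > tol` also rejects a NaN pivot.
        if not abs(A[pivot_row][col]) > tol:
            return None
        A[col], A[pivot_row] = A[pivot_row], A[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for c in range(col, n + 1):
                A[r][c] -= f * A[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = A[i][n] - sum(A[i][c] * x[c] for c in range(i + 1, n))
        x[i] = s / A[i][i]
    return x
```

Unlike Cholesky, this solves indefinite but non-singular systems such as [[0, 1], [1, 0]]. Keep in mind that with an indefinite H the resulting step may not decrease the cost, so increasing eps, as suggested earlier in the thread, is arguably the safer first thing to try.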