Closed: jxbb233 closed this issue 8 months ago.
Hello, I haven't actually run into this problem before. My guess is that too many keyframes are being cached, which drives resource usage too high. You could try adjusting that value: change the 20 there to 2, and if that works, increase it again.
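Roughly speaking, the change amounts to something like this (the name below is only for illustration; the real one is the cached-keyframe count referenced above):

```python
# Illustrative sketch only, not the actual NeRF-LOAM code. A smaller bound on
# the keyframe queue means fewer keyframes (and their CUDA tensors) are alive
# in shared memory at the same time.
import torch.multiprocessing as mp

KF_CACHE_SIZE = 2                            # was 20; raise it again if 2 works
kf_buffer = mp.Queue(maxsize=KF_CACHE_SIZE)  # tracker puts, mapper gets
```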
After changing this value it does run longer, but later it fails with the error below.
********** current num kfs: 2 **********
frame id 956
trans tensor([150.7139, 2.9822, 0.5144], device='cuda:0', grad_fn=<SubBackward0>)
frame id 957
trans tensor([152.0837, 3.0972, 0.5227], device='cuda:0', grad_fn=<SubBackward0>)
/usr/local/lib/python3.8/dist-packages/torch/functional.py:599: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at ../aten/src/ATen/native/TensorShape.cpp:2315.)
return _VF.meshgrid(tensors, **kwargs) # type: ignore[attr-defined]
Process Process-2:
Traceback (most recent call last):
File "/usr/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
self.run()
File "/usr/lib/python3.8/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/data/NeRF-LOAM-master/src/mapping.py", line 98, in spin
tracked_frame = kf_buffer.get()
File "/usr/lib/python3.8/multiprocessing/queues.py", line 116, in get
return _ForkingPickler.loads(res)
File "/usr/local/lib/python3.8/dist-packages/torch/multiprocessing/reductions.py", line 297, in rebuild_storage_fd
fd = df.detach()
File "/usr/lib/python3.8/multiprocessing/resource_sharer.py", line 57, in detach
with _resource_sharer.get_connection(self._id) as conn:
File "/usr/lib/python3.8/multiprocessing/resource_sharer.py", line 87, in get_connection
c = Client(address, authkey=process.current_process().authkey)
File "/usr/lib/python3.8/multiprocessing/connection.py", line 508, in Client
answer_challenge(c, authkey)
File "/usr/lib/python3.8/multiprocessing/connection.py", line 752, in answer_challenge
message = connection.recv_bytes(256) # reject large message
File "/usr/lib/python3.8/multiprocessing/connection.py", line 216, in recv_bytes
buf = self._recv_bytes(maxlength)
File "/usr/lib/python3.8/multiprocessing/connection.py", line 414, in _recv_bytes
buf = self._recv(4)
File "/usr/lib/python3.8/multiprocessing/connection.py", line 383, in _recv
raise EOFError
EOFError
[W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
/usr/lib/python3.8/multiprocessing/resource_tracker.py:216: UserWarning: resource_tracker: There appear to be 4 leaked semaphore objects to clean up at shutdown
warnings.warn('resource_tracker: There appear to be %d '
I tried twice, and this error is triggered consistently. Also, looking at the log, the run never enters the post-processing steps at all. Is that normal? Thanks for your help!
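If I read the Note [Sharing CUDA tensors] warning correctly, the EOFError on the mapping side just means the producer (tracking) process died, most likely from running out of memory, while shared CUDA tensors were still sitting in the queue. A generic mitigation sketch (not NeRF-LOAM's actual code) would be to pass CPU copies through the queue so the consumer no longer depends on the producer's CUDA IPC handles:

```python
# Generic sketch, not NeRF-LOAM's code: passing CPU copies through the queue
# avoids CUDA IPC, so the consumer no longer needs the producer process to
# stay alive until the tensor is consumed.
import torch
import torch.multiprocessing as mp

def producer(q: mp.Queue):
    frame = torch.rand(3, device="cuda")   # stands in for a tracked keyframe
    q.put(frame.detach().cpu())            # CPU copy instead of a CUDA IPC handle

def consumer(q: mp.Queue):
    frame = q.get().cuda()                 # move back to the GPU in this process

if __name__ == "__main__":
    mp.set_start_method("spawn", force=True)
    q = mp.Queue(maxsize=2)
    p = mp.Process(target=producer, args=(q,))
    c = mp.Process(target=consumer, args=(q,))
    p.start(); c.start()
    p.join(); c.join()
```

The cost is an extra host/device copy per keyframe, but it removes the shared-CUDA-tensor lifetime issue.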
After switching the GPU to an RTX 8000, the problem seems to be solved; the cause was probably insufficient memory in the Docker environment. So I am closing this issue.
Hello, I tried running the demo on a complete sequence of the KITTI dataset (more than 1000 frames) and found that the whole process often hangs for no apparent reason after running for a while; this does not happen when only a few frames need to be processed. The hang mainly occurs during the post-processing steps, no error message is printed when it hangs, and checking with ps -ef shows that the process itself has not exited. Below are the last few lines of the runtime log, followed by the command I ran. For visualization, I made the following changes to kitti.yaml. Thank you for your help!
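In case it helps: one generic way to see where a Python process is stuck, independent of this repo, is to register faulthandler on a signal in the entry script (and in each spawned process) and then send that signal to the hung PID:

```python
# Generic diagnostic sketch, not part of NeRF-LOAM: after registering this,
# `kill -USR1 <pid>` makes the process print the Python stack of every thread,
# which shows whether it is really stuck in the post-processing steps.
import faulthandler
import signal

faulthandler.register(signal.SIGUSR1, all_threads=True)
```

Since mapping runs in a separate process, this would have to be called in that process as well for the dump to cover it.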