ewrfcas / MVSFormer

Codes of MVSFormer: Multi-View Stereo by Learning Robust Image Features and Temperature-based Depth (TMLR2023)
Apache License 2.0

Hello author, unable to train on a single 4090 GPU #36

Open chenhui2016 opened 6 months ago

chenhui2016 commented 6 months ago

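I could only get training to run after setting the config file parameters as shown in the attached screenshot (image).

However, training then fails with an error at the point shown in the second screenshot (image).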

The error message is as follows:

Exception in thread Thread-1:
Traceback (most recent call last):
  File "/home/ch/anaconda3/envs/pt/lib/python3.10/threading.py", line 1009, in _bootstrap_inner
    self.run()
  File "/home/ch/anaconda3/envs/pt/lib/python3.10/site-packages/tensorboardX/event_file_writer.py", line 202, in run
    data = self._queue.get(True, queue_wait_duration)
  File "/home/ch/anaconda3/envs/pt/lib/python3.10/multiprocessing/queues.py", line 117, in get
    res = self._recv_bytes()
  File "/home/ch/anaconda3/envs/pt/lib/python3.10/multiprocessing/connection.py", line 221, in recv_bytes
    buf = self._recv_bytes(maxlength)
  File "/home/ch/anaconda3/envs/pt/lib/python3.10/multiprocessing/connection.py", line 419, in _recv_bytes
    buf = self._recv(4)
  File "/home/ch/anaconda3/envs/pt/lib/python3.10/multiprocessing/connection.py", line 388, in _recv
    raise EOFError
EOFError
Traceback (most recent call last):
  File "/home/ch/sn_d/code/MVS/MVSFormer-main/train.py", line 191, in <module>
    mp.spawn(main, nprocs=args.world_size, args=(args, config))
  File "/home/ch/anaconda3/envs/pt/lib/python3.10/site-packages/torch/multiprocessing/spawn.py", line 246, in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method="spawn")
  File "/home/ch/anaconda3/envs/pt/lib/python3.10/site-packages/torch/multiprocessing/spawn.py", line 202, in start_processes
    while not context.join():
  File "/home/ch/anaconda3/envs/pt/lib/python3.10/site-packages/torch/multiprocessing/spawn.py", line 163, in join
    raise ProcessRaisedException(msg, error_index, failed_process.pid)
torch.multiprocessing.spawn.ProcessRaisedException:

-- Process 0 terminated with the following error:
Traceback (most recent call last):
  File "/home/ch/anaconda3/envs/pt/lib/python3.10/site-packages/torch/multiprocessing/spawn.py", line 74, in _wrap
    fn(i, *args)
  File "/home/ch/sn_d/code/MVS/MVSFormer-main/train.py", line 146, in main
    trainer.train()
  File "/home/ch/sn_d/code/MVS/MVSFormer-main/base/base_trainer.py", line 78, in train
    result = self._train_epoch(epoch)
  File "/home/ch/sn_d/code/MVS/MVSFormer-main/trainer/mvsformer_trainer.py", line 164, in _train_epoch
    self.scaler.step(self.optimizer)
  File "/home/ch/anaconda3/envs/pt/lib/python3.10/site-packages/torch/cuda/amp/grad_scaler.py", line 412, in step
    assert (
AssertionError: No inf checks were recorded for this optimizer.

Has the author seen this problem when training on a GPU with 24 GB of memory? How can it be resolved?

Finally, thank you very much for your work!
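For context on the final AssertionError: PyTorch's `GradScaler.step` raises "No inf checks were recorded for this optimizer" when it has no inf/NaN check to consult. That typically means none of the optimizer's parameters had a gradient when `step` was called, for example because the backward pass did not populate any `.grad`. The traceback does not confirm that this is the cause here. Below is a minimal diagnostic sketch, not part of MVSFormer. The helper name `count_params_with_grad` is hypothetical, and it relies only on the `param_groups` attribute that `torch.optim` optimizers expose. Stand-in objects replace a real optimizer so the snippet runs without a GPU.

```python
from types import SimpleNamespace


def count_params_with_grad(optimizer):
    """Count parameters in optimizer.param_groups whose .grad is not None.

    Hypothetical helper: works on any object that has a `param_groups`
    list of dicts with a "params" entry, as torch.optim optimizers do.
    """
    return sum(
        1
        for group in optimizer.param_groups
        for p in group["params"]
        if getattr(p, "grad", None) is not None
    )


# Stand-ins for an optimizer and its parameters (assumption: illustration
# only; in real training these would be torch objects).
no_grad_opt = SimpleNamespace(
    param_groups=[{"params": [SimpleNamespace(grad=None), SimpleNamespace(grad=None)]}]
)
some_grad_opt = SimpleNamespace(
    param_groups=[{"params": [SimpleNamespace(grad=None), SimpleNamespace(grad=0.5)]}]
)

print(count_params_with_grad(no_grad_opt))    # no parameter has a gradient
print(count_params_with_grad(some_grad_opt))  # one parameter has a gradient
```

Calling such a helper immediately before `self.scaler.step(self.optimizer)` and logging a zero count would show whether missing gradients are involved, before any change is made to the training configuration.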