MendelXu / SAN

Open-vocabulary Semantic Segmentation
https://mendelxu.github.io/SAN/
MIT License
295 stars, 27 forks

ValueError: matrix contains invalid numeric entries #55

Closed. runhuzhao closed this issue 4 months ago

runhuzhao commented 4 months ago

-- Process 1 terminated with the following error:
Traceback (most recent call last):
  File "/home/data/anaconda3/envs/san/lib/python3.9/site-packages/torch/multiprocessing/spawn.py", line 69, in _wrap
    fn(i, *args)
  File "/home/data/anaconda3/envs/san/lib/python3.9/site-packages/detectron2/engine/launch.py", line 126, in _distributed_worker
    main_func(*args)
  File "/home/data/project/SAN/train_net.py", line 274, in main
    return trainer.train()
  File "/home/data/anaconda3/envs/san/lib/python3.9/site-packages/detectron2/engine/defaults.py", line 484, in train
    super().train(self.start_iter, self.max_iter)
  File "/home/data/anaconda3/envs/san/lib/python3.9/site-packages/detectron2/engine/train_loop.py", line 149, in train
    self.run_step()
  File "/home/data/anaconda3/envs/san/lib/python3.9/site-packages/detectron2/engine/defaults.py", line 494, in run_step
    self._trainer.run_step()
  File "/home/data/anaconda3/envs/san/lib/python3.9/site-packages/detectron2/engine/train_loop.py", line 395, in run_step
    loss_dict = self.model(data)
  File "/home/data/anaconda3/envs/san/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/data/anaconda3/envs/san/lib/python3.9/site-packages/torch/nn/parallel/distributed.py", line 1008, in forward
    output = self._run_ddp_forward(*inputs, **kwargs)
  File "/home/data/anaconda3/envs/san/lib/python3.9/site-packages/torch/nn/parallel/distributed.py", line 969, in _run_ddp_forward
    return module_to_run(*inputs[0], **kwargs[0])
  File "/home/data/anaconda3/envs/san/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/data/project/SAN/san/model/san.py", line 206, in forward
    losses = self.criterion(outputs, targets)
  File "/home/data/anaconda3/envs/san/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/data/project/SAN/san/model/criterion.py", line 234, in forward
    indices = self.matcher(outputs_without_aux, targets)
  File "/home/data/anaconda3/envs/san/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/data/anaconda3/envs/san/lib/python3.9/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/home/data/project/SAN/san/model/matcher.py", line 184, in forward
    return self.memory_efficient_forward(outputs, targets)
  File "/home/data/anaconda3/envs/san/lib/python3.9/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/home/data/project/SAN/san/model/matcher.py", line 156, in memory_efficient_forward
    indices.append(linear_sum_assignment(C))
ValueError: matrix contains invalid numeric entries

Have you run into this problem before?

runhuzhao commented 4 months ago

Sorry, I solved it. It was a problem with our hardware: NaN values appeared during backpropagation. Sorry to bother you.
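
For anyone who lands here with the same error: `linear_sum_assignment` raises this `ValueError` when the cost matrix contains NaN, so a check just before the call can report the problem earlier and with more detail. The following is only a minimal sketch of such a check, not code from the SAN repository. The helper names `non_finite_entries` and `check_cost_matrix` are hypothetical, and the matrix `C` is a made-up stand-in for the cost matrix built in `matcher.py`.

```python
import math


def non_finite_entries(cost_matrix):
    """Return (row, col, value) for every NaN or infinite entry of a 2-D matrix.

    Hypothetical helper: expects a list of lists of floats.
    """
    bad = []
    for i, row in enumerate(cost_matrix):
        for j, value in enumerate(row):
            if not math.isfinite(value):
                bad.append((i, j, value))
    return bad


def check_cost_matrix(cost_matrix, step=None):
    """Raise a descriptive error if any entry is NaN or infinite.

    Hypothetical helper: `step` is an optional training-iteration label
    that is only used in the error message.
    """
    bad = non_finite_entries(cost_matrix)
    if bad:
        i, j, value = bad[0]
        raise FloatingPointError(
            f"cost matrix has {len(bad)} non-finite entries "
            f"(first at [{i}][{j}] = {value}) at step {step}"
        )
    return cost_matrix


# Made-up example matrix; one entry is NaN, as it would be after a bad backward pass.
C = [[0.2, 0.9], [float("nan"), 0.1]]

try:
    check_cost_matrix(C, step=0)
except FloatingPointError as err:
    print(err)
```

With PyTorch tensors, the same check could be run on `C.tolist()`, or written directly as `torch.isfinite(C).all()`. Such a check only reports the symptom; in this thread the NaN values came from faulty hardware during backpropagation, so the underlying cause still has to be found separately.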