Closed ThiruRJST closed 2 years ago
File "/opt/conda/envs/test/lib/python3.7/threading.py", line 890, in _bootstrap
self._bootstrap_inner()
File "/opt/conda/envs/test/lib/python3.7/threading.py", line 926, in _bootstrap_inner
self.run()
File "/opt/conda/envs/test/lib/python3.7/threading.py", line 870, in run
self._target(*self._args, **self._kwargs)
File "/opt/conda/envs/test/lib/python3.7/site-packages/torch/nn/parallel/parallel_apply.py", line 61, in _worker
output = module(*input, **kwargs)
File "/opt/conda/envs/test/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/home/jupyter/TRACER/model/TRACER.py", line 38, in forward
features, edge = self.model.get_blocks(x, H, W)
File "/home/jupyter/TRACER/model/EfficientNet.py", line 250, in get_blocks
edge = F.interpolate(edge, size=(H, W), mode='bilinear')
File "/opt/conda/envs/test/lib/python3.7/site-packages/torch/nn/functional.py", line 3709, in interpolate
return torch._C._nn.upsample_bilinear2d(input, output_size, align_corners, scale_factors)
(function _print_stack)
16%|███████████████▎ | 76/475 [04:51<25:29, 3.83s/it]
Traceback (most recent call last):
File "main.py", line 49, in <module>
main(cfg)
File "main.py", line 34, in main
Trainer(cfg, save_path)
File "/home/jupyter/TRACER/trainer.py", line 59, in __init__
train_loss, train_mae = self.training(args)
File "/home/jupyter/TRACER/trainer.py", line 117, in training
loss.backward()
File "/opt/conda/envs/test/lib/python3.7/site-packages/torch/_tensor.py", line 255, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
File "/opt/conda/envs/test/lib/python3.7/site-packages/torch/autograd/__init__.py", line 149, in backward
allow_unreachable=True, accumulate_grad=True) # allow_unreachable flag
RuntimeError: Function 'UpsampleBilinear2DBackward1' returned nan values in its 0th output.
The entire stacktrace of the error
Hi, it seems that the MEAM did not generate the edges clearly. I recommend removing all lines related to edge generation (e.g., generating the edges or computing their loss).
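As a rough illustration of what "removing the edge parts" could look like on the loss side, here is a minimal sketch. It is not TRACER's actual code: `combined_loss`, `use_edge`, and all tensor names are hypothetical, and plain BCE stands in for whatever losses the repository really uses. It only shows that a NaN coming from the edge branch no longer reaches the total loss once the edge term is skipped.

```python
import torch
import torch.nn.functional as F


def combined_loss(mask_logits, mask_target, edge_logits, edge_target, use_edge=True):
    """Hypothetical loss: mask term plus an optional edge term."""
    loss = F.binary_cross_entropy_with_logits(mask_logits, mask_target)
    if use_edge:
        # This is the term that would be dropped when edge generation is removed.
        loss = loss + F.binary_cross_entropy_with_logits(edge_logits, edge_target)
    return loss


mask_logits = torch.zeros(2, 1, 4, 4)
mask_target = torch.ones(2, 1, 4, 4)
# Simulate a broken edge branch that outputs NaN.
edge_logits = torch.full((2, 1, 4, 4), float("nan"))
edge_target = torch.zeros(2, 1, 4, 4)

with_edge = combined_loss(mask_logits, mask_target, edge_logits, edge_target, use_edge=True)
without_edge = combined_loss(mask_logits, mask_target, edge_logits, edge_target, use_edge=False)
print(torch.isnan(with_edge).item(), torch.isfinite(without_edge).item())
```

If training is stable with the edge term disabled, that points at the edge branch rather than the mask branch.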
But then how did it run completely fine when using BCE loss alone?
I don't know much about the dataset you used, so I'm not sure what the problem is.
But the error you posted shows that the MEAM module could not capture the edges.
What does it report when you call torch.autograd.set_detect_anomaly(True)?
Also, does training work with the API loss if you exclude the lines related to the edge parts?
Actually, using torch.autograd.set_detect_anomaly returns False for all tensors.
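For reference, anomaly detection in PyTorch does not report a True/False value per tensor. When enabled, it makes the backward pass raise a RuntimeError as soon as a backward function returns NaN, and it prints the forward stack trace of the offending operation. The toy example below is a sketch with made-up tensors, unrelated to TRACER. It shows a case where every forward value is finite but the backward pass still produces NaN.

```python
import torch

# Make autograd raise as soon as a backward function returns NaN.
torch.autograd.set_detect_anomaly(True)

x = torch.tensor([0.0], requires_grad=True)
y = torch.sqrt(x) * 0.0  # forward value is 0.0, which is finite

message = None
try:
    y.sum().backward()  # the sqrt backward computes 0 / 0, which is NaN
except RuntimeError as err:
    message = str(err)

# Restore the default; anomaly mode slows training down noticeably.
torch.autograd.set_detect_anomaly(False)
print(message)
```

So a check that only inspects forward tensors can come back clean while `loss.backward()` still fails, which matches the error at the top of this thread.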
How about the approach I suggested earlier, removing all lines related to edge generation? Does it work?
@Karel911 can you help me remove the edge generation parts? I am facing a similar issue.
@Karel911 my teammate @hackkhai is working on that.
I am also curious about which parts cause this issue. I have released a version of TRACER without edge generation. Replace your existing scripts with the released ones. I only tested it briefly, so if there is any problem, please let me know.
Thanks.
Thanks, let me check this out.
I was training on a custom human dataset.
Batch size = 8
Number of training images = 3800
Number of steps trained before the error appeared = 75
After the 75th step, training raised the error shown in the stack trace above.
The model trained successfully when using BCE loss.
We even checked for NaN values using torch.autograd.set_detect_anomaly(True), but it returned False, indicating that no NaN values were found.
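One way to check tensors for NaN or Inf explicitly, rather than relying on a return value, is `torch.isfinite`. The helper below is a minimal sketch; `find_non_finite` and the tensor names are hypothetical and are not part of TRACER or PyTorch.

```python
import torch


def find_non_finite(named_tensors):
    """Return the names of tensors that contain NaN or Inf values."""
    return [
        name
        for name, tensor in named_tensors.items()
        if not torch.isfinite(tensor).all().item()
    ]


# Made-up values standing in for model outputs and the loss.
tensors = {
    "edge": torch.tensor([0.1, float("nan")]),
    "mask": torch.tensor([0.2, 0.8]),
    "loss": torch.tensor(float("inf")),
}
bad = find_non_finite(tensors)
print(bad)  # ['edge', 'loss']
```

In a training loop, the same check could be run on the model outputs and the loss right before `loss.backward()`, and on each parameter's `.grad` right after it, to narrow down where the first non-finite value appears.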