Hi,
I am trying to train the network on a single GPU on the YCB dataset with apex.amp. I used the default parameters (minibatch=3) and tried both training from scratch and fine-tuning from the pretrained model. In both cases it fails with a 'tuple index out of range' error at line 301 of ffb6d.py:
rgb_emb = self.cnn_up_stages[n_up_layers-1](rgb_emb)
This is the final upsample layer, which consists of a PSPUpsample module followed by a Sequential(Conv2d, LogSoftmax). I have attached the raw output below. Have you tried training the network with amp before, or do you have any thoughts on the possible cause of this error?
  File "train_ycb.py", line 666, in <module>
    train()
  File "train_ycb.py", line 653, in train
    trainer.train(
  File "train_ycb.py", line 463, in train
    _, loss, res = self.model_fn(self.model, batch, it=it)
  File "train_ycb.py", line 234, in model_fn
    end_points = model(cu_dt)
  File "/home/huijie/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/huijie/.local/lib/python3.8/site-packages/torch/nn/parallel/distributed.py", line 886, in forward
    output = self.module(*inputs[0], **kwargs[0])
  File "/home/huijie/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/huijie/research/progresslabeller/FFB6D/ffb6d/models/ffb6d.py", line 301, in forward
    rgb_emb = self.cnn_up_stages[n_up_layers-1](rgb_emb)
  File "/home/huijie/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/huijie/.local/lib/python3.8/site-packages/torch/nn/modules/container.py", line 141, in forward
    input = module(input)
  File "/home/huijie/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/huijie/.local/lib/python3.8/site-packages/torch/nn/modules/container.py", line 141, in forward
    input = module(input)
  File "/home/huijie/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/huijie/.local/lib/python3.8/site-packages/torch/nn/modules/conv.py", line 446, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "/home/huijie/.local/lib/python3.8/site-packages/torch/nn/modules/conv.py", line 442, in _conv_forward
    return F.conv2d(input, weight, bias, self.stride,
  File "/home/huijie/.local/lib/python3.8/site-packages/apex/amp/wrap.py", line 21, in wrapper
    args[i] = utils.cached_cast(cast_fn, args[i], handle.cache)
  File "/home/huijie/.local/lib/python3.8/site-packages/apex/amp/utils.py", line 97, in cached_cast
    if cached_x.grad_fn.next_functions[1][0].variable is not x:
IndexError: tuple index out of range
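The last frame indexes `cached_x.grad_fn.next_functions[1]`, so the IndexError most likely means the autograd node created by the half-precision cast has fewer than two entries in `next_functions`. In case it helps narrow this down, here is a minimal sketch that inspects that tuple outside of FFB6D and apex. The helper name `cast_parent_count` is hypothetical, and the sketch assumes that a plain `Tensor.to(torch.float16)` on CPU creates the same kind of autograd node as the cast apex performs, which I have not verified.

```python
import torch


def cast_parent_count(dtype=torch.float16):
    """Return (number of next_functions, node type name) for a dtype cast.

    Hypothetical helper for diagnosis only; it is not part of apex or FFB6D.
    """
    # A leaf tensor that requires grad, standing in for a conv weight.
    x = torch.ones(2, 2, requires_grad=True)
    # Cast to half precision; the result gets a grad_fn because x requires grad.
    y = x.to(dtype)
    return len(y.grad_fn.next_functions), type(y.grad_fn).__name__


if __name__ == "__main__":
    count, node_name = cast_parent_count()
    print("torch version:", torch.__version__)
    print("cast grad_fn:", node_name)
    print("len(next_functions):", count)
```

If the reported length is 1, indexing `next_functions[1]` would raise the same IndexError, which would point to a mismatch between the installed PyTorch and apex versions rather than to the network definition itself.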