ethnhe / FFB6D

[CVPR2021 Oral] FFB6D: A Full Flow Bidirectional Fusion Network for 6D Pose Estimation.
MIT License

'tuple index out of range' error when using 'opt_level=O1' #44

Open huijieZH opened 2 years ago

huijieZH commented 2 years ago

Hi,

I am trying to train the network on a single GPU on the YCB dataset with apex.amp. I used the default parameters (minibatch=3) and tried both training from scratch and fine-tuning from the pretrained model. It always gives a 'tuple index out of range' error at line 301 of ffb6d.py:

rgb_emb = self.cnn_up_stages[n_up_layers-1](rgb_emb)

This is the final upsample layer, which consists of a PSPUpsample module followed by a Sequential(Conv2d, LogSoftmax). I have attached the raw output below. Have you tried training the network with amp before, or do you have any thoughts on the possible cause of this error?

  File "train_ycb.py", line 666, in <module>
    train()
  File "train_ycb.py", line 653, in train
    trainer.train(
  File "train_ycb.py", line 463, in train
    _, loss, res = self.model_fn(self.model, batch, it=it)
  File "train_ycb.py", line 234, in model_fn
    end_points = model(cu_dt)
  File "/home/huijie/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/huijie/.local/lib/python3.8/site-packages/torch/nn/parallel/distributed.py", line 886, in forward
    output = self.module(*inputs[0], **kwargs[0])
  File "/home/huijie/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/huijie/research/progresslabeller/FFB6D/ffb6d/models/ffb6d.py", line 301, in forward
    rgb_emb = self.cnn_up_stages[n_up_layers-1](rgb_emb)
  File "/home/huijie/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/huijie/.local/lib/python3.8/site-packages/torch/nn/modules/container.py", line 141, in forward
    input = module(input)
  File "/home/huijie/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/huijie/.local/lib/python3.8/site-packages/torch/nn/modules/container.py", line 141, in forward
    input = module(input)
  File "/home/huijie/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/huijie/.local/lib/python3.8/site-packages/torch/nn/modules/conv.py", line 446, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "/home/huijie/.local/lib/python3.8/site-packages/torch/nn/modules/conv.py", line 442, in _conv_forward
    return F.conv2d(input, weight, bias, self.stride,
  File "/home/huijie/.local/lib/python3.8/site-packages/apex/amp/wrap.py", line 21, in wrapper
    args[i] = utils.cached_cast(cast_fn, args[i], handle.cache)
  File "/home/huijie/.local/lib/python3.8/site-packages/apex/amp/utils.py", line 97, in cached_cast
    if cached_x.grad_fn.next_functions[1][0].variable is not x:
IndexError: tuple index out of range
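To make the last frame of the traceback easier to read, here is a minimal sketch in plain Python that mimics the failing lookup `cached_x.grad_fn.next_functions[1][0].variable`. It does not import apex or PyTorch. The helper names (`points_at`, `make_cached`) and the stand-in objects are hypothetical and exist only for illustration. The sketch shows only that the expression raises this exact error whenever `next_functions` has fewer than two entries. It does not explain why the real autograd graph ends up in that shape during O1 training.

```python
from types import SimpleNamespace


def points_at(cached_x, x):
    """Mimic the lookup from the traceback (hypothetical helper, not apex code)."""
    return cached_x.grad_fn.next_functions[1][0].variable is x


def make_cached(next_functions):
    """Build a stand-in object exposing grad_fn.next_functions (illustration only)."""
    return SimpleNamespace(grad_fn=SimpleNamespace(next_functions=next_functions))


x = object()                              # stand-in for the original tensor
accumulate = SimpleNamespace(variable=x)  # stand-in for a node holding that tensor

# Case 1: next_functions has two entries, so index [1] exists and the lookup works.
two_entries = make_cached(((None, 0), (accumulate, 0)))
two_entry_result = points_at(two_entries, x)

# Case 2: next_functions has a single entry, so index [1] does not exist.
one_entry = make_cached(((accumulate, 0),))
try:
    points_at(one_entry, x)
    error_message = None
except IndexError as exc:
    error_message = str(exc)

print(two_entry_result)
print(error_message)
```

In the second case the lookup fails before `.variable` is ever read, which matches the position of the error in the traceback above.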

Whishing commented 2 years ago

Maybe you can refer to this issue: https://github.com/NVIDIA/apex/issues/694