RuntimeError: cannot reshape tensor of 0 elements into shape [0, -1] because the unspecified dimension size -1 can be any value and is ambiguous

aiyodiulehuner commented 3 years ago

Traceback (most recent call last):
  File "train.py", line 142, in <module>
    fire.Fire()
  File "/opt/conda/lib/python3.7/site-packages/fire/core.py", line 138, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/opt/conda/lib/python3.7/site-packages/fire/core.py", line 468, in _Fire
    target=component.__name__)
  File "/opt/conda/lib/python3.7/site-packages/fire/core.py", line 672, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "train.py", line 109, in train
    _bboxes, _labels, _scores = trainer.faster_rcnn.predict([ori_img_], visualize=True)
  File "/workspace/model/faster_rcnn.py", line 19, in new_f
    return f(*args,**kwargs)
  File "/workspace/model/faster_rcnn.py", line 233, in predict
    roi_cls_loc, roi_scores, rois, _ = self(img, scale=scale)
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/workspace/model/faster_rcnn.py", line 133, in forward
    h, rois, roi_indices)
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/workspace/model/faster_rcnn_vgg16.py", line 149, in forward
    pool = pool.view(pool.size(0), -1) # flatten; pool size == [300, channel(500) w(7) * h(7)]
RuntimeError: cannot reshape tensor of 0 elements into shape [0, -1] because the unspecified dimension size -1 can be any value and is ambiguous

If you suspect this is an IPython 7.16.1 bug, please report it at: https://github.com/ipython/ipython/issues or send an email to the mailing list at ipython-dev@python.org

You can print a more detailed traceback right now with "%tb", or use "%debug" to interactively debug it.

Extra-detailed tracebacks for bug-reporting purposes can be enabled via: %config Application.verbose_crash=True

How can I fix this? Any help would be appreciated.

yuechenshun commented 3 years ago

Try passing train inside the parentheses of fire.Fire(), i.e. fire.Fire(train).

wulele2 commented 3 years ago

Hello, I ran into this problem too. Passing train to fire.Fire() gives the same error.

aiyodiulehuner commented 3 years ago

After passing train to fire.Fire(), training with Adam works, but SGD still raises the same error.

aiyodiulehuner commented 3 years ago

Also, why would passing train to fire.Fire() be expected to help with this problem?

wulele2 commented 3 years ago

Hello, are the results with Adam worse than with SGD?

aiyodiulehuner commented 3 years ago

It seems so. I haven't finished tuning it yet.

wulele2 commented 3 years ago

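Could you post an mAP once you get it working? Thanks.

From debugging, I found that the cause is in the loc2box function: exp overflows when computing dw and dh, which makes the coordinates of the 128 proposals generated by the RPN become nan. The network then discards those proposals, so fewer than 128 remain, and in the end the batch becomes 0, which triggers this error. After that, the loss becomes nan and the gradients explode.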
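One common way to guard against the exp overflow described above is to clamp dh and dw before exponentiating. The following is only a minimal plain-Python sketch for a single box, not the repository's vectorized code: the name decode_box, the (ymin, xmin, ymax, xmax) box layout, and the cap MAX_DELTA = log(1000 / 16) are all assumptions made for illustration.

```python
import math

# Assumed cap on the log-scale offsets. log(1000 / 16) is a convention used by
# some detection codebases; it is not taken from this repository.
MAX_DELTA = math.log(1000.0 / 16.0)


def decode_box(src_box, loc, max_delta=MAX_DELTA):
    """Decode one (dy, dx, dh, dw) offset against one (ymin, xmin, ymax, xmax) box."""
    ymin, xmin, ymax, xmax = src_box
    dy, dx, dh, dw = loc

    src_h = ymax - ymin
    src_w = xmax - xmin
    src_cy = ymin + 0.5 * src_h
    src_cx = xmin + 0.5 * src_w

    # Clamp the log-scale terms so exp() stays finite even for wild predictions.
    dh = min(dh, max_delta)
    dw = min(dw, max_delta)

    cy = dy * src_h + src_cy
    cx = dx * src_w + src_cx
    h = math.exp(dh) * src_h
    w = math.exp(dw) * src_w

    return (cy - 0.5 * h, cx - 0.5 * w, cy + 0.5 * h, cx + 0.5 * w)


# A huge predicted dh/dw would overflow exp() without the clamp.
box = decode_box((0.0, 0.0, 16.0, 16.0), (0.0, 0.0, 1000.0, 1000.0))
print(all(math.isfinite(v) for v in box))
```

Clamping only hides the symptom. If the offsets grow this large, the learning rate or the input data is still worth checking.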

aiyodiulehuner commented 3 years ago

OK, will do.

hippoula commented 3 years ago

Hello, has this problem been solved?

xlhuang132 commented 3 years ago

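I ran into this problem too. I later found that it also came from the loc2box function: a division by zero occurred when computing dw and dh. It turned out to be a data problem. Some of the bboxes in my dataset were lines or points, and after filtering them out the error went away.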
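The filtering step can be sketched roughly as follows. This is an illustrative plain-Python version, not the code I actually ran: the function name filter_degenerate_boxes, the (ymin, xmin, ymax, xmax) box layout, and the min_size threshold are assumptions.

```python
def filter_degenerate_boxes(bboxes, labels, min_size=1.0):
    """Drop boxes whose height or width is below min_size (lines and points)."""
    kept_boxes, kept_labels = [], []
    for box, label in zip(bboxes, labels):
        ymin, xmin, ymax, xmax = box
        # A line has zero height or zero width; a point has both.
        if (ymax - ymin) >= min_size and (xmax - xmin) >= min_size:
            kept_boxes.append(box)
            kept_labels.append(label)
    return kept_boxes, kept_labels


bboxes = [
    (10.0, 10.0, 50.0, 60.0),  # normal box
    (20.0, 5.0, 20.0, 40.0),   # horizontal line: zero height
    (30.0, 30.0, 30.0, 30.0),  # single point
]
labels = [0, 1, 2]

clean_boxes, clean_labels = filter_degenerate_boxes(bboxes, labels)
print(clean_boxes, clean_labels)
```

Running a check like this once over the whole dataset before training avoids hitting the zero-size case in the middle of an epoch.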