bubbliiiing / faster-rcnn-pytorch

This is a PyTorch implementation of Faster R-CNN. It can be trained on data in the VOC dataset format.
MIT License
1.59k stars, 356 forks

when run predict.py #23

Open 9711128 opened 3 years ago

9711128 commented 3 years ago
[Image attachment: 微信图片_20201205193226]

bubbliiiing commented 3 years ago

Is there a text description? I can't view this image.

9711128 commented 3 years ago

TypeError: expected sequence object with len >= 0 or a single integer

jinweiLiu commented 3 years ago

TypeError: expected sequence object with len >= 0 or a single integer

Same problem here.

bubbliiiing commented 3 years ago

I'd bet fifty cents it's a version problem.

jinweiLiu commented 3 years ago

you win

bubbliiiing commented 3 years ago

no I lose

sunjiabin17 commented 3 years ago

you win

I ran into the same problem. How did you solve it? I upgraded torch to 1.2.0 and it still fails.

sunjiabin17 commented 3 years ago

you win

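I ran into the same problem. How did you solve it? I upgraded torch to 1.2.0 and it still fails.

See https://github.com/open-mmlab/mmdetection/issues/2842. In frcnn.py, inside def detect_image(self, image), under with torch.no_grad():

add the following two lines: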

        if isinstance(self.model, torch.nn.DataParallel):
            self.model.device_ids = [0]
bubbliiiing commented 3 years ago

Uh, what is this?

algo-scope commented 3 years ago

In rpn.forward, the rois are moved to the CPU before they are returned, so they change from tensors to ndarrays, and DataParallel can no longer handle them. See https://discuss.pytorch.org/t/nn-dataparallel-typeerror-expected-sequence-object-with-len-0-or-a-single-integer/97082/23

Yes. Sorry, in this line I put tensor to cpu before gather. return torch.unsqueeze(loss, 0), predicted_interaction.cpu().detach().view(-1, 1), correct_interaction.cpu().detach().view(-1, 1)

bubbliiiing commented 3 years ago

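What do you mean? I don't quite follow… If some piece of code is wrong, I need to fix it, but it runs without errors on my side, so I don't know what the problem is.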

algo-scope commented 3 years ago

https://github.com/bubbliiiing/faster-rcnn-pytorch/blob/ef53d380c71c3cc30b35ca1474b601c1d1f33574/frcnn.py#L118 https://github.com/bubbliiiing/faster-rcnn-pytorch/blob/ef53d380c71c3cc30b35ca1474b601c1d1f33574/nets/frcnn.py#L66

    def forward(self, x, scale=1.):
        img_size = x.shape[2:]
        h = self.extractor(x)

        rpn_locs, rpn_scores, rois, roi_indices, anchor = \
            self.rpn.forward(h, img_size, scale)

        # print(np.shape(h))
        # print(np.shape(rois))
        # print(roi_indices)
        roi_cls_locs, roi_scores = self.head.forward(h, rois, roi_indices)
        return roi_cls_locs, roi_scores, rois, roi_indices

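Of the four values returned at the end, the last two are ndarrays, not tensors. According to the forum thread, after DataParallel splits the computation across multiple GPUs it has to merge (gather) the results, and it cannot merge ndarrays. This happens because you move the rois to the CPU inside the RPN. https://github.com/bubbliiiing/faster-rcnn-pytorch/blob/ef53d380c71c3cc30b35ca1474b601c1d1f33574/nets/rpn.py#L118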

        for i in range(n):
            roi = self.proposal_layer(
                rpn_locs[i].cpu().data.numpy(),
                rpn_fg_scores[i].cpu().data.numpy(),
                anchor, img_size,
                scale=scale)
            batch_index = i * np.ones((len(roi),), dtype=np.int32)
            rois.append(roi)
            roi_indices.append(batch_index)
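
One possible way around the gather failure is to hand tensors back from the RPN instead of ndarrays. The following is only a minimal sketch of that idea: rois_to_tensors is a hypothetical helper, not a function from this repository, and it assumes the downstream code (the head and the box decoding in detect_image) accepts tensors, which has not been checked here.

```python
import numpy as np
import torch


def rois_to_tensors(rois, roi_indices, device="cpu"):
    """Concatenate per-image NumPy proposals and return them as torch tensors.

    Hypothetical helper for illustration; not part of the repository.
    """
    rois_t = torch.from_numpy(np.concatenate(rois, axis=0)).float().to(device)
    idx_t = torch.from_numpy(np.concatenate(roi_indices, axis=0)).long().to(device)
    return rois_t, idx_t


# Fake proposals for a batch of two images: 3 boxes and 2 boxes, 4 coordinates each.
rois = [np.zeros((3, 4), dtype=np.float32), np.ones((2, 4), dtype=np.float32)]
roi_indices = [i * np.ones((len(r),), dtype=np.int32) for i, r in enumerate(rois)]

rois_t, idx_t = rois_to_tensors(rois, roi_indices)
```

Because both outputs are now torch.Tensor objects, DataParallel's gather step has something it knows how to concatenate. In real use the device argument would be the device the feature map lives on.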

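I suspect it is a version issue. Your code sets the environment variable so that only one GPU is used at inference, so your version may be fine, while in other people's versions DataParallel still gathers the outputs using the multi-GPU mechanism and fails. In practice, you can simply remove DataParallel at inference time.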
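
Dropping DataParallel for inference can be sketched as below. This is an illustration under assumptions, not the repository's code: unwrap_data_parallel is a hypothetical helper, and a small nn.Linear stands in for the detector. It relies only on the fact that nn.DataParallel keeps the wrapped network in its module attribute.

```python
import torch
from torch import nn


def unwrap_data_parallel(model: nn.Module) -> nn.Module:
    """Return the wrapped module if `model` is an nn.DataParallel, else `model` itself.

    Hypothetical helper for illustration; not part of the repository.
    """
    if isinstance(model, nn.DataParallel):
        return model.module
    return model


# Toy stand-in for the detector; the real network is not reproduced here.
net = nn.Linear(4, 2)
wrapped = nn.DataParallel(net)

# Run inference on the plain module so no scatter/gather step is involved.
plain = unwrap_data_parallel(wrapped).eval()
with torch.no_grad():
    out = plain(torch.zeros(1, 4))
```

The two-line workaround quoted earlier in the thread (setting self.model.device_ids = [0]) appears to aim at the same effect: with a single device id, DataParallel calls the wrapped module directly and does not gather outputs.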