9711128 commented 3 years ago

bubbliiiing commented 3 years ago

Is there a text description? I can't view this image.

9711128 commented 3 years ago

TypeError: expected sequence object with len>=0 or a single integer

jinweiLiu commented 3 years ago

TypeError: expected sequence object with len>=0 or a single integer. Same problem here.

bubbliiiing commented 3 years ago

I'd bet fifty cents it's a version issue.

jinweiLiu commented 3 years ago

you win

bubbliiiing commented 3 years ago

no I lose

sunjiabin17 commented 3 years ago

you win

I ran into the same problem. How did you solve it? I upgraded torch to 1.2.0 and it still doesn't work.

sunjiabin17 commented 3 years ago

you win

I ran into the same problem. How did you solve it? I upgraded torch to 1.2.0 and it still doesn't work.

See https://github.com/open-mmlab/mmdetection/issues/2842. In frcnn.py, inside def detect_image(self, image):, directly under with torch.no_grad():

add the following two lines:

        if isinstance(self.model, torch.nn.DataParallel):
            self.model.device_ids = [0]

bubbliiiing commented 3 years ago

Uh, what is this?

algo-scope commented 3 years ago

In rpn.forward, the rois are moved to the CPU before being returned, which turns them from tensors into ndarrays, so DataParallel can no longer handle them. See https://discuss.pytorch.org/t/nn-dataparallel-typeerror-expected-sequence-object-with-len-0-or-a-single-integer/97082/23

Yes. Sorry, in this line I put tensor to cpu before gather. return torch.unsqueeze(loss, 0), predicted_interaction.cpu().detach().view(-1, 1), correct_interaction.cpu().detach().view(-1, 1)

bubbliiiing commented 3 years ago

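What do you mean? I don't quite follow... If some part of the code is wrong I'd need to fix it, but it runs without errors on my side, so I don't know what the problem is.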

algo-scope commented 3 years ago

https://github.com/bubbliiiing/faster-rcnn-pytorch/blob/ef53d380c71c3cc30b35ca1474b601c1d1f33574/frcnn.py#L118 https://github.com/bubbliiiing/faster-rcnn-pytorch/blob/ef53d380c71c3cc30b35ca1474b601c1d1f33574/nets/frcnn.py#L66

def forward(self, x, scale=1.):
    img_size = x.shape[2:]
    h = self.extractor(x)

    rpn_locs, rpn_scores, rois, roi_indices, anchor = \
        self.rpn.forward(h, img_size, scale)

    # print(np.shape(h))
    # print(np.shape(rois))
    # print(roi_indices)
    roi_cls_locs, roi_scores = self.head.forward(h, rois, roi_indices)
    return roi_cls_locs, roi_scores, rois, roi_indices

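Of the four return values at the end, the last two are ndarrays, not tensors. According to the forum thread, DataParallel has to gather the results after distributing the computation across multiple GPUs, and it cannot gather ndarrays. You move the rois to the CPU inside the rpn, which is why this happens. https://github.com/bubbliiiing/faster-rcnn-pytorch/blob/ef53d380c71c3cc30b35ca1474b601c1d1f33574/nets/rpn.py#L118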

for i in range(n):
    roi = self.proposal_layer(
        rpn_locs[i].cpu().data.numpy(),
        rpn_fg_scores[i].cpu().data.numpy(),
        anchor, img_size,
        scale=scale)
    batch_index = i * np.ones((len(roi),), dtype=np.int32)
    rois.append(roi)
    roi_indices.append(batch_index)
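
One possible way to deal with the ndarray outputs described above is to convert them to tensors before they leave forward, so that DataParallel's gather step only ever sees tensors. The sketch below is a minimal illustration and not the repository's code: rois_to_tensors is a hypothetical helper, the sample arrays are made up, and any downstream code that expects ndarrays (for example the head) would need to be adapted accordingly.

```python
import numpy as np
import torch


def rois_to_tensors(rois, roi_indices, device):
    """Stack per-image NumPy proposals into tensors on `device`.

    Hypothetical helper for illustration; not part of the repository.
    """
    # Concatenate the per-image arrays, then wrap them as tensors.
    rois_t = torch.from_numpy(np.concatenate(rois, axis=0)).float().to(device)
    idx_t = torch.from_numpy(np.concatenate(roi_indices, axis=0)).long().to(device)
    return rois_t, idx_t


# Made-up data: two images with 3 and 2 proposals, each (x1, y1, x2, y2).
rois = [np.zeros((3, 4), dtype=np.float32), np.ones((2, 4), dtype=np.float32)]
roi_indices = [i * np.ones((len(r),), dtype=np.int32) for i, r in enumerate(rois)]

rois_t, idx_t = rois_to_tensors(rois, roi_indices, torch.device("cpu"))
```

In a real model the device would presumably be taken from the input, e.g. x.device, so that each replica returns tensors on its own GPU.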

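I suspect it is a version issue: your code sets the environment variable so that only one GPU is used during inference, so it may work fine on your version, while on other people's versions DataParallel still gathers as if there were multiple GPUs, and that fails. For inference, simply removing DataParallel should be enough.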
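
Dropping DataParallel for inference could look roughly like the sketch below. It rests on assumptions: unwrap_for_inference is a hypothetical helper and nn.Linear merely stands in for the detection network. The only library behaviour relied on is that nn.DataParallel keeps the wrapped network in its .module attribute.

```python
import torch
from torch import nn


def unwrap_for_inference(model):
    """Return the underlying module if `model` is wrapped in nn.DataParallel.

    Hypothetical helper for illustration; not part of the repository.
    """
    if isinstance(model, nn.DataParallel):
        model = model.module
    return model.eval()  # eval() returns the module itself


net = nn.Linear(4, 2)            # stand-in for the detection network
wrapped = nn.DataParallel(net)   # how the model might be wrapped for training
plain = unwrap_for_inference(wrapped)

# Put the input on the same device as the model's parameters.
device = next(plain.parameters()).device
with torch.no_grad():
    out = plain(torch.zeros(1, 4, device=device))
```

Since the unwrapped module is called directly, no scatter or gather step runs, so ndarray return values are passed through unchanged.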

bubbliiiing / faster-rcnn-pytorch

when run predict.py #23
