Hope1337 / YOWOv3


RuntimeError #16

Open T-wow opened 1 month ago

T-wow commented 1 month ago

I did not make any modifications to the data processing part. When I ran the main() program for training, I encountered the following error:

    Traceback (most recent call last):
      File "main.py", line 16, in <module>
        train.train_model(config=config)
      File "/mnt/C/znmd_tjs/YOWOv3-main/scripts/train.py", line 112, in train_model
        loss = criterion(outputs, targets) / acc_grad
      File "/mnt/C/znmd_tjs/YOWOv3-main/utils/loss.py", line 76, in __call__
        gt[j, :n] = targets[matches, 1:]
    RuntimeError: The expanded size of the tensor (228) must match the existing size (28) at non-singleton dimension 1. Target sizes: [1, 228]. Tensor sizes: [28]

Could you please advise how to resolve this issue? Thank you for your help.
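
Note for readers hitting the same error: gt[j, :n] = targets[matches, 1:] copies each target row into a pre-allocated gt row, so the two widths must agree. A minimal sketch that reproduces the failure, with hypothetical shapes taken from the traceback (228 = 4 + 224 from the config, 28 = 4 + 24 from the dataset; the decomposition is explained further down the thread):

    import torch

    gt = torch.zeros(1, 1, 228)    # [batch, max_boxes, 4 + num_classes], num_classes = 224
    targets = torch.zeros(1, 29)   # [n_boxes, 1 + 4 + 24], as produced by UCF101-24
    gt[0, :1] = targets[:, 1:]     # RuntimeError: expanded size (228) vs existing size (28)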

Hope1337 commented 1 month ago

@T-wow please give more context

T-wow commented 1 month ago

@Hope1337 Here is the full traceback:

    Traceback (most recent call last):
      File "main.py", line 19, in <module>
        train.train_model(config=config)
      File "/mnt/C/znmd_tjs/YOWOv3-main/scripts/train.py", line 112, in train_model
        loss = criterion(outputs, targets) / acc_grad
      File "/mnt/C/znmd_tjs/YOWOv3-main/utils/loss.py", line 76, in __call__
        gt[j, :n] = targets[matches, 1:]
    RuntimeError: The expanded size of the tensor (228) must match the existing size (28) at non-singleton dimension 1. Target sizes: [2, 228]. Tensor sizes: [2, 28]

and the relevant part of the loss, the body of __call__(self, outputs, targets) in utils/loss.py, is as follows:

    # x: three per-scale maps [B, 4 * n_dfl_channel + num_classes, S, S] for S in (28, 14, 7)
    x = outputs[1] if isinstance(outputs, tuple) else outputs

    # [B, 4 * n_dfl_channel, 1029]
    output = torch.cat([i.view(x[0].shape[0], self.no, -1) for i in x], 2)

    # [B, 4 * n_dfl_channel, 1029], [B, num_classes, 1029]
    pred_output, pred_scores = output.split((4 * self.dfl_ch, self.nc), 1)

    # [B, 1029, 4 * n_dfl_channel]
    pred_output = pred_output.permute(0, 2, 1).contiguous()

    # [B, 1029, num_classes]
    pred_scores = pred_scores.permute(0, 2, 1).contiguous()
    nclass = pred_scores.shape[2]

    size = torch.tensor(x[0].shape[2:], dtype=pred_scores.dtype, device=self.device)
    size = size * self.stride[0]

    anchor_points, stride_tensor = make_anchors(x, self.stride, 0.5)

    # targets
    if targets.shape[0] == 0:
        gt = torch.zeros(pred_scores.shape[0], 0, 4 + nclass, device=self.device)
    else:
        i = targets[:, 0]  # image index
        _, counts = i.unique(return_counts=True)
        gt = torch.zeros(pred_scores.shape[0], counts.max(), 4 + nclass, device=self.device)
        for j in range(pred_scores.shape[0]):
            matches = i == j
            n = matches.sum()
            if n:
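                # targets[matches, 1:] is 4 + dataset-classes wide, while gt rows are
                # 4 + nclass wide (nclass comes from the model config); any mismatch
                # raises the RuntimeError reported above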
                gt[j, :n] = targets[matches, 1:]
        #gt[..., 1:5] = wh2xy(gt[..., 1:5].mul_(size[[1, 0, 1, 0]]))
        gt[..., 0:4] = gt[..., 0:4].mul_(size[[1, 0, 1, 0]])

    gt_bboxes, gt_labels = gt.split((4, nclass), 2)  # xyxy, cls

    mask_gt = gt_bboxes.sum(2, keepdim=True).gt_(0)

    # boxes
    # [B, 1029, 4 * n_dfl_channel]
    b, a, c = pred_output.shape
    pred_bboxes = pred_output.view(b, a, 4, c // 4).softmax(3)

    # [B, 1029, 4] -> after decode
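    # assuming self.project is torch.arange(self.dfl_ch) (YOLOv8-style DFL), the
    # matmul below takes the expectation over the softmaxed distance bins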
    pred_bboxes = pred_bboxes.matmul(self.project.type(pred_bboxes.dtype))

    a, b = torch.split(pred_bboxes, 2, -1)

    # [B, 1029, 4] 
    pred_bboxes = torch.cat((anchor_points - a, anchor_points + b), -1)

    # [B, 1029, num_classes] 
    scores = pred_scores.detach().sigmoid()

    # [B, 1029, 4] 
    bboxes = (pred_bboxes.detach() * stride_tensor).type(gt_bboxes.dtype)
    target_bboxes, target_scores, fg_mask = self.assign(scores, bboxes,
                                                        gt_labels, gt_bboxes, mask_gt,
                                                        anchor_points * stride_tensor, stride_tensor)

    mask = target_scores.gt(0)[fg_mask]

    target_bboxes /= stride_tensor
    target_scores_sum = target_scores.sum()
    #num_pos = fg_mask.sum()

    # cls loss
    loss_cls = self.bce(pred_scores, target_scores.to(pred_scores.dtype))
    loss_cls = loss_cls.sum() / target_scores_sum
    #loss_cls = loss_cls.sum() / num_pos

    # box loss
    loss_box = torch.zeros(1, device=self.device)
    loss_dfl = torch.zeros(1, device=self.device)
    if fg_mask.sum():
        # IoU loss
        weight = torch.masked_select(target_scores.sum(-1), fg_mask).unsqueeze(-1)
        loss_box = self.iou(pred_bboxes[fg_mask], target_bboxes[fg_mask])
        loss_box = ((1.0 - loss_box) * weight).sum() / target_scores_sum
        #loss_box = (1.0 - loss_box).sum() / num_pos
        # DFL loss
        a, b = torch.split(target_bboxes, 2, -1)
        target_lt_rb = torch.cat((anchor_points - a, b - anchor_points), -1)
        target_lt_rb = target_lt_rb.clamp(0, self.dfl_ch - 1.01)  # distance (left_top, right_bottom)
        loss_dfl = self.df_loss(pred_output[fg_mask].view(-1, self.dfl_ch), target_lt_rb[fg_mask])
        loss_dfl = (loss_dfl * weight).sum() / target_scores_sum
        #loss_dfl = loss_dfl.sum() / num_pos

    loss_cls *= self.scale_cls_loss
    loss_box *= self.scale_box_loss
    loss_dfl *= self.scale_dfl_loss

    #print("cls : {}, box : {}, dfl : {}".format(loss_cls.item(), loss_box.item(), loss_dfl.item()))
    return loss_cls + loss_box + loss_dfl  # loss(cls, box, dfl)
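
For reference, the code above implies each row of targets is 1 + 4 + num_classes wide: an image index in column 0, a box in the next four columns, and per-class scores after that. A minimal sketch of one such row (hypothetical values; treating the class part as one-hot is an assumption based on the (4, nclass) split):

    import torch

    num_classes = 24                       # UCF101-24
    box = [0.2, 0.3, 0.6, 0.9]             # normalized xyxy (the wh2xy call above is commented out)
    cls = torch.zeros(num_classes)
    cls[5] = 1.0                           # hypothetical class index 5
    row = torch.cat([torch.tensor([0.0] + box), cls])  # [image_index, x1, y1, x2, y2, one-hot]
    assert row.numel() == 1 + 4 + num_classes           # 29 columns per target row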

Thank you very much

T-wow commented 1 month ago

@Hope1337 This happens when I use "python main.py --mode train --config config/ucf_config.yaml" to train a model on UCF101-24.

Hope1337 commented 1 month ago

@T-wow Sorry, I made a typo while coding in vim. Specifically, in the ucf_config file, num_classes should be 24, not 224.
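
A quick way to catch this early, as a sketch assuming num_classes is a top-level key in config/ucf_config.yaml, is to validate the config against the dataset before training:

    import yaml

    # UCF101-24 has 24 action classes, so each gt row in the loss must be
    # 4 + 24 = 28 wide; num_classes = 224 produced the 228-wide rows above.
    with open("config/ucf_config.yaml") as f:
        config = yaml.safe_load(f)

    assert config["num_classes"] == 24, \
        "num_classes is {}, but UCF101-24 has 24 classes".format(config["num_classes"])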

T-wow commented 1 month ago

@Hope1337 Thank you for your response. With your help, I successfully ran the program. However, I noticed that the evaluation metrics you provide seem to include only Frame-mAP. Could you add Video-mAP, similar to YOWOv2? I hope you will consider my suggestion.