Sense-X / TSD

1st place models in Google OpenImage Detection Challenge 2019
Apache License 2.0

Some questions about TSD_cls and TSD_bbox #20

Closed wangjue-wzq closed 4 years ago

wangjue-wzq commented 4 years ago

There are two branches in the framework: the TSD head and the sibling head. On my own dataset, the TSD head performs well, but the sibling head does not. What could cause this? Also, delta_c should act on TSD_cls, but I could not find the corresponding code.

songguanglu commented 4 years ago

Hi, you can compare the sibling head inside TSD with the sibling head in the baseline. If their performance is similar, the traditional sibling head simply does not perform well on your dataset. The code for delta_c is at line 246 of tsd_bbox_head.py.

wangjue-wzq commented 4 years ago

Thank you for your reply! TSD performs similarly to the baseline, but the sibling head is worse than the baseline.

songguanglu commented 4 years ago

Can you provide details about your dataset and training config? The detailed performance of TSD and the baseline would help us analyze this result.

wangjue-wzq commented 4 years ago

During training, the TSD module fits the data well: its classification accuracy reaches 99%, while the sibling head reaches only 84%. Because most samples are background, 84% implies many classification errors. The task is rotated object detection. In the TSDSharedFCBBoxHead module I added an angle prediction, so the output is [x,y,w,h,theta], trained with a Smooth L1 loss. The SharedFCBBoxHeadRbbox module likewise predicts four coordinates and one angle with a Smooth L1 loss. I suspect the problem is in how I use the TSDSharedFCBBoxHead module, since the SharedFCBBoxHeadRbbox module has been tested many times on other models without errors. Thank you very much. Here are the relevant code and configuration files.
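
To make the point about 84% concrete, here is a rough back-of-the-envelope sketch. It assumes the logged accuracy is computed over all sampled RoIs, that the sampler fills its full positive quota (pos_fraction=0.25, as in the rcnn sampler of the config), and, optimistically, that every background RoI is classified correctly. None of these assumptions is confirmed by the thread, and the function name is hypothetical.

```python
def implied_foreground_accuracy(overall_acc, pos_fraction, background_acc=1.0):
    """Solve overall = p * fg_acc + (1 - p) * bg_acc for fg_acc."""
    if not 0.0 < pos_fraction <= 1.0:
        raise ValueError("pos_fraction must be in (0, 1]")
    neg_fraction = 1.0 - pos_fraction
    return (overall_acc - neg_fraction * background_acc) / pos_fraction


# pos_fraction=0.25 comes from the rcnn sampler in the config;
# perfect background accuracy is an optimistic assumption.
fg_sibling = implied_foreground_accuracy(0.84, 0.25)
fg_tsd = implied_foreground_accuracy(0.99, 0.25)
print("sibling head foreground accuracy: {:.2f}".format(fg_sibling))
print("TSD head foreground accuracy: {:.2f}".format(fg_tsd))
```

Under these assumptions, an overall accuracy of 84% corresponds to only about a third of the foreground RoIs being classified correctly. If positives fall short of the 25% quota, the implied foreground accuracy is lower still.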

config


# model settings
model = dict(
    type='TSDRoITransformer',
    pretrained='torchvision://resnet50',
    backbone=dict(
        type='ResNet',
        depth=50,
        num_stages=4,
        out_indices=(0, 1, 2, 3),
        frozen_stages=1,
        norm_cfg=dict(type='BN', requires_grad=True),
        style='pytorch'),
    neck=dict(
        type='FPN',
        in_channels=[256, 512, 1024, 2048],
        out_channels=256,
        num_outs=5),
    rpn_head=dict(
        type='RPNHead',
        in_channels=256,
        feat_channels=256,
        anchor_scales=[4],
        anchor_ratios=[0.5, 1.0, 2.0],
        anchor_strides=[4, 8, 16, 32, 64],
        target_means=[.0, .0, .0, .0],
        target_stds=[1.0, 1.0, 1.0, 1.0],
        loss_cls=dict(
            type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0),
        loss_bbox=dict(type='SmoothL1Loss', beta=1.0 / 9.0, loss_weight=1.0)),
    bbox_roi_extractor=dict(
        type='SingleRoIExtractor',
        roi_layer=dict(type='RoIAlign', out_size=7, sample_num=2),
        out_channels=256,
        featmap_strides=[4, 8, 16, 32]),
    bbox_head=dict(
        type='TSDSharedFCBBoxHead',  # output is [x, y, w, h, theta]; theta is the object's rotation angle; loss is Smooth L1
        featmap_strides=[4, 8, 16, 32],
        num_fcs=2,
        in_channels=256,
        fc_out_channels=1024,
        roi_feat_size=7,
        num_classes=11,
        cls_pc_margin=0.2,
        loc_pc_margin=0.2,
        target_means=[0., 0., 0., 0.,0.],
        target_stds=[0.1, 0.1, 0.2, 0.2,0.1],
        reg_class_agnostic=False,
        loss_cls=dict(
            type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0),
        loss_bbox=dict(type='SmoothL1Loss', beta=1.0, loss_weight=1.0)),
    rbbox_roi_extractor=dict(
        type='RboxSingleRoIExtractor',
        roi_layer=dict(type='RoIAlignRotated', out_size=7, sample_num=2),
        out_channels=256,
        featmap_strides=[4, 8, 16, 32]),
    rbbox_head = dict(
        type='SharedFCBBoxHeadRbbox',  # output is [x, y, w, h, theta]; theta is the object's rotation angle; loss is Smooth L1
        num_fcs=2,
        num_cls_fcs=0,  #add fc to classification
        in_channels=256,
        fc_out_channels=1024,
        roi_feat_size=7,
        num_classes=11,
        target_means=[0., 0., 0., 0., 0.],
        target_stds=[0.05, 0.05, 0.1, 0.1, 0.05],
        reg_class_agnostic=False,
        loss_cls=dict(
            type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0),
        loss_bbox=dict(type='SmoothL1Loss', beta=1.0, loss_weight=1.0))
)
# model training and testing settings
train_cfg = dict(
    rpn=dict(
        assigner=dict(
            type='MaxIoUAssigner',
            pos_iou_thr=0.7,
            neg_iou_thr=0.3,
            min_pos_iou=0.3,
            ignore_iof_thr=-1),
        sampler=dict(
            type='RandomSampler',
            num=256,
            pos_fraction=0.5,
            neg_pos_ub=-1,
            add_gt_as_proposals=False),
        allowed_border=30,
        pos_weight=-1,
        debug=False),
    rpn_proposal=dict(
        nms_across_levels=False,
        nms_pre=2000,
        nms_post=2000,
        max_num=2000,
        nms_thr=0.7,
        min_bbox_size=0),
    rcnn=[dict(
        assigner=dict(
            type='MaxIoUAssigner',
            pos_iou_thr=0.5,
            neg_iou_thr=0.5,
            min_pos_iou=0.5,
            ignore_iof_thr=-1),
        sampler=dict(
            type='RandomSampler',
            num=512,
            pos_fraction=0.25,
            neg_pos_ub=-1,
            add_gt_as_proposals=True),
        pos_weight=-1,
        debug=False),
        dict(
            assigner=dict(
                type='MaxIoUAssignerRbbox',
                pos_iou_thr=0.5,
                neg_iou_thr=0.5,
                min_pos_iou=0.5,
                ignore_iof_thr=-1),
            sampler=dict(
                type='RandomRbboxSampler',
                num=512, # 512
                pos_fraction=0.25,
                neg_pos_ub=-1,
                add_gt_as_proposals=True),
            pos_weight=-1,
            debug=False)
    ])
test_cfg = dict(
    rpn=dict(
        nms_across_levels=False,
        nms_pre=2000,
        nms_post=2000,
        max_num=2000,
        nms_thr=0.7,
        min_bbox_size=0),
    rcnn=dict(
        score_thr=0.05,
        nms_top=False,
        nms=dict(type='py_cpu_nms_poly_fast', iou_thr=0.1),
        max_per_img=2000)
    # score_thr=0.00, nms=dict(type='nms', iou_thr=0.5), max_per_img=100
    # soft-nms is also supported for rcnn testing
    # e.g., nms=dict(type='soft_nms', iou_thr=0.5, min_score=0.05)
)
# dataset settings
dataset_type = 'PlaneDataset'
data_root = 'data/Plane/'
img_norm_cfg = dict(
    mean=[112.209, 114.267, 97.302], std=[58.826, 50.785, 49.986], to_rgb=True)
    # mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
data = dict(
    imgs_per_gpu=1,
    workers_per_gpu=2,
    train=dict(
        type=dataset_type,
        ann_file=data_root + 'plane_aug_train.json',
        img_prefix=data_root + 'train_aug/images',
        img_scale=(1024, 1024),
        img_norm_cfg=img_norm_cfg,
        size_divisor=32,
        flip_ratio=0.5,
        with_mask=True,
        with_crowd=True,
        with_label=True),
    val=dict(
        type=dataset_type,
        ann_file=data_root + 'plane_aug_train.json',
        img_prefix=data_root + 'train_aug/images',
        img_scale=(1024, 1024),
        img_norm_cfg=img_norm_cfg,
        size_divisor=32,
        flip_ratio=0,
        with_mask=False,
        with_crowd=False,
        with_label=True),
    test=dict(
        type=dataset_type,
        # ann_file=data_root + 'plane_aug_train.json',
        # img_prefix=data_root + 'train_aug/images',
        ann_file=data_root + 'plane_train.json',
        img_prefix=data_root + 'train/images',
        img_scale=(1024, 1024),
        img_norm_cfg=img_norm_cfg,
        size_divisor=32,
        flip_ratio=0,
        with_mask=False,
        with_label=False,
        test_mode=True))
# optimizer
optimizer = dict(type='SGD', lr=0.01, momentum=0.9, weight_decay=0.0001)
optimizer_config = dict(grad_clip=dict(max_norm=35, norm_type=2))
# learning policy
lr_config = dict(
    policy='step',
    warmup='linear',
    warmup_iters=500,
    warmup_ratio=1.0 / 3,
    step=[40, 50])
checkpoint_config = dict(interval=2)
log_config = dict(
    interval=50,
    hooks=[
        dict(type='TextLoggerHook'),
        # dict(type='TensorboardLoggerHook')
    ])
# runtime settings
total_epochs = 50
dist_params = dict(backend='nccl')
log_level = 'INFO'
work_dir = './work_dirs/TSD_faster_rcnn_r50_fpn_1x_plane_small'
load_from = None
resume_from = None
workflow = [('train', 1)]
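
As a quick sanity check of the learning-rate settings in the config above, the following sketch tabulates the rate per epoch. It assumes a step policy that multiplies the rate by gamma=0.1 at each listed epoch, with epochs counted from 0, and it ignores the linear warmup. The gamma value and the helper name are assumptions, not taken from the thread.

```python
def step_lr(base_lr, epoch, steps, gamma=0.1):
    """Learning rate at a 0-indexed epoch under a step decay policy."""
    num_decays = sum(1 for s in steps if epoch >= s)
    return base_lr * gamma ** num_decays


# lr=0.01, step=[40, 50], total_epochs=50 are taken from the config.
schedule = [step_lr(0.01, epoch, [40, 50]) for epoch in range(50)]
for epoch in (0, 39, 40, 49):
    print(epoch, schedule[epoch])
```

Under these assumptions, the second step at epoch 50 is never reached within 50 epochs, so the rate is decayed only once, at epoch 40.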

detector

@DETECTORS.register_module
class TSDRoITransformer(BaseDetectorNew, RPNTestMixin):

    def __init__(self,
                 backbone,
                 neck=None,
                 shared_head=None,
                 shared_head_rbbox=None,
                 rpn_head=None,
                 bbox_roi_extractor=None,
                 bbox_head=None,
                 rbbox_roi_extractor=None,
                 rbbox_head=None,
                 mask_roi_extractor=None,
                 mask_head=None,
                 train_cfg=None,
                 test_cfg=None,
                 pretrained=None):
        assert bbox_roi_extractor is not None
        assert bbox_head is not None

        assert rbbox_roi_extractor is not None
        assert rbbox_head is not None
        super(TSDRoITransformer, self).__init__()

        self.backbone = builder.build_backbone(backbone)

        if neck is not None:
            self.neck = builder.build_neck(neck)

        if rpn_head is not None:
            self.rpn_head = builder.build_head(rpn_head)

        if shared_head is not None:
            self.shared_head = builder.build_shared_head(shared_head)

        if shared_head_rbbox is not None:
            self.shared_head_rbbox = builder.build_shared_head(shared_head_rbbox)

        if bbox_head is not None:
            self.bbox_roi_extractor = builder.build_roi_extractor(
                bbox_roi_extractor)
            self.bbox_head = builder.build_head(bbox_head)
            self.use_TSD = 'TSD' in bbox_head['type']
        # import pdb
        # pdb.set_trace()
        if rbbox_head is not None:
            self.rbbox_roi_extractor = builder.build_roi_extractor(
                rbbox_roi_extractor)
            self.rbbox_head = builder.build_head(rbbox_head)

        if mask_head is not None:
            if mask_roi_extractor is not None:
                self.mask_roi_extractor = builder.build_roi_extractor(
                    mask_roi_extractor)
                self.share_roi_extractor = False
            else:
                self.share_roi_extractor = True
                self.mask_roi_extractor = self.rbbox_roi_extractor
            self.mask_head = builder.build_head(mask_head)

        self.train_cfg = train_cfg
        self.test_cfg = test_cfg

        self.init_weights(pretrained=pretrained)

    @property
    def with_rpn(self):
        return hasattr(self, 'rpn_head') and self.rpn_head is not None

    def init_weights(self, pretrained=None):
        super(TSDRoITransformer, self).init_weights(pretrained)
        self.backbone.init_weights(pretrained=pretrained)
        if self.with_neck:
            if isinstance(self.neck, nn.Sequential):
                for m in self.neck:
                    m.init_weights()
            else:
                self.neck.init_weights()
        if self.with_rpn:
            self.rpn_head.init_weights()
        if self.with_shared_head:
            self.shared_head.init_weights(pretrained=pretrained)
        if self.with_shared_head_rbbox:
            self.shared_head_rbbox.init_weights(pretrained=pretrained)
        if self.with_bbox:
            self.bbox_roi_extractor.init_weights()
            self.bbox_head.init_weights()
        if self.with_rbbox:
            self.rbbox_roi_extractor.init_weights()
            self.rbbox_head.init_weights()
        if self.with_mask:
            self.mask_head.init_weights()
            if not self.share_roi_extractor:
                self.mask_roi_extractor.init_weights()

    def extract_feat(self, img):
        x = self.backbone(img)
        if self.with_neck:
            x = self.neck(x)
        return x

    def forward_train(self,
                      img,
                      img_meta,
                      gt_bboxes, #[n,4] [x,y,h,w]
                      gt_labels,
                      gt_bboxes_ignore=None,
                      gt_masks=None,
                      proposals=None):
        x = self.extract_feat(img)  # multi-level features (backbone + FPN neck)
        losses = dict()
        # trans gt_masks[1024, 1024] to gt_obbs
        # [cx, cy, w, h, theta]
        gt_obbs = gt_mask_bp_obbs_list(gt_masks)
        # RPN forward and loss
        if self.with_rpn:
            rpn_outs = self.rpn_head(x)
            rpn_loss_inputs = rpn_outs + (gt_bboxes, img_meta,
                                          self.train_cfg.rpn)
            rpn_losses = self.rpn_head.loss(
                *rpn_loss_inputs, gt_bboxes_ignore=gt_bboxes_ignore)
            losses.update(rpn_losses)
            proposal_cfg = self.train_cfg.get('rpn_proposal',
                                              self.test_cfg.rpn)
            proposal_inputs = rpn_outs + (img_meta, proposal_cfg)
            proposal_list = self.rpn_head.get_bboxes(*proposal_inputs)
        else:
            proposal_list = proposals
        # assign gts and sample proposals (hbb assign)
        if self.with_bbox or self.with_mask:
            bbox_assigner = build_assigner(self.train_cfg.rcnn[0].assigner)
            bbox_sampler = build_sampler(
                self.train_cfg.rcnn[0].sampler, context=self)
            num_imgs = img.size(0)
            if gt_bboxes_ignore is None:
                gt_bboxes_ignore = [None for _ in range(num_imgs)]
            sampling_results = []
            for i in range(num_imgs):
                # assign each RPN proposal to a ground-truth box or to background
                assign_result = bbox_assigner.assign(proposal_list[i],
                                                     gt_bboxes[i],
                                                     gt_bboxes_ignore[i],
                                                     gt_labels[i])
                # sample positive and negative proposals
                sampling_result = bbox_sampler.sample(
                    assign_result,
                    proposal_list[i],
                    gt_bboxes[i],
                    gt_labels[i],
                    feats=[lvl_feat[i][None] for lvl_feat in x])
                sampling_results.append(sampling_result)
        # bbox head forward and loss
        # horizontal bbox
        if self.with_bbox:
            rois = bbox2roi([res.bboxes for res in sampling_results])
            # TODO: a more flexible way to decide which feature maps to use
            bbox_feats = self.bbox_roi_extractor(
                x[:self.bbox_roi_extractor.num_inputs], rois)
            if self.with_shared_head:
                bbox_feats = self.shared_head(bbox_feats)
            # cls_score 512*11
            # bbox_pred 512*55
            cls_score, bbox_pred, TSD_cls_score, TSD_bbox_pred, delta_c, delta_r = self.bbox_head(
                bbox_feats, x[:self.bbox_roi_extractor.num_inputs],rois)
            # return: labels, label_weights, bbox_targets, bbox_weights, TSD_labels, TSD_label_weights,
            # TSD_bbox_targets, TSD_bbox_weights, pc_cls_loss, pc_loc_loss
            rbbox_targets = self.bbox_head.get_target(rois, sampling_results,gt_masks,
                                                     gt_bboxes, gt_labels, delta_c, delta_r, cls_score, bbox_pred,
                                                     TSD_cls_score, TSD_bbox_pred,
                                                     self.train_cfg.rcnn[0], img_meta)
            loss_bbox = self.bbox_head.loss(cls_score, bbox_pred, TSD_cls_score, TSD_bbox_pred,
                                            *rbbox_targets)
            losses.update(loss_bbox)
        pos_is_gts = [res.pos_is_gt for res in sampling_results]
        # roi_labels = rbbox_targets[0]
        roi_labels = rbbox_targets[0]
        tsd_roi = rbbox_targets[-3]
        with torch.no_grad():
            # import pdb
            # pdb.set_trace()
            rotated_proposal_list = self.bbox_head.refine_rbboxes(
                roi2droi(tsd_roi), roi_labels, TSD_bbox_pred, pos_is_gts, img_meta
            )
        # assign gts and sample proposals (rbb assign)
        # orient bbox
        if self.with_rbbox:
            bbox_assigner = build_assigner(self.train_cfg.rcnn[1].assigner)
            bbox_sampler = build_sampler(
                self.train_cfg.rcnn[1].sampler, context=self)
            num_imgs = img.size(0)
            if gt_bboxes_ignore is None:
                gt_bboxes_ignore = [None for _ in range(num_imgs)]
            sampling_results = []
            for i in range(num_imgs):
                gt_obbs_best_roi = choose_best_Rroi_batch(gt_obbs[i])
                assign_result = bbox_assigner.assign(
                    rotated_proposal_list[i], gt_obbs_best_roi, gt_bboxes_ignore[i],
                    gt_labels[i])
                sampling_result = bbox_sampler.sample(
                    assign_result,
                    rotated_proposal_list[i],
                    torch.from_numpy(gt_obbs_best_roi).float().to(rotated_proposal_list[i].device),
                    gt_labels[i],
                    feats=[lvl_feat[i][None] for lvl_feat in x])
                sampling_results.append(sampling_result)
        if self.with_rbbox:
            # (batch_ind, x_ctr, y_ctr, w, h, angle)
            rrois = dbbox2roi([res.bboxes for res in sampling_results])
            # feat enlarge
            # rrois[:, 3] = rrois[:, 3] * 1.2
            # rrois[:, 4] = rrois[:, 4] * 1.4
            rrois[:, 3] = rrois[:, 3] * self.rbbox_roi_extractor.w_enlarge
            rrois[:, 4] = rrois[:, 4] * self.rbbox_roi_extractor.h_enlarge
            rbbox_feats = self.rbbox_roi_extractor(x[:self.rbbox_roi_extractor.num_inputs],
                                                   rrois)
            if self.with_shared_head_rbbox:
                rbbox_feats = self.shared_head_rbbox(rbbox_feats)
            cls_score, rbbox_pred = self.rbbox_head(rbbox_feats)
            # SharedFCBBoxHeadRbbox
            rbbox_targets = self.rbbox_head.get_target_rbbox(sampling_results, gt_obbs,
                                                        gt_labels, self.train_cfg.rcnn[1])
            loss_rbbox = self.rbbox_head.loss(cls_score, rbbox_pred, *rbbox_targets)
            for name, value in loss_rbbox.items():
                losses['s{}.{}'.format(1, name)] = (value)
        return losses

    def simple_test(self, img, img_meta, proposals=None, rescale=False):
        x = self.extract_feat(img)
        proposal_list = self.simple_test_rpn(
            x, img_meta, self.test_cfg.rpn) if proposals is None else proposals

        img_shape = img_meta[0]['img_shape']
        scale_factor = img_meta[0]['scale_factor']

        rois = bbox2roi(proposal_list)
        roi_feats = self.bbox_roi_extractor(
            x[:len(self.bbox_roi_extractor.featmap_strides)], rois)
        if self.with_shared_head:
            roi_feats = self.shared_head(roi_feats)
        cls_score, bbox_pred, TSD_cls_score, TSD_bbox_pred, delta_c, delta_r = self.bbox_head(
            roi_feats, x[:self.bbox_roi_extractor.num_inputs],rois)

        w = rois[:, 3] - rois[:, 1] + 1
        h = rois[:, 4] - rois[:, 2] + 1
        scale = 0.1
        rois_r = rois.new_zeros(rois.shape[0], rois.shape[1])
        rois_r[:, 0] = rois[:, 0]
        delta_r = delta_r.to(dtype=rois_r.dtype)
        rois_r[:, 1] = rois[:, 1] + delta_r[:, 0] * scale * w
        rois_r[:, 2] = rois[:, 2] + delta_r[:, 1] * scale * h
        rois_r[:, 3] = rois[:, 3] + delta_r[:, 0] * scale * w
        rois_r[:, 4] = rois[:, 4] + delta_r[:, 1] * scale * h

        rcnn_test_cfg = self.test_cfg.rcnn
        bbox_label = TSD_cls_score.argmax(dim=1)
        rrois = self.bbox_head.regress_by_class_rbbox(roi2droi(rois_r), bbox_label, TSD_bbox_pred,
                                                      img_meta[0])
        rrois_enlarge = copy.deepcopy(rrois)
        rrois_enlarge[:, 3] = rrois_enlarge[:, 3] * self.rbbox_roi_extractor.w_enlarge
        rrois_enlarge[:, 4] = rrois_enlarge[:, 4] * self.rbbox_roi_extractor.h_enlarge
        rbbox_feats = self.rbbox_roi_extractor(
            x[:len(self.rbbox_roi_extractor.featmap_strides)], rrois_enlarge)
        if self.with_shared_head_rbbox:
            rbbox_feats = self.shared_head_rbbox(rbbox_feats)

        rcls_score, rbbox_pred = self.rbbox_head(rbbox_feats)
        det_rbboxes, det_labels = self.rbbox_head.get_det_rbboxes(
            rrois,
            rcls_score,
            rbbox_pred,
            img_shape,
            scale_factor,
            rescale=rescale,
            cfg=rcnn_test_cfg)
        rbbox_results = dbbox2result(det_rbboxes, det_labels,
                                     self.rbbox_head.num_classes)
        return rbbox_results

    def tsd_simple_test_bboxes(self,
                           x,
                           img_metas,
                           proposals,
                           rcnn_test_cfg,
                           rescale=False):
        """Test only det bboxes without augmentation."""
        rois = bbox2roi(proposals)
        roi_feats = self.bbox_roi_extractor(
            x[:len(self.bbox_roi_extractor.featmap_strides)], rois)
        if self.with_shared_head:
            roi_feats = self.shared_head(roi_feats)
        cls_score, bbox_pred, TSD_cls_score, TSD_bbox_pred, delta_c, delta_r = self.bbox_head(roi_feats, x[:self.bbox_roi_extractor.num_inputs], rois)
        img_shape = img_metas[0]['img_shape']
        scale_factor = img_metas[0]['scale_factor']

        w = rois[:,3]-rois[:,1]+1
        h = rois[:,4]-rois[:,2]+1
        scale = 0.1
        rois_r = rois.new_zeros(rois.shape[0],rois.shape[1])
        rois_r[:,0] = rois[:,0]
        delta_r = delta_r.to(dtype=rois_r.dtype)
        rois_r[:,1] = rois[:,1]+delta_r[:,0]*scale*w
        rois_r[:,2] = rois[:,2]+delta_r[:,1]*scale*h
        rois_r[:,3] = rois[:,3]+delta_r[:,0]*scale*w
        rois_r[:,4] = rois[:,4]+delta_r[:,1]*scale*h

        det_bboxes, det_labels = self.bbox_head.get_det_bboxes(
            rois_r,
            TSD_cls_score,
            TSD_bbox_pred,
            img_shape,
            scale_factor,
            rescale=rescale,
            cfg=rcnn_test_cfg)
        return det_bboxes, det_labels

    def aug_test(self, imgs, img_metas, proposals=None, rescale=None):
        # raise NotImplementedError
        # import pdb; pdb.set_trace()
        proposal_list = self.aug_test_rpn_rotate(
            self.extract_feats(imgs), img_metas, self.test_cfg.rpn)

        rcnn_test_cfg = self.test_cfg.rcnn

        aug_rbboxes = []
        aug_rscores = []
        for x, img_meta in zip(self.extract_feats(imgs), img_metas):
            # only one image in the batch
            img_shape = img_meta[0]['img_shape']
            scale_factor = img_meta[0]['scale_factor']
            flip = img_meta[0]['flip']

            proposals = bbox_mapping(proposal_list[0][:, :4], img_shape,
                                     scale_factor, flip)

            angle = img_meta[0]['angle']
            # print('img shape: ', img_shape)
            if angle != 0:
                proposals = bbox_rotate_mapping(proposal_list[0][:, :4],
                                                img_shape, angle)
            rois = bbox2roi([proposals])
            # recompute feature maps to save GPU memory
            roi_feats = self.bbox_roi_extractor(
                x[:len(self.bbox_roi_extractor.featmap_strides)], rois)
            if self.with_shared_head:
                roi_feats = self.shared_head(roi_feats)
            cls_score, bbox_pred = self.bbox_head(roi_feats)

            bbox_label = cls_score.argmax(dim=1)
            rrois = self.bbox_head.regress_by_class_rbbox(roi2droi(rois), bbox_label,
                                                          bbox_pred,
                                                          img_meta[0])

            rrois_enlarge = copy.deepcopy(rrois)
            rrois_enlarge[:, 3] = rrois_enlarge[:, 3] * self.rbbox_roi_extractor.w_enlarge
            rrois_enlarge[:, 4] = rrois_enlarge[:, 4] * self.rbbox_roi_extractor.h_enlarge
            rbbox_feats = self.rbbox_roi_extractor(
                x[:len(self.rbbox_roi_extractor.featmap_strides)], rrois_enlarge)
            if self.with_shared_head_rbbox:
                rbbox_feats = self.shared_head_rbbox(rbbox_feats)

            rcls_score, rbbox_pred = self.rbbox_head(rbbox_feats)
            rbboxes, rscores = self.rbbox_head.get_det_rbboxes(
                rrois,
                rcls_score,
                rbbox_pred,
                img_shape,
                scale_factor,
                rescale=rescale,
                cfg=None)
            aug_rbboxes.append(rbboxes)
            aug_rscores.append(rscores)

        merged_rbboxes, merged_rscores = merge_rotate_aug_bboxes(
            aug_rbboxes, aug_rscores, img_metas, rcnn_test_cfg
        )
        det_rbboxes, det_rlabels = multiclass_nms_rbbox(
                                merged_rbboxes, merged_rscores, rcnn_test_cfg.score_thr,
                                rcnn_test_cfg.nms, rcnn_test_cfg.max_per_img)

        if rescale:
            _det_rbboxes = det_rbboxes
        else:
            _det_rbboxes = det_rbboxes.clone()
            _det_rbboxes[:, :4] *= img_metas[0][0]['scale_factor']

        rbbox_results = dbbox2result(_det_rbboxes, det_rlabels,
                                     self.rbbox_head.num_classes)
        return rbbox_results
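
The delta_r handling in simple_test and tsd_simple_test_bboxes above can be summarized in isolation. This is a minimal plain-Python sketch of the same arithmetic for a single RoI: the function name and the sample values are hypothetical, and the real code operates on batched tensors.

```python
def shift_roi(roi, delta_r, scale=0.1):
    """Translate one RoI (batch_ind, x1, y1, x2, y2) by a predicted offset.

    The offset is expressed as a fraction of the RoI width and height,
    so both corners move by the same amount and the size is preserved.
    """
    batch_ind, x1, y1, x2, y2 = roi
    w = x2 - x1 + 1
    h = y2 - y1 + 1
    dx = delta_r[0] * scale * w
    dy = delta_r[1] * scale * h
    return (batch_ind, x1 + dx, y1 + dy, x2 + dx, y2 + dy)


# Hypothetical RoI of width 100 and height 50, with a made-up offset.
shifted = shift_roi((0, 10.0, 20.0, 109.0, 69.0), (0.5, -1.0))
print(shifted)
```

Because the same dx and dy are added to both corners, delta_r only moves the proposal used for regression; it does not rescale it.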

TSD

@HEADS.register_module
class TSDConvFCBBoxHead(BBoxHead,BBoxHeadRbbox):
    r"""More general bbox head, with shared conv and fc layers and two optional
    separated branches.
    """
    def __init__(self,
                 num_shared_convs=0,
                 num_shared_fcs=0,
                 num_cls_convs=0,
                 num_cls_fcs=0,
                 num_reg_convs=0,
                 num_reg_fcs=0,
                 conv_out_channels=256,
                 fc_out_channels=1024,
                 conv_cfg=None,
                 norm_cfg=None,
                 cls_pc_margin=0.2,
                 loc_pc_margin=0.2,
                 featmap_strides=None,
                 *args,
                 **kwargs):
        super(TSDConvFCBBoxHead, self).__init__(*args, **kwargs)
        assert (num_shared_convs + num_shared_fcs + num_cls_convs +
                num_cls_fcs + num_reg_convs + num_reg_fcs > 0)
        if num_cls_convs > 0 or num_reg_convs > 0:
            assert num_shared_fcs == 0
        if not self.with_cls:
            assert num_cls_convs == 0 and num_cls_fcs == 0
        if not self.with_reg:
            assert num_reg_convs == 0 and num_reg_fcs == 0
        self.num_shared_convs = num_shared_convs
        self.num_shared_fcs = num_shared_fcs
        self.num_cls_convs = num_cls_convs
        self.num_cls_fcs = num_cls_fcs
        self.num_reg_convs = num_reg_convs
        self.num_reg_fcs = num_reg_fcs
        self.conv_out_channels = conv_out_channels
        self.fc_out_channels = fc_out_channels
        self.conv_cfg = conv_cfg
        self.norm_cfg = norm_cfg
        self.cls_pc_margin = cls_pc_margin
        self.loc_pc_margin = loc_pc_margin
        # add shared fc and specific fcs to generate delta_c and delta_r for disentangling input proposals
        self.shared_fc = nn.Sequential(
            nn.Linear(self.roi_feat_area * self.in_channels, 256),
            nn.ReLU(inplace=True))
        self.delta_c = nn.Sequential(
            nn.Linear(256, 256),
            nn.ReLU(inplace=True),
            nn.Linear(256, self.roi_feat_area * 2))
        self.delta_r = nn.Sequential(
            nn.Linear(256, 256),
            nn.ReLU(inplace=True),
            nn.Linear(256, 2))
        # add align pooling for Pc and Pr
        self.pool_size = int(np.sqrt(self.roi_feat_area))
        self.align_pooling_pc = nn.ModuleList([DeltaCPooling(spatial_scale=1.0 / x,
                                                             out_size=self.pool_size,
                                                             out_channels=self.in_channels,
                                                             no_trans=False,
                                                             group_size=1,
                                                             trans_std=0.1) for x in featmap_strides])
        self.align_pooling_pr = nn.ModuleList([DeltaRPooling(spatial_scale=1.0 / x,
                                                             out_size=self.pool_size,
                                                             out_channels=self.in_channels,
                                                             no_trans=False,
                                                             group_size=1,
                                                             trans_std=0.1) for x in featmap_strides])
        # add shared convs and fcs
        self.shared_convs, self.shared_fcs, last_layer_dim = \
            self._add_conv_fc_branch(
                self.num_shared_convs, self.num_shared_fcs, self.in_channels,
                True)
        self.shared_out_channels = last_layer_dim
        # add TSD convs and fcs
        self.TSD_pc_convs, self.TSD_pc_fcs, TSD_last_layer_dim = \
            self._add_conv_fc_branch(
                self.num_shared_convs, self.num_shared_fcs, self.in_channels, True)
        self.TSD_pr_convs, self.TSD_pr_fcs, TSD_last_layer_dim = \
            self._add_conv_fc_branch(
                self.num_shared_convs, self.num_shared_fcs, self.in_channels, True)
        self.TSD_out_channels = TSD_last_layer_dim
        # add cls specific branch
        self.cls_convs, self.cls_fcs, self.cls_last_dim = \
            self._add_conv_fc_branch(
                self.num_cls_convs, self.num_cls_fcs, self.shared_out_channels)
        # add TSD cls specific branch
        self.TSD_cls_convs, self.TSD_cls_fcs, self.TSD_cls_last_dim = \
            self._add_conv_fc_branch(
                self.num_cls_convs, self.num_cls_fcs, self.TSD_out_channels)
        # add reg specific branch
        self.reg_convs, self.reg_fcs, self.reg_last_dim = \
            self._add_conv_fc_branch(
                self.num_reg_convs, self.num_reg_fcs, self.shared_out_channels)
        # add TSD reg specific branch
        self.TSD_reg_convs, self.TSD_reg_fcs, self.TSD_reg_last_dim = \
            self._add_conv_fc_branch(
                self.num_reg_convs, self.num_reg_fcs, self.TSD_out_channels)
        if self.num_shared_fcs == 0 and not self.with_avg_pool:
            if self.num_cls_fcs == 0:
                self.cls_last_dim *= self.roi_feat_area
                self.TSD_cls_last_dim *= self.roi_feat_area
            if self.num_reg_fcs == 0:
                self.reg_last_dim *= self.roi_feat_area
                self.TSD_reg_last_dim *= self.roi_feat_area
        self.relu = nn.ReLU(inplace=True)
        # reconstruct fc_cls and fc_reg since input channels are changed
        if self.with_cls:
            self.fc_cls = nn.Linear(self.cls_last_dim, self.num_classes)
            self.TSD_fc_cls = nn.Linear(self.TSD_cls_last_dim, self.num_classes)
        if self.with_reg:
            # 5 regression targets per box: [x, y, w, h, theta]
            out_dim_reg = (5 if self.reg_class_agnostic else
                           5 * self.num_classes)
            # out_dim_reg_tsd = (4 if self.reg_class_agnostic else 4 *
            #                                                  self.num_classes)
            self.fc_reg = nn.Linear(self.reg_last_dim, out_dim_reg)
            # self.TSD_fc_reg = nn.Linear(self.TSD_reg_last_dim, out_dim_reg)
            self.TSD_fc_reg = nn.Linear(self.TSD_reg_last_dim, out_dim_reg)

    def _add_conv_fc_branch(self,
                            num_branch_convs,
                            num_branch_fcs,
                            in_channels,
                            is_shared=False):
        """Add shared or separable branch

        convs -> avg pool (optional) -> fcs
        """
        last_layer_dim = in_channels
        # add branch specific conv layers
        branch_convs = nn.ModuleList()
        if num_branch_convs > 0:
            for i in range(num_branch_convs):
                conv_in_channels = (
                    last_layer_dim if i == 0 else self.conv_out_channels)
                branch_convs.append(
                    ConvModule(
                        conv_in_channels,
                        self.conv_out_channels,
                        3,
                        padding=1,
                        conv_cfg=self.conv_cfg,
                        norm_cfg=self.norm_cfg))
            last_layer_dim = self.conv_out_channels
        # add branch specific fc layers
        branch_fcs = nn.ModuleList()
        if num_branch_fcs > 0:
            # for shared branch, only consider self.with_avg_pool
            # for separated branches, also consider self.num_shared_fcs
            if (is_shared
                or self.num_shared_fcs == 0) and not self.with_avg_pool:
                last_layer_dim *= self.roi_feat_area
            for i in range(num_branch_fcs):
                fc_in_channels = (
                    last_layer_dim if i == 0 else self.fc_out_channels)
                branch_fcs.append(
                    nn.Linear(fc_in_channels, self.fc_out_channels))
            last_layer_dim = self.fc_out_channels
        return branch_convs, branch_fcs, last_layer_dim

    def init_weights(self):
        super(TSDConvFCBBoxHead, self).init_weights()
        # conv layers are already initialized by ConvModule
        for module_list in [self.shared_fcs, self.cls_fcs, self.reg_fcs, self.TSD_pc_fcs, self.TSD_pr_fcs,
                            self.TSD_cls_fcs, self.TSD_reg_fcs]:
            for m in module_list.modules():
                if isinstance(m, nn.Linear):
                    # nn.init.xavier_uniform_(m.weight)
                    nn.init.kaiming_normal_(m.weight.data, a=1)
                    nn.init.constant_(m.bias, 0)

        for module_list in [self.shared_fc, self.delta_c, self.delta_r]:
            for m in module_list.modules():
                if isinstance(m, nn.BatchNorm2d):
                    m.weight.data.fill_(1)
                    m.bias.data.zero_()
                if isinstance(m, nn.Conv2d) or isinstance(m, nn.Linear):
                    nn.init.kaiming_normal_(m.weight.data, a=1)
                    if m.bias is not None:
                        m.bias.data.zero_()

    def map_roi_levels(self, rois, num_levels):
        """Map rois to corresponding feature levels by scales.

        - scale < finest_scale * 2: level 0
        - finest_scale * 2 <= scale < finest_scale * 4: level 1
        - finest_scale * 4 <= scale < finest_scale * 8: level 2
        - scale >= finest_scale * 8: level 3

        Args:
            rois (Tensor): Input RoIs, shape (k, 5).
            num_levels (int): Total level number.

        Returns:
            Tensor: Level index (0-based) of each RoI, shape (k, )
        """
        finest_scale = 56
        scale = torch.sqrt(
            (rois[:, 3] - rois[:, 1] + 1) * (rois[:, 4] - rois[:, 2] + 1))
        target_lvls = torch.floor(torch.log2(scale / finest_scale + 1e-6))
        target_lvls = target_lvls.clamp(min=0, max=num_levels - 1).long()
        return target_lvls

    @force_fp32(apply_to=('feats', ))
    def forward(self, x, feats, rois):
        # generate TSD pc pr and corresponding features
        c = x.numel() // x.shape[0]
        x1 = x.view(-1, c) # n*12544
        x2 = self.shared_fc(x1)
        delta_c = self.delta_c(x2)
        delta_r = self.delta_r(x2)
        num_levels = len(feats) #number_levels = 4
        target_lvls = self.map_roi_levels(rois, num_levels)
        TSD_cls_feats = x.new_zeros(
            rois.size(0), self.in_channels, self.pool_size, self.pool_size)
        TSD_loc_feats = x.new_zeros(
            rois.size(0), self.in_channels, self.pool_size, self.pool_size)
        for i in range(num_levels): #number_levels = 4
            inds = target_lvls == i
            if inds.any():
                delta_c_ = delta_c[inds, :]
                delta_r_ = delta_r[inds, :]
                rois_ = rois[inds, :]
                tsd_feats_cls = self.align_pooling_pc[i](feats[i], rois_, delta_c_.to(dtype=rois_.dtype))
                tsd_feats_loc = self.align_pooling_pr[i](feats[i], rois_, delta_r_.to(dtype=rois_.dtype))
                TSD_cls_feats[inds] = tsd_feats_cls.to(dtype=x.dtype)
                TSD_loc_feats[inds] = tsd_feats_loc.to(dtype=x.dtype)

        # shared part for TSD
        if self.num_shared_convs > 0:
            for conv in self.TSD_pc_convs:
                TSD_cls_feats = conv(TSD_cls_feats)
            for conv in self.TSD_pr_convs:
                TSD_loc_feats = conv(TSD_loc_feats)

        if self.num_shared_fcs > 0:
            if self.with_avg_pool:
                TSD_cls_feats = self.avg_pool(TSD_cls_feats)
                TSD_loc_feats = self.avg_pool(TSD_loc_feats)

            TSD_cls_feats = TSD_cls_feats.flatten(1)
            TSD_loc_feats = TSD_loc_feats.flatten(1)

            for fc in self.TSD_pc_fcs:
                TSD_cls_feats = self.relu(fc(TSD_cls_feats))
            for fc in self.TSD_pr_fcs:
                TSD_loc_feats = self.relu(fc(TSD_loc_feats))
        # separate branches
        TSD_x_cls = TSD_cls_feats
        TSD_x_reg = TSD_loc_feats
        for conv in self.TSD_cls_convs:
            TSD_x_cls = conv(TSD_x_cls)
        if TSD_x_cls.dim() > 2:
            if self.with_avg_pool:
                TSD_x_cls = self.avg_pool(TSD_x_cls)
            TSD_x_cls = TSD_x_cls.flatten(1)
        for fc in self.TSD_cls_fcs:
            TSD_x_cls = self.relu(fc(TSD_x_cls))

        for conv in self.TSD_reg_convs:
            TSD_x_reg = conv(TSD_x_reg)
        if TSD_x_reg.dim() > 2:
            if self.with_avg_pool:
                TSD_x_reg = self.avg_pool(TSD_x_reg)
            # flatten regardless of avg pooling, matching the cls branch above
            TSD_x_reg = TSD_x_reg.flatten(1)
        for fc in self.TSD_reg_fcs:
            TSD_x_reg = self.relu(fc(TSD_x_reg))

        TSD_cls_score = self.TSD_fc_cls(TSD_x_cls) if self.with_cls else None
        TSD_bbox_pred = self.TSD_fc_reg(TSD_x_reg) if self.with_reg else None

        # shared part for sibling head, only used in training phase.
        if self.training:
            if self.num_shared_convs > 0:
                for conv in self.shared_convs:
                    x = conv(x)

            if self.num_shared_fcs > 0:
                if self.with_avg_pool:
                    x = self.avg_pool(x)

                x = x.flatten(1)

                for fc in self.shared_fcs:
                    x = self.relu(fc(x))
            # separate branches
            x_cls = x
            x_reg = x

            for conv in self.cls_convs:
                x_cls = conv(x_cls)
            if x_cls.dim() > 2:
                if self.with_avg_pool:
                    x_cls = self.avg_pool(x_cls)
                x_cls = x_cls.flatten(1)
            for fc in self.cls_fcs:
                x_cls = self.relu(fc(x_cls))

            for conv in self.reg_convs:
                x_reg = conv(x_reg)
            if x_reg.dim() > 2:
                if self.with_avg_pool:
                    x_reg = self.avg_pool(x_reg)
                x_reg = x_reg.flatten(1)
            for fc in self.reg_fcs:
                x_reg = self.relu(fc(x_reg))

            cls_score = self.fc_cls(x_cls) if self.with_cls else None
            bbox_pred = self.fc_reg(x_reg) if self.with_reg else None
            return cls_score, bbox_pred, TSD_cls_score, TSD_bbox_pred, delta_c, delta_r
        else:
            return None, None, TSD_cls_score, TSD_bbox_pred, delta_c, delta_r

    @force_fp32(apply_to=('delta_c', 'delta_r', 'TSD_cls_score', 'TSD_bbox_pred', 'cls_score', 'bbox_pred'))
    def get_target(self, rois, sampling_results, gt_masks, gt_bboxes, gt_labels, delta_c, delta_r, cls_score, bbox_pred,
                   TSD_cls_score, TSD_bbox_pred,rcnn_train_cfg, img_metas):
        pos_proposals = [res.pos_bboxes for res in sampling_results]
        neg_proposals = [res.neg_bboxes for res in sampling_results]
        pos_assigned_gt_inds = [
            res.pos_assigned_gt_inds for res in sampling_results
        ]
        pos_gt_bboxes = [res.pos_gt_bboxes for res in sampling_results]
        pos_gt_labels = [res.pos_gt_labels for res in sampling_results]
        reg_classes = 1 if self.reg_class_agnostic else self.num_classes

        rois_ = [rois[(rois[:, 0] == i).type(torch.bool)] for i in range(len(sampling_results))]
        delta_c_ = [delta_c[(rois[:, 0] == i).type(torch.bool)] for i in range(len(sampling_results))]
        delta_r_ = [delta_r[(rois[:, 0] == i).type(torch.bool)] for i in range(len(sampling_results))]
        cls_score_ = [cls_score[(rois[:, 0] == i).type(torch.bool)] for i in range(len(sampling_results))]
        bbox_pred_ = [bbox_pred[(rois[:, 0] == i).type(torch.bool)] for i in range(len(sampling_results))]
        TSD_cls_score_ = [TSD_cls_score[(rois[:, 0] == i).type(torch.bool)] for i in range(len(sampling_results))]
        TSD_bbox_pred_ = [TSD_bbox_pred[(rois[:, 0] == i).type(torch.bool)] for i in range(len(sampling_results))]

        cls_reg_targets = bbox_target_tsd(
            pos_proposals,
            neg_proposals,
            pos_assigned_gt_inds,
            gt_masks,
            pos_gt_bboxes,
            pos_gt_labels,
            rois_,
            delta_c_,
            delta_r_,
            cls_score_,
            bbox_pred_,
            TSD_cls_score_,
            TSD_bbox_pred_,
            rcnn_train_cfg,
            reg_classes,
            cls_pc_margin=self.cls_pc_margin,
            loc_pc_margin=self.loc_pc_margin,
            target_means=self.target_means,
            target_stds=self.target_stds,
            with_module=self.with_module)
        return cls_reg_targets

    @force_fp32(apply_to=('cls_score', 'bbox_pred', 'TSD_cls_score', 'TSD_bbox_pred', 'pc_cls_loss', 'pc_loc_loss'))
    def loss(self,
             cls_score,
             bbox_pred,
             TSD_cls_score,
             TSD_bbox_pred,
             labels,
             label_weights,
             bbox_targets,
             bbox_weights,
             TSD_labels, TSD_label_weights, TSD_bbox_targets, TSD_bbox_weights,TSD_roi, pc_cls_loss, pc_loc_loss,
             reduce=None):
        losses = dict()
        if cls_score is not None:
            avg_factor = max(torch.sum(label_weights > 0).float().item(), 1.)
            if cls_score.numel() > 0:
                losses['loss_cls'] = self.loss_cls(
                    cls_score,
                    labels,
                    label_weights,
                    avg_factor=avg_factor,
                    reduce=reduce)
                losses['acc'] = accuracy(cls_score, labels)
        if TSD_cls_score is not None:
            avg_factor = max(torch.sum(TSD_label_weights > 0).float().item(), 1.)
            if TSD_cls_score.numel() > 0:
                losses['loss_TSD_cls'] = self.loss_cls(
                    TSD_cls_score,
                    TSD_labels,
                    TSD_label_weights,
                    avg_factor=avg_factor,
                    reduce=reduce)
                losses['TSD_acc'] = accuracy(TSD_cls_score, TSD_labels)

        if bbox_pred is not None:
            pos_inds = labels > 0
            if pos_inds.any():
                if self.reg_class_agnostic:
                    pos_bbox_pred = bbox_pred.view(
                        bbox_pred.size(0), 5)[pos_inds.type(torch.bool)]
                else:
                    pos_bbox_pred = bbox_pred.view(
                        bbox_pred.size(0), -1,
                        5)[pos_inds.type(torch.bool),
                           labels[pos_inds.type(torch.bool)]]
                losses['loss_bbox'] = self.loss_bbox(
                    pos_bbox_pred,
                    bbox_targets[pos_inds.type(torch.bool)],
                    bbox_weights[pos_inds.type(torch.bool)],
                    avg_factor=bbox_targets.size(0))
        if TSD_bbox_pred is not None:
            pos_inds = TSD_labels > 0
            if pos_inds.any():
                if self.reg_class_agnostic:
                    TSD_bbox_pred = TSD_bbox_pred.view(
                        TSD_bbox_pred.size(0), 5)[pos_inds.type(torch.bool)]
                else:
                    TSD_bbox_pred = TSD_bbox_pred.view(
                        TSD_bbox_pred.size(0), -1,
                        5)[pos_inds.type(torch.bool),
                           TSD_labels[pos_inds.type(torch.bool)]]
                losses['loss_TSD_bbox'] = self.loss_bbox(
                    TSD_bbox_pred,
                    TSD_bbox_targets[pos_inds.type(torch.bool)],
                    TSD_bbox_weights[pos_inds.type(torch.bool)],
                    avg_factor=TSD_bbox_targets.size(0))
        if pc_cls_loss is not None:
            losses['loss_pc_cls'] = pc_cls_loss.mean()
        if pc_loc_loss is not None:
            losses['loss_pc_loc'] = pc_loc_loss.mean()
        return losses

@HEADS.register_module
class TSDSharedFCBBoxHead(TSDConvFCBBoxHead):

    def __init__(self, num_fcs=2, fc_out_channels=1024, *args, **kwargs):
        assert num_fcs >= 1
        super(TSDSharedFCBBoxHead, self).__init__(
            num_shared_convs=0,
            num_shared_fcs=num_fcs,
            num_cls_convs=0,
            num_cls_fcs=0,
            num_reg_convs=0,
            num_reg_fcs=0,
            fc_out_channels=fc_out_channels,
            *args,
            **kwargs)

def bbox_target_single_tsd(pos_bboxes,
                           neg_bboxes,
                           pos_assigned_gt_inds,
                           gt_masks,
                           pos_gt_bboxes,
                           pos_gt_labels,
                           rois,
                           delta_c,
                           delta_r,
                           cls_score_,
                           bbox_pred_,
                           TSD_cls_score_,
                           TSD_bbox_pred_,
                           cfg,
                           reg_classes=1,
                           cls_pc_margin=0.2,
                           loc_pc_margin=0.2,
                           target_means=[.0, .0, .0, .0, .0],
                           target_stds=[1.0, 1.0, 1.0, 1.0, 1.0],
                           with_module=True):
    num_pos = pos_bboxes.size(0)  # n*4
    num_neg = neg_bboxes.size(0)
    num_samples = num_pos + num_neg
    labels = pos_bboxes.new_zeros(num_samples, dtype=torch.long)
    label_weights = pos_bboxes.new_zeros(num_samples)

    # bbox_targets = pos_bboxes.new_zeros(num_samples, 4)
    # bbox_weights = pos_bboxes.new_zeros(num_samples, 4)
    bbox_targets = pos_bboxes.new_zeros(num_samples, 5)
    bbox_weights = pos_bboxes.new_zeros(num_samples, 5)

    TSD_labels = pos_bboxes.new_zeros(num_samples, dtype=torch.long)
    TSD_label_weights = pos_bboxes.new_zeros(num_samples)
    # TSD_bbox_targets = pos_bboxes.new_zeros(num_samples, 4)
    # TSD_bbox_weights = pos_bboxes.new_zeros(num_samples, 4)
    TSD_bbox_targets = pos_bboxes.new_zeros(num_samples, 5)
    TSD_bbox_weights = pos_bboxes.new_zeros(num_samples, 5)

    pos_gt_masks = gt_masks[pos_assigned_gt_inds.cpu().numpy()]
    # 4*2 eight coords
    pos_gt_polys = mask2poly(pos_gt_masks)
    pos_gt_bp_polys = get_best_begin_point(pos_gt_polys)
    # 5 (x,y,h,w,theta)
    pos_gt_obbs = torch.from_numpy(polygonToRotRectangle_batch(pos_gt_bp_polys, with_module)).to(pos_bboxes.device)

    # generate P_r according to delta_r and rois
    w = rois[:,3]-rois[:,1]+1
    h = rois[:,4]-rois[:,2]+1
    scale = 0.1
    rois_r = rois.new_zeros(rois.shape[0],rois.shape[1])
    rois_r[:,0] = rois[:,0]
    rois_r[:,1] = rois[:,1]+delta_r[:,0]*scale*w
    rois_r[:,2] = rois[:,2]+delta_r[:,1]*scale*h
    rois_r[:,3] = rois[:,3]+delta_r[:,0]*scale*w
    rois_r[:,4] = rois[:,4]+delta_r[:,1]*scale*h
    TSD_pos_rois = rois_r[:num_pos]
    pos_rois = rois[:num_pos]
    pc_cls_loss = rois.new_zeros(1)
    pc_loc_loss = rois.new_zeros(1)

    if pos_bboxes.size(1) == 4:
        pos_ext_bboxes = hbb2obb_v2(pos_bboxes)
    else:
        pos_ext_bboxes = pos_bboxes
    if num_pos > 0:
        labels[:num_pos] = pos_gt_labels
        TSD_labels[:num_pos] = pos_gt_labels
        pos_weight = 1.0 if cfg.pos_weight <= 0 else cfg.pos_weight
        label_weights[:num_pos] = pos_weight
        TSD_label_weights[:num_pos] = pos_weight
        # pos_bbox_targets = bbox2delta(pos_bboxes, pos_gt_bboxes, target_means,
        #                               target_stds)
        if with_module:
            rpos_bbox_targets = dbbox2delta(pos_ext_bboxes, pos_gt_obbs, target_means,
                                      target_stds)
        else:
            rpos_bbox_targets = dbbox2delta_v3(pos_ext_bboxes, pos_gt_obbs, target_means,
                                              target_stds)
        # TSD_pos_bbox_targets = bbox2delta(TSD_pos_rois[:,1:], pos_gt_bboxes, target_means,
        #                               target_stds)
        TSD_pos_bbox_targets = bbox2delta(TSD_pos_rois[:,1:], pos_gt_bboxes, target_means[:4],
                                      target_stds[:4])
        bbox_targets[:num_pos, :] = rpos_bbox_targets
        bbox_weights[:num_pos, :] = 1
        TSD_bbox_targets[:num_pos, :4] = TSD_pos_bbox_targets
        TSD_bbox_targets[:num_pos, 4] = pos_gt_obbs[:num_pos,4]
        TSD_bbox_weights[:num_pos, :] = 1

        # compute PC for TSD
        # 1. compute the PC for classification
        cls_score_soft = F.softmax(cls_score_,dim=1)
        TSD_cls_score_soft = F.softmax(TSD_cls_score_,dim=1)
        cls_pc_margin = torch.tensor(cls_pc_margin).to(labels.device).to(dtype=cls_score_soft.dtype)
        cls_pc_margin = torch.min(1-cls_score_soft[np.arange(len(TSD_labels)),labels],cls_pc_margin).detach()
        pc_cls_loss = F.relu(-(TSD_cls_score_soft[np.arange(len(TSD_labels)),TSD_labels] - cls_score_soft[np.arange(len(TSD_labels)),labels].detach() - cls_pc_margin))

        # 2. compute the PC for localization
        N = bbox_pred_.shape[0]
        bbox_pred_ = bbox_pred_.view(N,-1,5)
        TSD_bbox_pred_ = TSD_bbox_pred_.view(N,-1,5)
        # sibling_head_bboxes = delta2bbox(pos_bboxes, bbox_pred_[np.arange(num_pos), labels[:num_pos]], means=target_means[:4], stds=target_stds[:4])
        sibling_head_bboxes = delta2bbox(pos_bboxes, bbox_pred_[np.arange(num_pos), labels[:num_pos]][:,:4], means=target_means[:4], stds=target_stds[:4])
        TSD_head_bboxes = delta2bbox(TSD_pos_rois[:,1:], TSD_bbox_pred_[np.arange(num_pos), TSD_labels[:num_pos]][:,:4], means=target_means[:4], stds=target_stds[:4])

        ious, gious = iou_overlaps(sibling_head_bboxes, pos_gt_bboxes)
        TSD_ious, TSD_gious = iou_overlaps(TSD_head_bboxes, pos_gt_bboxes)
        loc_pc_margin = torch.tensor(loc_pc_margin).to(ious.device).to(dtype=ious.dtype)
        loc_pc_margin = torch.min(1-ious.detach(),loc_pc_margin).detach()
        pc_loc_loss = F.relu(-(TSD_ious - ious.detach() - loc_pc_margin))

    if num_neg > 0:
        label_weights[-num_neg:] = 1.
        TSD_label_weights[-num_neg:] = 1.

    return labels, label_weights, bbox_targets, bbox_weights, TSD_labels, TSD_label_weights, TSD_bbox_targets, TSD_bbox_weights,rois_r, pc_cls_loss, pc_loc_loss

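
For readers following the P_r generation step in the code above, here is a minimal standalone sketch in plain Python (no torch) of how a proposal is translated by delta_r. It assumes each RoI is a (batch_idx, x1, y1, x2, y2) tuple and uses the same scale of 0.1 as the pasted code. The helper name shift_roi is hypothetical and not part of the TSD repository.

```python
def shift_roi(roi, delta, scale=0.1):
    """Translate one RoI by a predicted offset (illustrative helper only).

    roi   -- (batch_idx, x1, y1, x2, y2)
    delta -- (dx, dy), offsets expressed as a fraction of the RoI size
    """
    batch_idx, x1, y1, x2, y2 = roi
    w = x2 - x1 + 1  # width, using the same +1 convention as the code above
    h = y2 - y1 + 1  # height
    dx = delta[0] * scale * w
    dy = delta[1] * scale * h
    # Both corners move by the same amount, so the box size is unchanged.
    return (batch_idx, x1 + dx, y1 + dy, x2 + dx, y2 + dy)


shifted = shift_roi((0, 10, 20, 29, 59), (1.0, -0.5))
print(shifted)
```

Note that the shift only translates the box. Width and height stay the same, so the bbox2delta targets for the TSD branch are computed against a moved proposal of the original size.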
songguanglu commented 4 years ago

Hi, from your code, I suspect the performance problem is caused by the following. TSDSharedFCBBoxHead contains both a TSD head and a sibling head. During training, the hyper-parameters of the TSD head and the sibling head, such as target_means and target_stds, are kept consistent. The PC loss is applied between the TSD head and the sibling head inside TSDSharedFCBBoxHead (rather than SharedFCBBoxHeadRbbox), and the optimization pushes the TSD head to outperform the sibling head. From your code, I note that besides TSDSharedFCBBoxHead you add an extra SharedFCBBoxHeadRbbox whose hyper-parameters differ from those of TSDSharedFCBBoxHead. Both heads share the same input feature map x, so the performance of SharedFCBBoxHeadRbbox is affected by TSDSharedFCBBoxHead. I recommend replacing the sibling head in TSDSharedFCBBoxHead with your SharedFCBBoxHeadRbbox. Keep the hyper-parameters of SharedFCBBoxHeadRbbox and TSD consistent, so that the PC loss can be applied between TSD and SharedFCBBoxHeadRbbox.
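
As a rough illustration of the PC loss for classification mentioned above, here is a minimal sketch in plain Python for a single RoI. It mirrors the margin logic in the pasted bbox_target_single_tsd (margin capped by 1 minus the sibling probability, then a hinge on the difference), but the function names softmax and pc_cls_loss are hypothetical and the real code operates on batched torch tensors.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def pc_cls_loss(sibling_logits, tsd_logits, label, margin=0.2):
    """Hinge-style progressive constraint for one RoI (illustrative only).

    The loss is zero once the TSD head's probability for the ground-truth
    class exceeds the sibling head's probability by at least the margin.
    """
    p_sib = softmax(sibling_logits)[label]
    p_tsd = softmax(tsd_logits)[label]
    # Cap the margin so that p_sib + margin never exceeds 1.
    m = min(1.0 - p_sib, margin)
    return max(0.0, p_sib + m - p_tsd)


# Same confidence from both heads: the loss is (approximately) the margin.
print(pc_cls_loss([0.0, 0.0], [0.0, 0.0], label=0))
# TSD head clearly more confident than the sibling head: the loss is zero.
print(pc_cls_loss([0.0, 0.0], [5.0, 0.0], label=0))
```

In the training code the sibling probability is detached, so this term only pushes the TSD head upward and never pulls the sibling head down. That is also why both heads should decode targets with the same hyper-parameters: the comparison is only meaningful if both are scored against the same labels and targets.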

wangjue-wzq commented 4 years ago

Thanks for the reply, and sorry for not making it clear before. I want to use TSDSharedFCBBoxHead for rotated object detection, and then use SharedFCBBoxHeadRbbox to refine the object's orientation and size on top of the TSD output. So TSDSharedFCBBoxHead and SharedFCBBoxHeadRbbox are independent.

songguanglu commented 4 years ago

Can you show me your modification on TSDSharedFCBBoxHead? I need to know the meanings of some variables such as tsd_roi and roi2droi.

wangjue-wzq commented 4 years ago

Thank you for your guidance. I have found the problem: it was an error in the loss calculation.

lijain commented 3 years ago

"Thank you for your guidance, I have found the problem, the loss calculation error."

Hello, I did something similar to you, but in my case the TSD head predicts poorly while the normal head predicts well. I found that there is a problem with the TSD loss, but the cause is not clear to me. What was the reason in your case?