wangjue-wzq closed this issue 4 years ago
Hi, you can compare the performance of the sibling head in TSD with that of the sibling head in the baseline. If they perform similarly, it means the traditional sibling head cannot perform well on your dataset. The code for delta_c can be found at line 246 of tsd_bbox_head.py.
Thank you for your reply! TSD performs similarly to the baseline, but the sibling head is worse than the baseline.
Can you provide the details about your own dataset and training config? The detailed performance of TSD and baseline can help us analyze this result.
During training, the TSD module basically fits: its classification accuracy reaches 99%, while the sibling head only reaches 84%. Because there are many background samples, 84% means a lot of classification errors. The training targets are rotated objects. In the TSDSharedFCBBoxHead module I added an angle prediction, [x,y,w,h,theta], trained with a Smooth L1 loss; the SharedFCBBoxHeadRbbox module also predicts four coordinates and one angle, again with a Smooth L1 loss. I guess the problem may be in how I use the TSDSharedFCBBoxHead module, since the SharedFCBBoxHeadRbbox module has been tested many times on other models without errors. Thank you very much; here are the relevant code and configuration files.
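To make the background-sample point concrete, here is a rough sketch of the arithmetic. It assumes the foreground share of the sampled RoIs equals the sampler's `pos_fraction=0.25` from the config (in practice it can be lower), and `foreground_accuracy_range` is a hypothetical helper written for this note, not something from the codebase.

```python
def foreground_accuracy_range(overall_acc, fg_fraction):
    """Bound the foreground accuracy implied by an overall accuracy.

    Uses overall_acc = fg_fraction * fg_acc + (1 - fg_fraction) * bg_acc,
    with fg_acc and bg_acc both constrained to [0, 1].
    """
    if not 0.0 < fg_fraction <= 1.0:
        raise ValueError("fg_fraction must be in (0, 1]")
    bg_fraction = 1.0 - fg_fraction
    # Lowest fg_acc: every background RoI is classified correctly (bg_acc = 1).
    lowest = max(0.0, (overall_acc - bg_fraction) / fg_fraction)
    # Highest fg_acc: all errors fall on background RoIs.
    highest = min(1.0, overall_acc / fg_fraction)
    return lowest, highest


# Assumed values: 84% overall accuracy, 25% of sampled RoIs are foreground.
low, high = foreground_accuracy_range(overall_acc=0.84, fg_fraction=0.25)
print(f"foreground accuracy lies between {low:.2f} and {high:.2f}")
```

Under these assumptions, if nearly all background RoIs are classified correctly, only about 36% of the foreground RoIs are, which is why the overall figure alone says little about the objects themselves.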
config
# model settings
model = dict(
type='TSDRoITransformer',
pretrained='torchvision://resnet50',
backbone=dict(
type='ResNet',
depth=50,
num_stages=4,
out_indices=(0, 1, 2, 3),
frozen_stages=1,
norm_cfg=dict(type='BN', requires_grad=True),
style='pytorch'),
neck=dict(
type='FPN',
in_channels=[256, 512, 1024, 2048],
out_channels=256,
num_outs=5),
rpn_head=dict(
type='RPNHead',
in_channels=256,
feat_channels=256,
anchor_scales=[4],
anchor_ratios=[0.5, 1.0, 2.0],
anchor_strides=[4, 8, 16, 32, 64],
target_means=[.0, .0, .0, .0],
target_stds=[1.0, 1.0, 1.0, 1.0],
loss_cls=dict(
type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0),
loss_bbox=dict(type='SmoothL1Loss', beta=1.0 / 9.0, loss_weight=1.0)),
bbox_roi_extractor=dict(
type='SingleRoIExtractor',
roi_layer=dict(type='RoIAlign', out_size=7, sample_num=2),
out_channels=256,
featmap_strides=[4, 8, 16, 32]),
bbox_head=dict(
type='TSDSharedFCBBoxHead',  # output is [x, y, w, h, theta]; theta is the object's rotation angle; loss is Smooth L1
featmap_strides=[4, 8, 16, 32],
num_fcs=2,
in_channels=256,
fc_out_channels=1024,
roi_feat_size=7,
num_classes=11,
cls_pc_margin=0.2,
loc_pc_margin=0.2,
target_means=[0., 0., 0., 0.,0.],
target_stds=[0.1, 0.1, 0.2, 0.2,0.1],
reg_class_agnostic=False,
loss_cls=dict(
type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0),
loss_bbox=dict(type='SmoothL1Loss', beta=1.0, loss_weight=1.0)),
rbbox_roi_extractor=dict(
type='RboxSingleRoIExtractor',
roi_layer=dict(type='RoIAlignRotated', out_size=7, sample_num=2),
out_channels=256,
featmap_strides=[4, 8, 16, 32]),
rbbox_head = dict(
type='SharedFCBBoxHeadRbbox',  # output is [x, y, w, h, theta]; theta is the object's rotation angle; loss is Smooth L1
num_fcs=2,
num_cls_fcs=0, #add fc to classification
in_channels=256,
fc_out_channels=1024,
roi_feat_size=7,
num_classes=11,
target_means=[0., 0., 0., 0., 0.],
target_stds=[0.05, 0.05, 0.1, 0.1, 0.05],
reg_class_agnostic=False,
loss_cls=dict(
type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0),
loss_bbox=dict(type='SmoothL1Loss', beta=1.0, loss_weight=1.0))
)
# model training and testing settings
train_cfg = dict(
rpn=dict(
assigner=dict(
type='MaxIoUAssigner',
pos_iou_thr=0.7,
neg_iou_thr=0.3,
min_pos_iou=0.3,
ignore_iof_thr=-1),
sampler=dict(
type='RandomSampler',
num=256,
pos_fraction=0.5,
neg_pos_ub=-1,
add_gt_as_proposals=False),
allowed_border=30,
pos_weight=-1,
debug=False),
rpn_proposal=dict(
nms_across_levels=False,
nms_pre=2000,
nms_post=2000,
max_num=2000,
nms_thr=0.7,
min_bbox_size=0),
rcnn=[dict(
assigner=dict(
type='MaxIoUAssigner',
pos_iou_thr=0.5,
neg_iou_thr=0.5,
min_pos_iou=0.5,
ignore_iof_thr=-1),
sampler=dict(
type='RandomSampler',
num=512,
pos_fraction=0.25,
neg_pos_ub=-1,
add_gt_as_proposals=True),
pos_weight=-1,
debug=False),
dict(
assigner=dict(
type='MaxIoUAssignerRbbox',
pos_iou_thr=0.5,
neg_iou_thr=0.5,
min_pos_iou=0.5,
ignore_iof_thr=-1),
sampler=dict(
type='RandomRbboxSampler',
num=512, # 512
pos_fraction=0.25,
neg_pos_ub=-1,
add_gt_as_proposals=True),
pos_weight=-1,
debug=False)
])
test_cfg = dict(
rpn=dict(
nms_across_levels=False,
nms_pre=2000,
nms_post=2000,
max_num=2000,
nms_thr=0.7,
min_bbox_size=0),
rcnn=dict(
score_thr = 0.05, nms_top=False, nms = dict(type='py_cpu_nms_poly_fast', iou_thr=0.1), max_per_img = 2000)
# score_thr=0.00, nms=dict(type='nms', iou_thr=0.5), max_per_img=100)
# soft-nms is also supported for rcnn testing
# e.g., nms=dict(type='soft_nms', iou_thr=0.5, min_score=0.05)
)
# dataset settings
dataset_type = 'PlaneDataset'
data_root = 'data/Plane/'
img_norm_cfg = dict(
mean=[112.209, 114.267, 97.302], std=[58.826, 50.785, 49.986], to_rgb=True)
# mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
data = dict(
imgs_per_gpu=1,
workers_per_gpu=2,
train=dict(
type=dataset_type,
ann_file=data_root + 'plane_aug_train.json',
img_prefix=data_root + 'train_aug/images',
img_scale=(1024, 1024),
img_norm_cfg=img_norm_cfg,
size_divisor=32,
flip_ratio=0.5,
with_mask=True,
with_crowd=True,
with_label=True),
val=dict(
type=dataset_type,
ann_file=data_root + 'plane_aug_train.json',
img_prefix=data_root + 'train_aug/images',
img_scale=(1024, 1024),
img_norm_cfg=img_norm_cfg,
size_divisor=32,
flip_ratio=0,
with_mask=False,
with_crowd=False,
with_label=True),
test=dict(
type=dataset_type,
# ann_file=data_root + 'plane_aug_train.json',
# img_prefix=data_root + 'train_aug/images',
ann_file=data_root + 'plane_train.json',
img_prefix=data_root + 'train/images',
img_scale=(1024, 1024),
img_norm_cfg=img_norm_cfg,
size_divisor=32,
flip_ratio=0,
with_mask=False,
with_label=False,
test_mode=True))
# optimizer
optimizer = dict(type='SGD', lr=0.01, momentum=0.9, weight_decay=0.0001)
optimizer_config = dict(grad_clip=dict(max_norm=35, norm_type=2))
# learning policy
lr_config = dict(
policy='step',
warmup='linear',
warmup_iters=500,
warmup_ratio=1.0 / 3,
step=[40, 50])
checkpoint_config = dict(interval=2)
log_config = dict(
interval=50,
hooks=[
dict(type='TextLoggerHook'),
# dict(type='TensorboardLoggerHook')
])
# runtime settings
total_epochs = 50
dist_params = dict(backend='nccl')
log_level = 'INFO'
work_dir = './work_dirs/TSD_faster_rcnn_r50_fpn_1x_plane_small'
load_from = None
resume_from = None
workflow = [('train', 1)]
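As a side note on the learning policy in this config, below is a minimal sketch of the schedule it describes. It is an illustration only: `lr_at` is a hypothetical helper, the decay factor `gamma=0.1` and the 0-based epoch counting are assumptions about how the step policy is applied, and the linear warmup formula is the usual one (the rate ramps from `warmup_ratio * lr` up to `lr` over `warmup_iters` iterations).

```python
def lr_at(epoch, cur_iter, base_lr=0.01, steps=(40, 50), gamma=0.1,
          warmup_iters=500, warmup_ratio=1.0 / 3):
    """Learning rate for a 0-based epoch and a global iteration index."""
    # Step decay: count the milestones this epoch has already reached.
    passed = sum(1 for s in steps if epoch >= s)
    regular_lr = base_lr * gamma ** passed
    if cur_iter < warmup_iters:
        # Linear warmup from warmup_ratio * regular_lr to regular_lr.
        k = (1 - cur_iter / warmup_iters) * (1 - warmup_ratio)
        return regular_lr * (1 - k)
    return regular_lr


for epoch, cur_iter in [(0, 0), (0, 250), (39, 100000), (40, 100000), (49, 100000)]:
    print(epoch, cur_iter, lr_at(epoch, cur_iter))
```

If epochs really are counted from 0, then with `total_epochs = 50` the last epoch is 49, so the second milestone in `step=[40, 50]` would never take effect and the rate would only drop once.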
detector
@DETECTORS.register_module
class TSDRoITransformer(BaseDetectorNew, RPNTestMixin):
def __init__(self,
backbone,
neck=None,
shared_head=None,
shared_head_rbbox=None,
rpn_head=None,
bbox_roi_extractor=None,
bbox_head=None,
rbbox_roi_extractor=None,
rbbox_head=None,
mask_roi_extractor=None,
mask_head=None,
train_cfg=None,
test_cfg=None,
pretrained=None):
assert bbox_roi_extractor is not None
assert bbox_head is not None
assert rbbox_roi_extractor is not None
assert rbbox_head is not None
super(TSDRoITransformer, self).__init__()
self.backbone = builder.build_backbone(backbone)
if neck is not None:
self.neck = builder.build_neck(neck)
if rpn_head is not None:
self.rpn_head = builder.build_head(rpn_head)
if shared_head is not None:
self.shared_head = builder.build_shared_head(shared_head)
if shared_head_rbbox is not None:
self.shared_head_rbbox = builder.build_shared_head(shared_head_rbbox)
if bbox_head is not None:
self.bbox_roi_extractor = builder.build_roi_extractor(
bbox_roi_extractor)
self.bbox_head = builder.build_head(bbox_head)
self.use_TSD = 'TSD' in bbox_head['type']
# import pdb
# pdb.set_trace()
if rbbox_head is not None:
self.rbbox_roi_extractor = builder.build_roi_extractor(
rbbox_roi_extractor)
self.rbbox_head = builder.build_head(rbbox_head)
if mask_head is not None:
if mask_roi_extractor is not None:
self.mask_roi_extractor = builder.build_roi_extractor(
mask_roi_extractor)
self.share_roi_extractor = False
else:
self.share_roi_extractor = True
self.mask_roi_extractor = self.rbbox_roi_extractor
self.mask_head = builder.build_head(mask_head)
self.train_cfg = train_cfg
self.test_cfg = test_cfg
self.init_weights(pretrained=pretrained)
@property
def with_rpn(self):
return hasattr(self, 'rpn_head') and self.rpn_head is not None
def init_weights(self, pretrained=None):
super(TSDRoITransformer, self).init_weights(pretrained)
self.backbone.init_weights(pretrained=pretrained)
if self.with_neck:
if isinstance(self.neck, nn.Sequential):
for m in self.neck:
m.init_weights()
else:
self.neck.init_weights()
if self.with_rpn:
self.rpn_head.init_weights()
if self.with_shared_head:
self.shared_head.init_weights(pretrained=pretrained)
if self.with_shared_head_rbbox:
self.shared_head_rbbox.init_weights(pretrained=pretrained)
if self.with_bbox:
self.bbox_roi_extractor.init_weights()
self.bbox_head.init_weights()
if self.with_rbbox:
self.rbbox_roi_extractor.init_weights()
self.rbbox_head.init_weights()
if self.with_mask:
self.mask_head.init_weights()
if not self.share_roi_extractor:
self.mask_roi_extractor.init_weights()
def extract_feat(self, img):
x = self.backbone(img)
if self.with_neck:
x = self.neck(x)
return x
def forward_train(self,
img,
img_meta,
gt_bboxes, #[n,4] [x,y,h,w]
gt_labels,
gt_bboxes_ignore=None,
gt_masks=None,
proposals=None):
x = self.extract_feat(img) #resnet 5 layers
losses = dict()
# trans gt_masks[1024, 1024] to gt_obbs
# [cx, cy, w, h, theta]
gt_obbs = gt_mask_bp_obbs_list(gt_masks)
# RPN forward and loss
if self.with_rpn:
rpn_outs = self.rpn_head(x)
rpn_loss_inputs = rpn_outs + (gt_bboxes, img_meta,
self.train_cfg.rpn)
rpn_losses = self.rpn_head.loss(
*rpn_loss_inputs, gt_bboxes_ignore=gt_bboxes_ignore)
losses.update(rpn_losses)
proposal_cfg = self.train_cfg.get('rpn_proposal',
self.test_cfg.rpn)
proposal_inputs = rpn_outs + (img_meta, proposal_cfg)
proposal_list = self.rpn_head.get_bboxes(*proposal_inputs)
else:
proposal_list = proposals
# assign gts and sample proposals (hbb assign)
if self.with_bbox or self.with_mask:
bbox_assigner = build_assigner(self.train_cfg.rcnn[0].assigner)
bbox_sampler = build_sampler(
self.train_cfg.rcnn[0].sampler, context=self)
num_imgs = img.size(0)
if gt_bboxes_ignore is None:
gt_bboxes_ignore = [None for _ in range(num_imgs)]
sampling_results = []
for i in range(num_imgs):
# assign RPN proposals as positive or negative
assign_result = bbox_assigner.assign(proposal_list[i],
gt_bboxes[i],
gt_bboxes_ignore[i],
gt_labels[i])
# sample positives and negatives
sampling_result = bbox_sampler.sample(
assign_result,
proposal_list[i],
gt_bboxes[i],
gt_labels[i],
feats=[lvl_feat[i][None] for lvl_feat in x])
sampling_results.append(sampling_result)
# bbox head forward and loss
# horizontal bbox
if self.with_bbox:
rois = bbox2roi([res.bboxes for res in sampling_results])
# TODO: a more flexible way to decide which feature maps to use
bbox_feats = self.bbox_roi_extractor(
x[:self.bbox_roi_extractor.num_inputs], rois)
if self.with_shared_head:
bbox_feats = self.shared_head(bbox_feats)
# cls_score 512*11
# bbox_pred 512*55
cls_score, bbox_pred, TSD_cls_score, TSD_bbox_pred, delta_c, delta_r = self.bbox_head(
bbox_feats, x[:self.bbox_roi_extractor.num_inputs],rois)
# return: labels, label_weights, bbox_targets, bbox_weights, TSD_labels, TSD_label_weights,
# TSD_bbox_targets, TSD_bbox_weights, pc_cls_loss, pc_loc_loss
rbbox_targets = self.bbox_head.get_target(rois, sampling_results,gt_masks,
gt_bboxes, gt_labels, delta_c, delta_r, cls_score, bbox_pred,
TSD_cls_score, TSD_bbox_pred,
self.train_cfg.rcnn[0], img_meta)
loss_bbox = self.bbox_head.loss(cls_score, bbox_pred, TSD_cls_score, TSD_bbox_pred,
*rbbox_targets)
losses.update(loss_bbox)
pos_is_gts = [res.pos_is_gt for res in sampling_results]
# roi_labels = rbbox_targets[0]
roi_labels = rbbox_targets[0]
tsd_roi = rbbox_targets[-3]
with torch.no_grad():
# import pdb
# pdb.set_trace()
rotated_proposal_list = self.bbox_head.refine_rbboxes(
roi2droi(tsd_roi), roi_labels, TSD_bbox_pred, pos_is_gts, img_meta
)
# assign gts and sample proposals (rbb assign)
# orient bbox
if self.with_rbbox:
bbox_assigner = build_assigner(self.train_cfg.rcnn[1].assigner)
bbox_sampler = build_sampler(
self.train_cfg.rcnn[1].sampler, context=self)
num_imgs = img.size(0)
if gt_bboxes_ignore is None:
gt_bboxes_ignore = [None for _ in range(num_imgs)]
sampling_results = []
for i in range(num_imgs):
gt_obbs_best_roi = choose_best_Rroi_batch(gt_obbs[i])
assign_result = bbox_assigner.assign(
rotated_proposal_list[i], gt_obbs_best_roi, gt_bboxes_ignore[i],
gt_labels[i])
sampling_result = bbox_sampler.sample(
assign_result,
rotated_proposal_list[i],
torch.from_numpy(gt_obbs_best_roi).float().to(rotated_proposal_list[i].device),
gt_labels[i],
feats=[lvl_feat[i][None] for lvl_feat in x])
sampling_results.append(sampling_result)
if self.with_rbbox:
# (batch_ind, x_ctr, y_ctr, w, h, angle)
rrois = dbbox2roi([res.bboxes for res in sampling_results])
# feat enlarge
# rrois[:, 3] = rrois[:, 3] * 1.2
# rrois[:, 4] = rrois[:, 4] * 1.4
rrois[:, 3] = rrois[:, 3] * self.rbbox_roi_extractor.w_enlarge
rrois[:, 4] = rrois[:, 4] * self.rbbox_roi_extractor.h_enlarge
rbbox_feats = self.rbbox_roi_extractor(x[:self.rbbox_roi_extractor.num_inputs],
rrois)
if self.with_shared_head_rbbox:
rbbox_feats = self.shared_head_rbbox(rbbox_feats)
cls_score, rbbox_pred = self.rbbox_head(rbbox_feats)
# SharedFCBBoxHeadRbbox
rbbox_targets = self.rbbox_head.get_target_rbbox(sampling_results, gt_obbs,
gt_labels, self.train_cfg.rcnn[1])
loss_rbbox = self.rbbox_head.loss(cls_score, rbbox_pred, *rbbox_targets)
for name, value in loss_rbbox.items():
losses['s{}.{}'.format(1, name)] = (value)
return losses
def simple_test(self, img, img_meta, proposals=None, rescale=False):
x = self.extract_feat(img)
proposal_list = self.simple_test_rpn(
x, img_meta, self.test_cfg.rpn) if proposals is None else proposals
img_shape = img_meta[0]['img_shape']
scale_factor = img_meta[0]['scale_factor']
rois = bbox2roi(proposal_list)
roi_feats = self.bbox_roi_extractor(
x[:len(self.bbox_roi_extractor.featmap_strides)], rois)
if self.with_shared_head:
roi_feats = self.shared_head(roi_feats)
cls_score, bbox_pred, TSD_cls_score, TSD_bbox_pred, delta_c, delta_r = self.bbox_head(
roi_feats, x[:self.bbox_roi_extractor.num_inputs],rois)
w = rois[:, 3] - rois[:, 1] + 1
h = rois[:, 4] - rois[:, 2] + 1
scale = 0.1
rois_r = rois.new_zeros(rois.shape[0], rois.shape[1])
rois_r[:, 0] = rois[:, 0]
delta_r = delta_r.to(dtype=rois_r.dtype)
rois_r[:, 1] = rois[:, 1] + delta_r[:, 0] * scale * w
rois_r[:, 2] = rois[:, 2] + delta_r[:, 1] * scale * h
rois_r[:, 3] = rois[:, 3] + delta_r[:, 0] * scale * w
rois_r[:, 4] = rois[:, 4] + delta_r[:, 1] * scale * h
rcnn_test_cfg = self.test_cfg.rcnn
bbox_label = TSD_cls_score.argmax(dim=1)
rrois = self.bbox_head.regress_by_class_rbbox(roi2droi(rois_r), bbox_label, TSD_bbox_pred,
img_meta[0])
rrois_enlarge = copy.deepcopy(rrois)
rrois_enlarge[:, 3] = rrois_enlarge[:, 3] * self.rbbox_roi_extractor.w_enlarge
rrois_enlarge[:, 4] = rrois_enlarge[:, 4] * self.rbbox_roi_extractor.h_enlarge
rbbox_feats = self.rbbox_roi_extractor(
x[:len(self.rbbox_roi_extractor.featmap_strides)], rrois_enlarge)
if self.with_shared_head_rbbox:
rbbox_feats = self.shared_head_rbbox(rbbox_feats)
rcls_score, rbbox_pred = self.rbbox_head(rbbox_feats)
det_rbboxes, det_labels = self.rbbox_head.get_det_rbboxes(
rrois,
rcls_score,
rbbox_pred,
img_shape,
scale_factor,
rescale=rescale,
cfg=rcnn_test_cfg)
rbbox_results = dbbox2result(det_rbboxes, det_labels,
self.rbbox_head.num_classes)
return rbbox_results
def tsd_simple_test_bboxes(self,
x,
img_metas,
proposals,
rcnn_test_cfg,
rescale=False):
"""Test only det bboxes without augmentation."""
rois = bbox2roi(proposals)
roi_feats = self.bbox_roi_extractor(
x[:len(self.bbox_roi_extractor.featmap_strides)], rois)
if self.with_shared_head:
roi_feats = self.shared_head(roi_feats)
cls_score, bbox_pred, TSD_cls_score, TSD_bbox_pred, delta_c, delta_r = self.bbox_head(roi_feats, x[:self.bbox_roi_extractor.num_inputs], rois)
img_shape = img_metas[0]['img_shape']
scale_factor = img_metas[0]['scale_factor']
w = rois[:,3]-rois[:,1]+1
h = rois[:,4]-rois[:,2]+1
scale = 0.1
rois_r = rois.new_zeros(rois.shape[0],rois.shape[1])
rois_r[:,0] = rois[:,0]
delta_r = delta_r.to(dtype=rois_r.dtype)
rois_r[:,1] = rois[:,1]+delta_r[:,0]*scale*w
rois_r[:,2] = rois[:,2]+delta_r[:,1]*scale*h
rois_r[:,3] = rois[:,3]+delta_r[:,0]*scale*w
rois_r[:,4] = rois[:,4]+delta_r[:,1]*scale*h
det_bboxes, det_labels = self.bbox_head.get_det_bboxes(
rois_r,
TSD_cls_score,
TSD_bbox_pred,
img_shape,
scale_factor,
rescale=rescale,
cfg=rcnn_test_cfg)
return det_bboxes, det_labels
def aug_test(self, imgs, img_metas, proposals=None, rescale=None):
# raise NotImplementedError
# import pdb; pdb.set_trace()
proposal_list = self.aug_test_rpn_rotate(
self.extract_feats(imgs), img_metas, self.test_cfg.rpn)
rcnn_test_cfg = self.test_cfg.rcnn
aug_rbboxes = []
aug_rscores = []
for x, img_meta in zip(self.extract_feats(imgs), img_metas):
# only one image in the batch
img_shape = img_meta[0]['img_shape']
scale_factor = img_meta[0]['scale_factor']
flip = img_meta[0]['flip']
proposals = bbox_mapping(proposal_list[0][:, :4], img_shape,
scale_factor, flip)
angle = img_meta[0]['angle']
# print('img shape: ', img_shape)
if angle != 0:
try:
proposals = bbox_rotate_mapping(proposal_list[0][:, :4], img_shape,
angle)
except:
import pdb; pdb.set_trace()
rois = bbox2roi([proposals])
# recompute feature maps to save GPU memory
roi_feats = self.bbox_roi_extractor(
x[:len(self.bbox_roi_extractor.featmap_strides)], rois)
if self.with_shared_head:
roi_feats = self.shared_head(roi_feats)
cls_score, bbox_pred = self.bbox_head(roi_feats)
bbox_label = cls_score.argmax(dim=1)
rrois = self.bbox_head.regress_by_class_rbbox(roi2droi(rois), bbox_label,
bbox_pred,
img_meta[0])
rrois_enlarge = copy.deepcopy(rrois)
rrois_enlarge[:, 3] = rrois_enlarge[:, 3] * self.rbbox_roi_extractor.w_enlarge
rrois_enlarge[:, 4] = rrois_enlarge[:, 4] * self.rbbox_roi_extractor.h_enlarge
rbbox_feats = self.rbbox_roi_extractor(
x[:len(self.rbbox_roi_extractor.featmap_strides)], rrois_enlarge)
if self.with_shared_head_rbbox:
rbbox_feats = self.shared_head_rbbox(rbbox_feats)
rcls_score, rbbox_pred = self.rbbox_head(rbbox_feats)
rbboxes, rscores = self.rbbox_head.get_det_rbboxes(
rrois,
rcls_score,
rbbox_pred,
img_shape,
scale_factor,
rescale=rescale,
cfg=None)
aug_rbboxes.append(rbboxes)
aug_rscores.append(rscores)
merged_rbboxes, merged_rscores = merge_rotate_aug_bboxes(
aug_rbboxes, aug_rscores, img_metas, rcnn_test_cfg
)
det_rbboxes, det_rlabels = multiclass_nms_rbbox(
merged_rbboxes, merged_rscores, rcnn_test_cfg.score_thr,
rcnn_test_cfg.nms, rcnn_test_cfg.max_per_img)
if rescale:
_det_rbboxes = det_rbboxes
else:
_det_rbboxes = det_rbboxes.clone()
_det_rbboxes[:, :4] *= img_metas[0][0]['scale_factor']
rbbox_results = dbbox2result(_det_rbboxes, det_rlabels,
self.rbbox_head.num_classes)
return rbbox_results
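For readers following the test path, the RoI shift applied in `simple_test` and `tsd_simple_test_bboxes` above can be sketched in plain Python. This is only an illustration of the arithmetic on a single RoI: `shift_roi` is a hypothetical helper, and the RoI layout `(batch_ind, x1, y1, x2, y2)` is assumed from how the columns are indexed in the pasted code.

```python
def shift_roi(roi, delta_r, scale=0.1):
    """Translate one RoI by delta_r, scaled by the RoI's width and height.

    roi     : (batch_ind, x1, y1, x2, y2)
    delta_r : (dx, dy) offsets predicted for the localization branch
    """
    batch_ind, x1, y1, x2, y2 = roi
    w = x2 - x1 + 1
    h = y2 - y1 + 1
    dx = delta_r[0] * scale * w
    dy = delta_r[1] * scale * h
    # Both corners move by the same offset, so the box size is unchanged.
    return [batch_ind, x1 + dx, y1 + dy, x2 + dx, y2 + dy]


shifted = shift_roi([0, 10.0, 20.0, 109.0, 69.0], delta_r=(0.5, -1.0))
print(shifted)
```

With a 100 x 50 RoI and `scale=0.1`, a `delta_r` of `(0.5, -1.0)` moves the box about 5 pixels right and 5 pixels up without resizing it.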
TSD
@HEADS.register_module
class TSDConvFCBBoxHead(BBoxHead,BBoxHeadRbbox):
r"""More general bbox head, with shared conv and fc layers and two optional
separated branches.
"""
def __init__(self,
num_shared_convs=0,
num_shared_fcs=0,
num_cls_convs=0,
num_cls_fcs=0,
num_reg_convs=0,
num_reg_fcs=0,
conv_out_channels=256,
fc_out_channels=1024,
conv_cfg=None,
norm_cfg=None,
cls_pc_margin=0.2,
loc_pc_margin=0.2,
featmap_strides=None,
*args,
**kwargs):
super(TSDConvFCBBoxHead, self).__init__(*args, **kwargs)
assert (num_shared_convs + num_shared_fcs + num_cls_convs +
num_cls_fcs + num_reg_convs + num_reg_fcs > 0)
if num_cls_convs > 0 or num_reg_convs > 0:
assert num_shared_fcs == 0
if not self.with_cls:
assert num_cls_convs == 0 and num_cls_fcs == 0
if not self.with_reg:
assert num_reg_convs == 0 and num_reg_fcs == 0
self.num_shared_convs = num_shared_convs
self.num_shared_fcs = num_shared_fcs
self.num_cls_convs = num_cls_convs
self.num_cls_fcs = num_cls_fcs
self.num_reg_convs = num_reg_convs
self.num_reg_fcs = num_reg_fcs
self.conv_out_channels = conv_out_channels
self.fc_out_channels = fc_out_channels
self.conv_cfg = conv_cfg
self.norm_cfg = norm_cfg
self.cls_pc_margin = cls_pc_margin
self.loc_pc_margin = loc_pc_margin
# add shared fc and specific fcs to generate delta_c and delta_r for disentangling input proposals
self.shared_fc = nn.Sequential(
nn.Linear(self.roi_feat_area * self.in_channels, 256),
nn.ReLU(inplace=True))
self.delta_c = nn.Sequential(
nn.Linear(256, 256),
nn.ReLU(inplace=True),
nn.Linear(256, self.roi_feat_area * 2))
self.delta_r = nn.Sequential(
nn.Linear(256, 256),
nn.ReLU(inplace=True),
nn.Linear(256, 2))
# add AlignPool for Pc and Pr
self.pool_size = int(np.sqrt(self.roi_feat_area))
self.align_pooling_pc = nn.ModuleList([DeltaCPooling(spatial_scale=1.0 / x,
out_size=self.pool_size,
out_channels=self.in_channels,
no_trans=False,
group_size=1,
trans_std=0.1) for x in featmap_strides])
self.align_pooling_pr = nn.ModuleList([DeltaRPooling(spatial_scale=1.0 / x,
out_size=self.pool_size,
out_channels=self.in_channels,
no_trans=False,
group_size=1,
trans_std=0.1) for x in featmap_strides])
# add shared convs and fcs
self.shared_convs, self.shared_fcs, last_layer_dim = \
self._add_conv_fc_branch(
self.num_shared_convs, self.num_shared_fcs, self.in_channels,
True)
self.shared_out_channels = last_layer_dim
# add TSD convs and fcs
self.TSD_pc_convs, self.TSD_pc_fcs, TSD_last_layer_dim = \
self._add_conv_fc_branch(
self.num_shared_convs, self.num_shared_fcs, self.in_channels, True)
self.TSD_pr_convs, self.TSD_pr_fcs, TSD_last_layer_dim = \
self._add_conv_fc_branch(
self.num_shared_convs, self.num_shared_fcs, self.in_channels, True)
self.TSD_out_channels = TSD_last_layer_dim
# add cls specific branch
self.cls_convs, self.cls_fcs, self.cls_last_dim = \
self._add_conv_fc_branch(
self.num_cls_convs, self.num_cls_fcs, self.shared_out_channels)
# add TSD cls specific branch
self.TSD_cls_convs, self.TSD_cls_fcs, self.TSD_cls_last_dim = \
self._add_conv_fc_branch(
self.num_cls_convs, self.num_cls_fcs, self.TSD_out_channels)
# add reg specific branch
self.reg_convs, self.reg_fcs, self.reg_last_dim = \
self._add_conv_fc_branch(
self.num_reg_convs, self.num_reg_fcs, self.shared_out_channels)
# add TSD reg specific branch
self.TSD_reg_convs, self.TSD_reg_fcs, self.TSD_reg_last_dim = \
self._add_conv_fc_branch(
self.num_reg_convs, self.num_reg_fcs, self.TSD_out_channels)
if self.num_shared_fcs == 0 and not self.with_avg_pool:
if self.num_cls_fcs == 0:
self.cls_last_dim *= self.roi_feat_area
self.TSD_cls_last_dim *= self.roi_feat_area
if self.num_reg_fcs == 0:
self.reg_last_dim *= self.roi_feat_area
self.TSD_reg_last_dim *= self.roi_feat_area
self.relu = nn.ReLU(inplace=True)
# reconstruct fc_cls and fc_reg since input channels are changed
if self.with_cls:
self.fc_cls = nn.Linear(self.cls_last_dim, self.num_classes)
self.TSD_fc_cls = nn.Linear(self.TSD_cls_last_dim, self.num_classes)
if self.with_reg:
out_dim_reg = (5 if self.reg_class_agnostic else 5 *
self.num_classes)
# out_dim_reg_tsd = (4 if self.reg_class_agnostic else 4 *
# self.num_classes)
self.fc_reg = nn.Linear(self.reg_last_dim, out_dim_reg)
# self.TSD_fc_reg = nn.Linear(self.TSD_reg_last_dim, out_dim_reg)
self.TSD_fc_reg = nn.Linear(self.TSD_reg_last_dim, out_dim_reg)
def _add_conv_fc_branch(self,
num_branch_convs,
num_branch_fcs,
in_channels,
is_shared=False):
"""Add shared or separable branch
convs -> avg pool (optional) -> fcs
"""
last_layer_dim = in_channels
# add branch specific conv layers
branch_convs = nn.ModuleList()
if num_branch_convs > 0:
for i in range(num_branch_convs):
conv_in_channels = (
last_layer_dim if i == 0 else self.conv_out_channels)
branch_convs.append(
ConvModule(
conv_in_channels,
self.conv_out_channels,
3,
padding=1,
conv_cfg=self.conv_cfg,
norm_cfg=self.norm_cfg))
last_layer_dim = self.conv_out_channels
# add branch specific fc layers
branch_fcs = nn.ModuleList()
if num_branch_fcs > 0:
# for shared branch, only consider self.with_avg_pool
# for separated branches, also consider self.num_shared_fcs
if (is_shared
or self.num_shared_fcs == 0) and not self.with_avg_pool:
last_layer_dim *= self.roi_feat_area
for i in range(num_branch_fcs):
fc_in_channels = (
last_layer_dim if i == 0 else self.fc_out_channels)
branch_fcs.append(
nn.Linear(fc_in_channels, self.fc_out_channels))
last_layer_dim = self.fc_out_channels
return branch_convs, branch_fcs, last_layer_dim
def init_weights(self):
super(TSDConvFCBBoxHead, self).init_weights()
# conv layers are already initialized by ConvModule
for module_list in [self.shared_fcs, self.cls_fcs, self.reg_fcs, self.TSD_pc_fcs, self.TSD_pr_fcs,
self.TSD_cls_fcs, self.TSD_reg_fcs]:
for m in module_list.modules():
if isinstance(m, nn.Linear):
# nn.init.xavier_uniform_(m.weight)
nn.init.kaiming_normal_(m.weight.data, a=1)
nn.init.constant_(m.bias, 0)
for module_list in [self.shared_fc, self.delta_c, self.delta_r]:
for m in module_list.modules():
if isinstance(m, nn.BatchNorm2d):
m.weight.data.fill_(1)
m.bias.data.zero_()
if isinstance(m, nn.Conv2d) or isinstance(m, nn.Linear):
nn.init.kaiming_normal_(m.weight.data, a=1)
if m.bias is not None:
m.bias.data.zero_()
def map_roi_levels(self, rois, num_levels):
"""Map rois to corresponding feature levels by scales.
- scale < finest_scale * 2: level 0
- finest_scale * 2 <= scale < finest_scale * 4: level 1
- finest_scale * 4 <= scale < finest_scale * 8: level 2
- scale >= finest_scale * 8: level 3
Args:
rois (Tensor): Input RoIs, shape (k, 5).
num_levels (int): Total level number.
Returns:
Tensor: Level index (0-based) of each RoI, shape (k, )
"""
finest_scale = 56
scale = torch.sqrt(
(rois[:, 3] - rois[:, 1] + 1) * (rois[:, 4] - rois[:, 2] + 1))
target_lvls = torch.floor(torch.log2(scale / finest_scale + 1e-6))
target_lvls = target_lvls.clamp(min=0, max=num_levels - 1).long()
return target_lvls
@force_fp32(apply_to=('feats', ))
def forward(self, x, feats, rois):
# generate TSD pc pr and corresponding features
c = x.numel() // x.shape[0]
x1 = x.view(-1, c) # n*12544
x2 = self.shared_fc(x1)
delta_c = self.delta_c(x2)
delta_r = self.delta_r(x2)
num_levels = len(feats) #number_levels = 4
target_lvls = self.map_roi_levels(rois, num_levels)
TSD_cls_feats = x.new_zeros(
rois.size(0), self.in_channels, self.pool_size, self.pool_size)
TSD_loc_feats = x.new_zeros(
rois.size(0), self.in_channels, self.pool_size, self.pool_size)
for i in range(num_levels): #number_levels = 4
inds = target_lvls == i
if inds.any():
delta_c_ = delta_c[inds, :]
delta_r_ = delta_r[inds, :]
rois_ = rois[inds, :]
tsd_feats_cls = self.align_pooling_pc[i](feats[i], rois_, delta_c_.to(dtype=rois_.dtype))
tsd_feats_loc = self.align_pooling_pr[i](feats[i], rois_, delta_r_.to(dtype=rois_.dtype))
TSD_cls_feats[inds] = tsd_feats_cls.to(dtype=x.dtype)
TSD_loc_feats[inds] = tsd_feats_loc.to(dtype=x.dtype)
# shared part for TSD
if self.num_shared_convs > 0:
for conv in self.TSD_pc_convs:
TSD_cls_feats = conv(TSD_cls_feats)
for conv in self.TSD_pr_convs:
TSD_loc_feats = conv(TSD_loc_feats)
if self.num_shared_fcs > 0:
if self.with_avg_pool:
TSD_cls_feats = self.avg_pool(TSD_cls_feats)
TSD_loc_feats = self.avg_pool(TSD_loc_feats)
TSD_cls_feats = TSD_cls_feats.flatten(1)
TSD_loc_feats = TSD_loc_feats.flatten(1)
for fc in self.TSD_pc_fcs:
TSD_cls_feats = self.relu(fc(TSD_cls_feats))
for fc in self.TSD_pr_fcs:
TSD_loc_feats = self.relu(fc(TSD_loc_feats))
# separate branches
TSD_x_cls = TSD_cls_feats
TSD_x_reg = TSD_loc_feats
for conv in self.TSD_cls_convs:
TSD_x_cls = conv(TSD_x_cls)
if TSD_x_cls.dim() > 2:
if self.with_avg_pool:
TSD_x_cls = self.avg_pool(TSD_x_cls)
TSD_x_cls = TSD_x_cls.flatten(1)
for fc in self.TSD_cls_fcs:
TSD_x_cls = self.relu(fc(TSD_x_cls))
for conv in self.TSD_reg_convs:
TSD_x_reg = conv(TSD_x_reg)
if TSD_x_reg.dim() > 2:
if self.with_avg_pool:
TSD_x_reg = self.avg_pool(TSD_x_reg)
TSD_x_reg = TSD_x_reg.flatten(1)
for fc in self.TSD_reg_fcs:
TSD_x_reg = self.relu(fc(TSD_x_reg))
TSD_cls_score = self.TSD_fc_cls(TSD_x_cls) if self.with_cls else None
TSD_bbox_pred = self.TSD_fc_reg(TSD_x_reg) if self.with_reg else None
# shared part for sibling head, only used in training phase.
if self.training:
if self.num_shared_convs > 0:
for conv in self.shared_convs:
x = conv(x)
if self.num_shared_fcs > 0:
if self.with_avg_pool:
x = self.avg_pool(x)
x = x.flatten(1)
for fc in self.shared_fcs:
x = self.relu(fc(x))
# separate branches
x_cls = x
x_reg = x
for conv in self.cls_convs:
x_cls = conv(x_cls)
if x_cls.dim() > 2:
if self.with_avg_pool:
x_cls = self.avg_pool(x_cls)
x_cls = x_cls.flatten(1)
for fc in self.cls_fcs:
x_cls = self.relu(fc(x_cls))
for conv in self.reg_convs:
x_reg = conv(x_reg)
if x_reg.dim() > 2:
if self.with_avg_pool:
x_reg = self.avg_pool(x_reg)
x_reg = x_reg.flatten(1)
for fc in self.reg_fcs:
x_reg = self.relu(fc(x_reg))
cls_score = self.fc_cls(x_cls) if self.with_cls else None
bbox_pred = self.fc_reg(x_reg) if self.with_reg else None
return cls_score, bbox_pred, TSD_cls_score, TSD_bbox_pred, delta_c, delta_r
else:
return None, None, TSD_cls_score, TSD_bbox_pred, delta_c, delta_r
@force_fp32(apply_to=('delta_c', 'delta_r', 'TSD_cls_score', 'TSD_bbox_pred', 'cls_score', 'bbox_pred'))
def get_target(self, rois, sampling_results, gt_masks, gt_bboxes, gt_labels, delta_c, delta_r, cls_score, bbox_pred,
TSD_cls_score, TSD_bbox_pred,rcnn_train_cfg, img_metas):
pos_proposals = [res.pos_bboxes for res in sampling_results]
neg_proposals = [res.neg_bboxes for res in sampling_results]
pos_assigned_gt_inds = [
res.pos_assigned_gt_inds for res in sampling_results
]
pos_gt_bboxes = [res.pos_gt_bboxes for res in sampling_results]
pos_gt_labels = [res.pos_gt_labels for res in sampling_results]
reg_classes = 1 if self.reg_class_agnostic else self.num_classes
rois_ = [rois[(rois[:, 0] == i).type(torch.bool)] for i in range(len(sampling_results))]
delta_c_ = [delta_c[(rois[:, 0] == i).type(torch.bool)] for i in range(len(sampling_results))]
delta_r_ = [delta_r[(rois[:, 0] == i).type(torch.bool)] for i in range(len(sampling_results))]
cls_score_ = [cls_score[(rois[:, 0] == i).type(torch.bool)] for i in range(len(sampling_results))]
bbox_pred_ = [bbox_pred[(rois[:, 0] == i).type(torch.bool)] for i in range(len(sampling_results))]
TSD_cls_score_ = [TSD_cls_score[(rois[:, 0] == i).type(torch.bool)] for i in range(len(sampling_results))]
TSD_bbox_pred_ = [TSD_bbox_pred[(rois[:, 0] == i).type(torch.bool)] for i in range(len(sampling_results))]
cls_reg_targets = bbox_target_tsd(
pos_proposals,
neg_proposals,
pos_assigned_gt_inds,
gt_masks,
pos_gt_bboxes,
pos_gt_labels,
rois_,
delta_c_,
delta_r_,
cls_score_,
bbox_pred_,
TSD_cls_score_,
TSD_bbox_pred_,
rcnn_train_cfg,
reg_classes,
cls_pc_margin=self.cls_pc_margin,
loc_pc_margin=self.loc_pc_margin,
target_means=self.target_means,
target_stds=self.target_stds,
with_module=self.with_module)
return cls_reg_targets

    @force_fp32(apply_to=('cls_score', 'bbox_pred', 'TSD_cls_score', 'TSD_bbox_pred', 'pc_cls_loss', 'pc_loc_loss'))
    def loss(self,
             cls_score,
             bbox_pred,
             TSD_cls_score,
             TSD_bbox_pred,
             labels,
             label_weights,
             bbox_targets,
             bbox_weights,
             TSD_labels,
             TSD_label_weights,
             TSD_bbox_targets,
             TSD_bbox_weights,
             TSD_roi,
             pc_cls_loss,
             pc_loc_loss,
             reduce=None):
        losses = dict()
        if cls_score is not None:
            avg_factor = max(torch.sum(label_weights > 0).float().item(), 1.)
            if cls_score.numel() > 0:
                losses['loss_cls'] = self.loss_cls(
                    cls_score,
                    labels,
                    label_weights,
                    avg_factor=avg_factor,
                    reduce=reduce)
                losses['acc'] = accuracy(cls_score, labels)
        if TSD_cls_score is not None:
            avg_factor = max(torch.sum(TSD_label_weights > 0).float().item(), 1.)
            if TSD_cls_score.numel() > 0:
                losses['loss_TSD_cls'] = self.loss_cls(
                    TSD_cls_score,
                    TSD_labels,
                    TSD_label_weights,
                    avg_factor=avg_factor,
                    reduce=reduce)
                losses['TSD_acc'] = accuracy(TSD_cls_score, TSD_labels)
        if bbox_pred is not None:
            pos_inds = labels > 0
            if pos_inds.any():
                if self.reg_class_agnostic:
                    pos_bbox_pred = bbox_pred.view(
                        bbox_pred.size(0), 5)[pos_inds.type(torch.bool)]
                else:
                    pos_bbox_pred = bbox_pred.view(
                        bbox_pred.size(0), -1,
                        5)[pos_inds.type(torch.bool),
                           labels[pos_inds.type(torch.bool)]]
                losses['loss_bbox'] = self.loss_bbox(
                    pos_bbox_pred,
                    bbox_targets[pos_inds.type(torch.bool)],
                    bbox_weights[pos_inds.type(torch.bool)],
                    avg_factor=bbox_targets.size(0))
        if TSD_bbox_pred is not None:
            pos_inds = TSD_labels > 0
            if pos_inds.any():
                if self.reg_class_agnostic:
                    TSD_bbox_pred = TSD_bbox_pred.view(
                        TSD_bbox_pred.size(0), 5)[pos_inds.type(torch.bool)]
                else:
                    TSD_bbox_pred = TSD_bbox_pred.view(
                        TSD_bbox_pred.size(0), -1,
                        5)[pos_inds.type(torch.bool),
                           TSD_labels[pos_inds.type(torch.bool)]]
                losses['loss_TSD_bbox'] = self.loss_bbox(
                    TSD_bbox_pred,
                    TSD_bbox_targets[pos_inds.type(torch.bool)],
                    TSD_bbox_weights[pos_inds.type(torch.bool)],
                    avg_factor=TSD_bbox_targets.size(0))
        if pc_cls_loss is not None:
            losses['loss_pc_cls'] = pc_cls_loss.mean()
        if pc_loc_loss is not None:
            losses['loss_pc_loc'] = pc_loc_loss.mean()
        return losses

@HEADS.register_module
class TSDSharedFCBBoxHead(TSDConvFCBBoxHead):

    def __init__(self, num_fcs=2, fc_out_channels=1024, *args, **kwargs):
        assert num_fcs >= 1
        super(TSDSharedFCBBoxHead, self).__init__(
            num_shared_convs=0,
            num_shared_fcs=num_fcs,
            num_cls_convs=0,
            num_cls_fcs=0,
            num_reg_convs=0,
            num_reg_fcs=0,
            fc_out_channels=fc_out_channels,
            *args,
            **kwargs)

def bbox_target_single_tsd(pos_bboxes,
                           neg_bboxes,
                           pos_assigned_gt_inds,
                           gt_masks,
                           pos_gt_bboxes,
                           pos_gt_labels,
                           rois,
                           delta_c,
                           delta_r,
                           cls_score_,
                           bbox_pred_,
                           TSD_cls_score_,
                           TSD_bbox_pred_,
                           cfg,
                           reg_classes=1,
                           cls_pc_margin=0.2,
                           loc_pc_margin=0.2,
                           target_means=[.0, .0, .0, .0, .0],
                           target_stds=[1.0, 1.0, 1.0, 1.0, 1.0],
                           with_module=True):
    num_pos = pos_bboxes.size(0)  # n*4
    num_neg = neg_bboxes.size(0)
    num_samples = num_pos + num_neg
    labels = pos_bboxes.new_zeros(num_samples, dtype=torch.long)
    label_weights = pos_bboxes.new_zeros(num_samples)
    # bbox_weights = pos_bboxes.new_zeros(num_samples, 4)
    bbox_targets = pos_bboxes.new_zeros(num_samples, 5)
    bbox_weights = pos_bboxes.new_zeros(num_samples, 5)
    TSD_labels = pos_bboxes.new_zeros(num_samples, dtype=torch.long)
    TSD_label_weights = pos_bboxes.new_zeros(num_samples)
    # TSD_bbox_targets = pos_bboxes.new_zeros(num_samples, 4)
    # TSD_bbox_weights = pos_bboxes.new_zeros(num_samples, 4)
    TSD_bbox_targets = pos_bboxes.new_zeros(num_samples, 5)
    TSD_bbox_weights = pos_bboxes.new_zeros(num_samples, 5)

    pos_gt_masks = gt_masks[pos_assigned_gt_inds.cpu().numpy()]
    # 4*2 eight coords
    pos_gt_polys = mask2poly(pos_gt_masks)
    pos_gt_bp_polys = get_best_begin_point(pos_gt_polys)
    # 5 (x,y,h,w,theta)
    pos_gt_obbs = torch.from_numpy(polygonToRotRectangle_batch(pos_gt_bp_polys, with_module)).to(pos_bboxes.device)

    # generate P_r according to delta_r and rois
    w = rois[:, 3] - rois[:, 1] + 1
    h = rois[:, 4] - rois[:, 2] + 1
    scale = 0.1
    rois_r = rois.new_zeros(rois.shape[0], rois.shape[1])
    rois_r[:, 0] = rois[:, 0]
    rois_r[:, 1] = rois[:, 1] + delta_r[:, 0] * scale * w
    rois_r[:, 2] = rois[:, 2] + delta_r[:, 1] * scale * h
    rois_r[:, 3] = rois[:, 3] + delta_r[:, 0] * scale * w
    rois_r[:, 4] = rois[:, 4] + delta_r[:, 1] * scale * h
    TSD_pos_rois = rois_r[:num_pos]
    pos_rois = rois[:num_pos]
    pc_cls_loss = rois.new_zeros(1)
    pc_loc_loss = rois.new_zeros(1)

    if pos_bboxes.size(1) == 4:
        pos_ext_bboxes = hbb2obb_v2(pos_bboxes)
    else:
        pos_ext_bboxes = pos_bboxes

    if num_pos > 0:
        labels[:num_pos] = pos_gt_labels
        TSD_labels[:num_pos] = pos_gt_labels
        pos_weight = 1.0 if cfg.pos_weight <= 0 else cfg.pos_weight
        label_weights[:num_pos] = pos_weight
        TSD_label_weights[:num_pos] = pos_weight
        # pos_bbox_targets = bbox2delta(pos_bboxes, pos_gt_bboxes, target_means,
        #                               target_stds)
        if with_module:
            rpos_bbox_targets = dbbox2delta(pos_ext_bboxes, pos_gt_obbs, target_means,
                                            target_stds)
        else:
            rpos_bbox_targets = dbbox2delta_v3(pos_ext_bboxes, pos_gt_obbs, target_means,
                                               target_stds)
        # TSD_pos_bbox_targets = bbox2delta(TSD_pos_rois[:,1:], pos_gt_bboxes, target_means,
        #                                   target_stds)
        TSD_pos_bbox_targets = bbox2delta(TSD_pos_rois[:, 1:], pos_gt_bboxes, target_means[:4],
                                          target_stds[:4])
        bbox_targets[:num_pos, :] = rpos_bbox_targets
        bbox_weights[:num_pos, :] = 1
        TSD_bbox_targets[:num_pos, :4] = TSD_pos_bbox_targets
        TSD_bbox_targets[:num_pos, 4] = pos_gt_obbs[:num_pos, 4]
        TSD_bbox_weights[:num_pos, :] = 1

        # compute PC for TSD
        # 1. compute the PC for classification
        cls_score_soft = F.softmax(cls_score_, dim=1)
        TSD_cls_score_soft = F.softmax(TSD_cls_score_, dim=1)
        cls_pc_margin = torch.tensor(cls_pc_margin).to(labels.device).to(dtype=cls_score_soft.dtype)
        cls_pc_margin = torch.min(1 - cls_score_soft[np.arange(len(TSD_labels)), labels], cls_pc_margin).detach()
        pc_cls_loss = F.relu(-(TSD_cls_score_soft[np.arange(len(TSD_labels)), TSD_labels] - cls_score_soft[np.arange(len(TSD_labels)), labels].detach() - cls_pc_margin))

        # 2. compute the PC for localization
        N = bbox_pred_.shape[0]
        bbox_pred_ = bbox_pred_.view(N, -1, 5)
        TSD_bbox_pred_ = TSD_bbox_pred_.view(N, -1, 5)
        # sibling_head_bboxes = delta2bbox(pos_bboxes, bbox_pred_[np.arange(num_pos), labels[:num_pos]], means=target_means[:4], stds=target_stds[:4])
        sibling_head_bboxes = delta2bbox(pos_bboxes, bbox_pred_[np.arange(num_pos), labels[:num_pos]][:, :4], means=target_means[:4], stds=target_stds[:4])
        TSD_head_bboxes = delta2bbox(TSD_pos_rois[:, 1:], TSD_bbox_pred_[np.arange(num_pos), TSD_labels[:num_pos]][:, :4], means=target_means[:4], stds=target_stds[:4])
        ious, gious = iou_overlaps(sibling_head_bboxes, pos_gt_bboxes)
        TSD_ious, TSD_gious = iou_overlaps(TSD_head_bboxes, pos_gt_bboxes)
        loc_pc_margin = torch.tensor(loc_pc_margin).to(ious.device).to(dtype=ious.dtype)
        loc_pc_margin = torch.min(1 - ious.detach(), loc_pc_margin).detach()
        pc_loc_loss = F.relu(-(TSD_ious - ious.detach() - loc_pc_margin))

    if num_neg > 0:
        label_weights[-num_neg:] = 1.
        TSD_label_weights[-num_neg:] = 1.

    return labels, label_weights, bbox_targets, bbox_weights, TSD_labels, TSD_label_weights, TSD_bbox_targets, TSD_bbox_weights, rois_r, pc_cls_loss, pc_loc_loss

Hi, from your code, I suspect the performance problem comes from the following. TSDSharedFCBBoxHead contains both a TSD head and a sibling head. During training, the hyper-parameters of the two, such as target_means and target_stds, are kept consistent. The PC loss is applied between the TSD head and the sibling head inside TSDSharedFCBBoxHead (rather than SharedFCBBoxHeadRbbox), and the optimization pushes the TSD head to outperform the sibling head. From your code, I see that besides TSDSharedFCBBoxHead you add an extra SharedFCBBoxHeadRbbox whose hyper-parameters differ from those of TSDSharedFCBBoxHead. Both heads share the same input feature map x, so the performance of SharedFCBBoxHeadRbbox is affected by TSDSharedFCBBoxHead. I recommend replacing the sibling head in TSDSharedFCBBoxHead with your SharedFCBBoxHeadRbbox. Keep the hyper-parameters of SharedFCBBoxHeadRbbox and TSD consistent, so that the PC loss can be applied between TSD and SharedFCBBoxHeadRbbox.
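For reference, here is a rough plain-Python sketch of the PC (progressive constraint) hinge described above. It mirrors the `F.relu(-(TSD - sibling - margin))` expressions in the posted `bbox_target_single_tsd`, but the function name `pc_loss` and the sample numbers are made up for illustration, and the real code works on tensors rather than lists.

```python
def pc_loss(tsd_values, sibling_values, margin):
    """Per-sample progressive-constraint hinge (illustrative sketch only).

    tsd_values / sibling_values hold one quality number per sample for each
    head, e.g. the softmax score of the ground-truth class, or the IoU of
    the refined box with its ground-truth box.
    """
    losses = []
    for tsd, sib in zip(tsd_values, sibling_values):
        # Cap the margin so that sib + margin never exceeds 1,
        # as torch.min(1 - sibling, margin) does in the posted code.
        m = min(1.0 - sib, margin)
        # Penalize only when TSD does not beat the sibling head by the margin.
        losses.append(max(0.0, sib + m - tsd))
    return losses


# Hypothetical per-sample scores for the ground-truth class.
sibling_scores = [0.5, 0.9, 0.3]
tsd_scores = [0.8, 0.92, 0.25]
print(pc_loss(tsd_scores, sibling_scores, margin=0.2))
```

The first sample contributes no loss because TSD already beats the sibling head by more than the margin. The other two are penalized, which is why the constraint only makes sense when both heads are trained against the same targets and hyper-parameters.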
Thanks for the reply. Sorry for not making it clear before. I want to use TSDSharedFCBBoxHead for rotated object detection, and SharedFCBBoxHeadRbbox to refine the object's orientation and size on top of the TSD output. So TSDSharedFCBBoxHead and SharedFCBBoxHeadRbbox are independent.
Can you show me your modifications to TSDSharedFCBBoxHead? I need to know the meanings of some variables, such as tsd_roi and roi2droi.
Thank you for your guidance, I have found the problem, the loss calculation error.
"Thank you for your guidance, I have found the problem, the loss calculation error." Hello, I did something similar to what you did, but my TSD head predicts poorly while the normal head predicts well. I found that something is wrong with the TSD loss, but I cannot tell what. May I ask what the cause was in your case?
There are two branches in the framework: TSD and the sibling head. On my own dataset, TSD performs better, but the sibling head performs poorly. What are the possible reasons for this? In addition, delta_c should act on TSD_cls, but I did not find the corresponding code.