PaddlePaddle / PaddleDetection

Object Detection toolkit based on PaddlePaddle. It supports object detection, instance segmentation, multiple object tracking and real-time multi-person keypoint detection.
Apache License 2.0

Question about the network architecture #1581

Closed a2824256 closed 3 years ago

a2824256 commented 3 years ago

I have designed a new Mask R-CNN architecture based on mask_rcnn_r50_fpn_1x. The five feature maps output by the FPN are fed into five separate rpn_heads. There are three kinds of rpn_head, differing only in their aspect_ratios and variance values: the first two FPN levels use one kind, the middle level uses a second kind, and the remaining two levels use a third kind. The rois output by the rpn_heads are then merged with concat and fed into BBoxAssigner. Training fails with the following error:

(ppdet) D:\PaddleDetection-release-0.4>python ./tools/train.py -c ./configs/mask_rcnn_r50_fpn_pam_1x.yml
2020-10-20 12:39:35,554-INFO: If regularizer of a Parameter has been set by 'fluid.ParamAttr' or 'fluid.WeightNormParamAttr' already. The Regularization[L2Decay, regularization_coeff=0.000100] in Optimizer will not take effect, and it will only be applied to other Parameters!
W1020 12:39:35.830004  3812 device_context.cc:252] Please NOTE: device: 0, CUDA Capability: 75, Driver API Version: 11.0, Runtime API Version: 10.0
W1020 12:39:35.837980  3812 device_context.cc:260] device: 0, cuDNN Version: 7.6.
2020-10-20 12:39:37,672-WARNING: C:\Users\lygg7/.cache/paddle/weights\ResNet50_cos_pretrained.pdparams not found, try to load model file saved with [ save_params, save_persistables, save_vars ]
E:\ProgramData\Miniconda3\envs\ppdet\lib\site-packages\paddle\fluid\io.py:1998: UserWarning: This list is not set, Because of Paramerter not found in program. There are: fc_0.b_0 fc_0.w_0
  format(" ".join(unused_para_list)))
loading annotations into memory...
Done (t=15.26s)
creating index...
index created!
2020-10-20 12:40:16,739-INFO: places would be ommited when DataLoader is not iterable
W1020 12:40:16.755209  3812 build_strategy.cc:170] fusion_group is not enabled for Windows/MacOS now, and only effective when running with CUDA GPU.
1603168817      The content of rois_m layer:    The place is:CUDAPlace(0)
Tensor[collect.tmp_2]
        shape: [96,4,]
        dtype: float
        LoD: [[ 0,96, ]]
        data: 0,76.3508,39.4064,107.999,132.003,9.51461,146.775,41.442,101.607,0,146.566,30.2602,77.674,21.2391,107.898,52.7692,85.9763,77.3872,116.511,110.455,
1603168817      The content of rois_l1 layer:   The place is:CUDAPlace(0)
Tensor[collect.tmp_3]
        shape: [69,4,]
        dtype: float
        LoD: [[ 0,69, ]]
        data: 49.7918,0,78.5349,26.0956,0,0,42.9374,24.1623,47.5825,0,92.2774,25.8964,56.18,0,84.2061,29.7415,16.9439,0,43.5082,20.7294,
1603168817      The content of rois_s1 layer:   The place is:CUDAPlace(0)
Tensor[collect.tmp_0]
        shape: [92,4,]
        dtype: float
        LoD: [[ 0,92, ]]
        data: 529.385,114.831,559.272,144.092,530.762,127.152,561.073,156.503,535.415,121.926,564.759,151.248,524.143,111.675,554.556,141.239,528.66,120.925,558.262,150.27,
1603168817      The content of rois_l2 layer:   The place is:CUDAPlace(0)
Tensor[collect.tmp_4]
        shape: [55,4,]
        dtype: float
        LoD: [[ 0,55, ]]
        data: 0,0,29.4637,18.683,15.8423,0,45.2825,21.7693,13.107,0,60.6997,26.6516,0,0,42.8977,23.0975,2.74374,0,50.5755,28.2675,
1603168817      The content of rois_s2 layer:   The place is:CUDAPlace(0)
Tensor[collect.tmp_1]
        shape: [124,4,]
        dtype: float
        LoD: [[ 0,124, ]]
        data: 65.4591,6.66639,95.5063,38.1761,71.9609,5.79194,101.379,38.0712,13.4269,6.38562,42.308,37.4716,0,165.868,24.792,196.435,150.318,82.4284,165.886,114.85,
1603168817      The content of rois layer:      The place is:CUDAPlace(0)
Tensor[concat_0.tmp_0]
        shape: [436,4,]
        dtype: float
        LoD: [[ 0,92,216,312,381,436, ]]
        data: 529.385,114.831,559.272,144.092,530.762,127.152,561.073,156.503,535.415,121.926,564.759,151.248,524.143,111.675,554.556,141.239,528.66,120.925,558.262,150.27,
E:\ProgramData\Miniconda3\envs\ppdet\lib\site-packages\paddle\fluid\executor.py:1070: UserWarning: The following exception is not an EOF exception.
  "The following exception is not an EOF exception.")
Traceback (most recent call last):
  File "./tools/train.py", line 373, in <module>
    main()
  File "./tools/train.py", line 246, in main
    outs = exe.run(compiled_train_prog, fetch_list=train_values)
  File "E:\ProgramData\Miniconda3\envs\ppdet\lib\site-packages\paddle\fluid\executor.py", line 1071, in run
    six.reraise(*sys.exc_info())
  File "E:\ProgramData\Miniconda3\envs\ppdet\lib\site-packages\six.py", line 703, in reraise
    raise value
  File "E:\ProgramData\Miniconda3\envs\ppdet\lib\site-packages\paddle\fluid\executor.py", line 1066, in run
    return_merged=return_merged)
  File "E:\ProgramData\Miniconda3\envs\ppdet\lib\site-packages\paddle\fluid\executor.py", line 1167, in _run_impl
    return_merged=return_merged)
  File "E:\ProgramData\Miniconda3\envs\ppdet\lib\site-packages\paddle\fluid\executor.py", line 879, in _run_parallel
    tensors = exe.run(fetch_var_names, return_merged)._move_to_list()
paddle.fluid.core_avx.EnforceNotMet:

--------------------------------------------
C++ Call Stacks (More useful to developers):
--------------------------------------------
Windows not support stack backtrace yet.

------------------------------------------
Python Call Stacks (More useful to users):
------------------------------------------
  File "E:\ProgramData\Miniconda3\envs\ppdet\lib\site-packages\paddle\fluid\framework.py", line 2610, in append_op
    attrs=kwargs.get("attrs", None))
  File "E:\ProgramData\Miniconda3\envs\ppdet\lib\site-packages\paddle\fluid\layer_helper.py", line 43, in append_op
    return self.main_program.current_block().append_op(*args, **kwargs)
  File "E:\ProgramData\Miniconda3\envs\ppdet\lib\site-packages\paddle\fluid\layers\detection.py", line 2610, in generate_proposal_labels
    'is_cascade_rcnn': is_cascade_rcnn
  File "D:\PaddleDetection-release-0.4\ppdet\core\workspace.py", line 174, in partial_apply
    return op(*args, **kwargs_)
  File "D:\PaddleDetection-release-0.4\ppdet\modeling\architectures\mask_rcnn_pam.py", line 162, in build
    im_info=feed_vars['im_info'])
  File "D:\PaddleDetection-release-0.4\ppdet\modeling\architectures\mask_rcnn_pam.py", line 398, in train
    return self.build(feed_vars, 'train')
  File "./tools/train.py", line 118, in main
    train_fetches = model.train(feed_vars)
  File "./tools/train.py", line 373, in <module>
    main()

----------------------
Error Message Summary:
----------------------
Error: The start row index must be lesser than the end row index.
  [Hint: Expected begin_idx < end_idx, but received begin_idx:16 >= end_idx:0.] at (D:\1.8.5\paddle\paddle\fluid\framework\tensor.cc:83)
  [operator < generate_proposal_labels > error]

The error message mentions generate_proposal_labels, which is in fact the BBoxAssigner op. After some debugging I found that the error appears only after the rois from the five rpn_heads are merged with concat; feeding the rois of a single rpn_head works fine. Printing the five rpn_head outputs with fluid.layers.Print shows that the lod attribute of each head's rois LoDTensor is [0, x] (x being a positive integer), while the lod of the concatenated rois is [0, x, x, x, x, x]. Should this kind of merge not be used? My goal is to merge the rois from the five rpn_heads and feed them into BBoxAssigner and MaskAssigner (a minimal sketch reproducing the LoD behaviour follows the architecture code below).

The config file and the new mask_rcnn architecture file are listed below. In ppdet/modeling, FPNRPNHeadS, FPNRPNHeadM and FPNRPNHeadL share the same code as FPNRPNHead and differ only in class name; three copies were made so that three different anchor_generator settings can be configured. In the config below, the anchor_generator of the FPNRPNHead variants has not yet been differentiated.

The YAML config file:

architecture: IMPMaskRCNN
use_gpu: true
max_iters: 2000
snapshot_iter: 10000
log_smooth_window: 20
save_dir: output
pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/ResNet50_cos_pretrained.tar
metric: COCO
weights: output/mask_rcnn_r50_fpn_1x/model_final
num_classes: 81

IMPMaskRCNN:
  backbone: ResNet
  fpn: FPN
  rpn_head_s1: FPNRPNHeadS
  rpn_head_s2: FPNRPNHeadS
  rpn_head_m: FPNRPNHeadM
  rpn_head_l1: FPNRPNHeadL
  rpn_head_l2: FPNRPNHeadL
  roi_extractor: FPNRoIAlign
  bbox_head: BBoxHead
  bbox_assigner: BBoxAssigner

ResNet:
  depth: 50
  feature_maps: [2, 3, 4, 5]
  freeze_at: 2
  norm_type: bn

FPN:
  max_level: 6
  min_level: 2
  num_chan: 256
  spatial_scale: [0.03125, 0.0625, 0.125, 0.25]

FPNRPNHeadS:
  anchor_generator:
    aspect_ratios: [0.5, 1.0, 2.0]
    variance: [1.0, 1.0, 1.0, 1.0]
  anchor_start_size: 32
  max_level: 1
  min_level: 1
  num_chan: 256
  rpn_target_assign:
    rpn_batch_size_per_im: 256
    rpn_fg_fraction: 0.5
    rpn_negative_overlap: 0.3
    rpn_positive_overlap: 0.7
    rpn_straddle_thresh: 0.0
  train_proposal:
    min_size: 0.0
    nms_thresh: 0.7
    pre_nms_top_n: 400
    post_nms_top_n: 400
  test_proposal:
    min_size: 0.0
    nms_thresh: 0.7
    pre_nms_top_n: 200
    post_nms_top_n: 200

FPNRPNHeadM:
  anchor_generator:
    aspect_ratios: [0.5, 1.0, 2.0]
    variance: [1.0, 1.0, 1.0, 1.0]
  anchor_start_size: 32
  max_level: 1
  min_level: 1
  num_chan: 256
  rpn_target_assign:
    rpn_batch_size_per_im: 256
    rpn_fg_fraction: 0.5
    rpn_negative_overlap: 0.3
    rpn_positive_overlap: 0.7
    rpn_straddle_thresh: 0.0
  train_proposal:
    min_size: 0.0
    nms_thresh: 0.7
    pre_nms_top_n: 400
    post_nms_top_n: 400
  test_proposal:
    min_size: 0.0
    nms_thresh: 0.7
    pre_nms_top_n: 200
    post_nms_top_n: 200

FPNRPNHeadL:
  anchor_generator:
    aspect_ratios: [0.5, 1.0, 2.0]
    variance: [1.0, 1.0, 1.0, 1.0]
  anchor_start_size: 32
  max_level: 1
  min_level: 1
  num_chan: 256
  rpn_target_assign:
    rpn_batch_size_per_im: 256
    rpn_fg_fraction: 0.5
    rpn_negative_overlap: 0.3
    rpn_positive_overlap: 0.7
    rpn_straddle_thresh: 0.0
  train_proposal:
    min_size: 0.0
    nms_thresh: 0.7
    pre_nms_top_n: 400
    post_nms_top_n: 400
  test_proposal:
    min_size: 0.0
    nms_thresh: 0.7
    pre_nms_top_n: 200
    post_nms_top_n: 200

FPNRoIAlign:
  canconical_level: 4
  canonical_size: 224
  max_level: 5
  min_level: 2
  sampling_ratio: 2
  box_resolution: 7
  mask_resolution: 14

MaskHead:
  dilation: 1
  conv_dim: 256
  num_convs: 4
  resolution: 28

BBoxAssigner:
  batch_size_per_im: 512
  bbox_reg_weights: [0.1, 0.1, 0.2, 0.2]
  bg_thresh_hi: 0.5
  bg_thresh_lo: 0.0
  fg_fraction: 0.25
  fg_thresh: 0.5

MaskAssigner:
  resolution: 28

BBoxHead:
  head: TwoFCHead
  nms:
    keep_top_k: 100
    nms_threshold: 0.5
    score_threshold: 0.05

TwoFCHead:
  mlp_dim: 1024

LearningRate:
  base_lr: 0.01
  schedulers:
  - !PiecewiseDecay
    gamma: 0.1
    milestones: [1000, 1500]
  - !LinearWarmup
    start_factor: 0.3333333333333333
    steps: 500

OptimizerBuilder:
  optimizer:
    momentum: 0.9
    type: Momentum
  regularizer:
    factor: 0.0001
    type: L2

_READER_: 'mask_fpn_reader.yml'

The mask_rcnn architecture file; the main modifications are in the build function:

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

from collections import OrderedDict
import copy
import collections
import paddle.fluid as fluid

from ppdet.experimental import mixed_precision_global_state
from ppdet.core.workspace import register

from .input_helper import multiscale_def

__all__ = ['IMPMaskRCNN']

@register
class IMPMaskRCNN(object):

    __category__ = 'architecture'
    __inject__ = [
        'backbone', 'rpn_head_s1', 'rpn_head_s2', 'rpn_head_m', 'rpn_head_l1', 'rpn_head_l2', 'bbox_assigner', 'roi_extractor', 'bbox_head',
        'mask_assigner', 'mask_head', 'fpn'
    ]

    def __init__(self,
                 backbone,
                 rpn_head_s1,
                 rpn_head_s2,
                 rpn_head_m,
                 rpn_head_l1,
                 rpn_head_l2,
                 bbox_head='BBoxHead',
                 bbox_assigner='BBoxAssigner',
                 roi_extractor='RoIAlign',
                 mask_assigner='MaskAssigner',
                 mask_head='MaskHead',
                 rpn_only=False,
                 fpn=None):
        super(IMPMaskRCNN, self).__init__()
        self.backbone = backbone
        self.rpn_head_s1 = rpn_head_s1
        self.rpn_head_s2 = rpn_head_s2
        self.rpn_head_m = rpn_head_m
        self.rpn_head_l1 = rpn_head_l1
        self.rpn_head_l2 = rpn_head_l2
        self.bbox_assigner = bbox_assigner
        self.roi_extractor = roi_extractor
        self.bbox_head = bbox_head
        self.mask_assigner = mask_assigner
        self.mask_head = mask_head
        self.rpn_only = rpn_only
        self.fpn = fpn

    def build(self, feed_vars, mode='train'):
        if mode == 'train':
            required_fields = [
                'gt_class', 'gt_bbox', 'gt_mask', 'is_crowd', 'im_info'
            ]
        else:
            required_fields = ['im_shape', 'im_info']
        self._input_check(required_fields, feed_vars)
        im = feed_vars['image']
        im_info = feed_vars['im_info']

        mixed_precision_enabled = mixed_precision_global_state() is not None
        # cast inputs to FP16
        if mixed_precision_enabled:
            im = fluid.layers.cast(im, 'float16')

        # backbone
        body_feats = self.backbone(im)

        # cast features back to FP32
        if mixed_precision_enabled:
            body_feats = OrderedDict((k, fluid.layers.cast(v, 'float32'))
                                     for k, v in body_feats.items())

        # FPN
        spatial_scale = None
        if self.fpn is not None:
            body_feats, spatial_scale = self.fpn.get_output(body_feats)
        P2 = collections.OrderedDict()
        P2['fpn_res2_sum'] = body_feats['fpn_res2_sum']
        P3 = collections.OrderedDict()
        P3['fpn_res3_sum'] = body_feats['fpn_res3_sum']
        P4 = collections.OrderedDict()
        P4['fpn_res4_sum'] = body_feats['fpn_res4_sum']
        P5 = collections.OrderedDict()
        P5['fpn_res5_sum'] = body_feats['fpn_res5_sum']
        P6 = collections.OrderedDict()
        P6['fpn_res5_sum_subsampled_2x'] = body_feats['fpn_res5_sum_subsampled_2x']
        # RPN proposals
        # add multiple RPN branches and then fuse their proposals
        rois_s1 = self.rpn_head_s1.get_proposals(P2, im_info, mode=mode)
        rois_s2 = self.rpn_head_s2.get_proposals(P3, im_info, mode=mode)
        # rois = fluid.layers.concat(input=[rois_s1, rois_s2], axis=0)
        rois_m = self.rpn_head_m.get_proposals(P4, im_info, mode=mode)
        rois_l1 = self.rpn_head_l1.get_proposals(P5, im_info, mode=mode)
        rois_l2 = self.rpn_head_l2.get_proposals(P6, im_info, mode=mode)
        rois = fluid.layers.concat(input=[rois_s1, rois_s2, rois_m, rois_l1, rois_l2], axis=0)
        if mode == 'train':
            rpn_loss1 = self.rpn_head_s1.get_loss(im_info, feed_vars['gt_bbox'], feed_vars['is_crowd'])
            rpn_loss2 = self.rpn_head_s2.get_loss(im_info, feed_vars['gt_bbox'], feed_vars['is_crowd'])
            rpn_loss3 = self.rpn_head_m.get_loss(im_info, feed_vars['gt_bbox'], feed_vars['is_crowd'])
            rpn_loss4 = self.rpn_head_l1.get_loss(im_info, feed_vars['gt_bbox'], feed_vars['is_crowd'])
            rpn_loss5 = self.rpn_head_l2.get_loss(im_info, feed_vars['gt_bbox'], feed_vars['is_crowd'])
            loss_rpn_cls = fluid.layers.concat(input=[rpn_loss1['loss_rpn_cls'], rpn_loss2['loss_rpn_cls'], rpn_loss3['loss_rpn_cls'], rpn_loss4['loss_rpn_cls'], rpn_loss5['loss_rpn_cls']], axis=0)
            loss_rpn_bbox = fluid.layers.concat(input=[rpn_loss1['loss_rpn_bbox'], rpn_loss2['loss_rpn_bbox'], rpn_loss3['loss_rpn_bbox'], rpn_loss4['loss_rpn_bbox'], rpn_loss5['loss_rpn_bbox']], axis=0)
            loss_rpn_cls_loss = fluid.layers.mean(loss_rpn_cls)
            loss_rpn_bbox_loss = fluid.layers.mean(loss_rpn_bbox)
            rpn_loss = {'loss_rpn_cls': loss_rpn_cls_loss, 'loss_rpn_bbox': loss_rpn_bbox_loss}
            outs = self.bbox_assigner(
                rpn_rois=rois,
                gt_classes=feed_vars['gt_class'],
                is_crowd=feed_vars['is_crowd'],
                gt_boxes=feed_vars['gt_bbox'],
                im_info=feed_vars['im_info'])
            rois = outs[0]
            labels_int32 = outs[1]

            if self.fpn is None:
                last_feat = body_feats[list(body_feats.keys())[-1]]
                roi_feat = self.roi_extractor(last_feat, rois)
            else:
                roi_feat = self.roi_extractor(body_feats, rois, spatial_scale)

            loss = self.bbox_head.get_loss(roi_feat, labels_int32, *outs[2:])
            loss.update(rpn_loss)

            mask_rois, roi_has_mask_int32, mask_int32 = self.mask_assigner(
                rois=rois,
                gt_classes=feed_vars['gt_class'],
                is_crowd=feed_vars['is_crowd'],
                gt_segms=feed_vars['gt_mask'],
                im_info=feed_vars['im_info'],
                labels_int32=labels_int32)
            if self.fpn is None:
                bbox_head_feat = self.bbox_head.get_head_feat()
                feat = fluid.layers.gather(bbox_head_feat, roi_has_mask_int32)
            else:
                feat = self.roi_extractor(
                    body_feats, mask_rois, spatial_scale, is_mask=True)

            mask_loss = self.mask_head.get_loss(feat, mask_int32)
            loss.update(mask_loss)

            total_loss = fluid.layers.sum(list(loss.values()))
            loss.update({'loss': total_loss})
            return loss

        else:
            if self.rpn_only:
                im_scale = fluid.layers.slice(
                    im_info, [1], starts=[2], ends=[3])
                im_scale = fluid.layers.sequence_expand(im_scale, rois)
                rois = rois / im_scale
                return {'proposal': rois}
            mask_name = 'mask_pred'
            mask_pred, bbox_pred = self.single_scale_eval(
                body_feats, mask_name, rois, im_info, feed_vars['im_shape'],
                spatial_scale)
            return {'bbox': bbox_pred, 'mask': mask_pred}

    def build_multi_scale(self, feed_vars, mask_branch=False):
        required_fields = ['image', 'im_info']
        self._input_check(required_fields, feed_vars)

        result = {}
        if not mask_branch:
            assert 'im_shape' in feed_vars, \
                "{} has no im_shape field".format(feed_vars)
            result.update(feed_vars)

        for i in range(len(self.im_info_names) // 2):
            im = feed_vars[self.im_info_names[2 * i]]
            im_info = feed_vars[self.im_info_names[2 * i + 1]]
            body_feats = self.backbone(im)

            # FPN
            if self.fpn is not None:
                body_feats, spatial_scale = self.fpn.get_output(body_feats)
            P2 = collections.OrderedDict()
            P2['fpn_res2_sum'] = body_feats['fpn_res2_sum']
            P3 = collections.OrderedDict()
            P3['fpn_res3_sum'] = body_feats['fpn_res3_sum']
            P4 = collections.OrderedDict()
            P4['fpn_res4_sum'] = body_feats['fpn_res4_sum']
            P5 = collections.OrderedDict()
            P5['fpn_res5_sum'] = body_feats['fpn_res5_sum']
            P6 = collections.OrderedDict()
            P6['fpn_res5_sum_subsampled_2x'] = body_feats['fpn_res5_sum_subsampled_2x']
            # RPN proposals
            # add multiple RPN branches and then fuse their proposals
            rois_s1 = self.rpn_head_s1.get_proposals(P2, im_info, mode='test')
            rois_s2 = self.rpn_head_s2.get_proposals(P3, im_info, mode='test')
            rois_m = self.rpn_head_m.get_proposals(P4, im_info, mode='test')
            rois_l1 = self.rpn_head_l1.get_proposals(P5, im_info, mode='test')
            rois_l2 = self.rpn_head_l2.get_proposals(P6, im_info, mode='test')
            rois = fluid.layers.concat(input=[rois_s1, rois_s2, rois_m, rois_l1, rois_l2], axis=0)

            if not mask_branch:
                im_shape = feed_vars['im_shape']
                body_feat_names = list(body_feats.keys())
                if self.fpn is None:
                    body_feat = body_feats[body_feat_names[-1]]
                    roi_feat = self.roi_extractor(body_feat, rois)
                else:
                    roi_feat = self.roi_extractor(body_feats, rois,
                                                  spatial_scale)
                pred = self.bbox_head.get_prediction(
                    roi_feat, rois, im_info, im_shape, return_box_score=True)
                bbox_name = 'bbox_' + str(i)
                score_name = 'score_' + str(i)
                if 'flip' in im.name:
                    bbox_name += '_flip'
                    score_name += '_flip'
                result[bbox_name] = pred['bbox']
                result[score_name] = pred['score']
            else:
                mask_name = 'mask_pred_' + str(i)
                bbox_pred = feed_vars['bbox']
                #result.update({im.name: im})
                if 'flip' in im.name:
                    mask_name += '_flip'
                    bbox_pred = feed_vars['bbox_flip']
                mask_pred, bbox_pred = self.single_scale_eval(
                    body_feats, mask_name, rois, im_info, feed_vars['im_shape'],
                    spatial_scale, bbox_pred)
                result[mask_name] = mask_pred
        return result

    def single_scale_eval(self,
                          body_feats,
                          mask_name,
                          rois,
                          im_info,
                          im_shape,
                          spatial_scale,
                          bbox_pred=None):
        if not bbox_pred:
            if self.fpn is None:
                last_feat = body_feats[list(body_feats.keys())[-1]]
                roi_feat = self.roi_extractor(last_feat, rois)
            else:
                roi_feat = self.roi_extractor(body_feats, rois, spatial_scale)
            bbox_pred = self.bbox_head.get_prediction(roi_feat, rois, im_info,
                                                      im_shape)
            bbox_pred = bbox_pred['bbox']

        # share weight
        bbox_shape = fluid.layers.shape(bbox_pred)
        bbox_size = fluid.layers.reduce_prod(bbox_shape)
        bbox_size = fluid.layers.reshape(bbox_size, [1, 1])
        size = fluid.layers.fill_constant([1, 1], value=6, dtype='int32')
        cond = fluid.layers.less_than(x=bbox_size, y=size)

        mask_pred = fluid.layers.create_global_var(
            shape=[1],
            value=0.0,
            dtype='float32',
            persistable=False,
            name=mask_name)

        def noop():
            fluid.layers.assign(input=bbox_pred, output=mask_pred)

        def process_boxes():
            bbox = fluid.layers.slice(bbox_pred, [1], starts=[2], ends=[6])

            im_scale = fluid.layers.slice(im_info, [1], starts=[2], ends=[3])
            im_scale = fluid.layers.sequence_expand(im_scale, bbox)

            mask_rois = bbox * im_scale
            if self.fpn is None:
                last_feat = body_feats[list(body_feats.keys())[-1]]
                mask_feat = self.roi_extractor(last_feat, mask_rois)
                mask_feat = self.bbox_head.get_head_feat(mask_feat)
            else:
                mask_feat = self.roi_extractor(
                    body_feats, mask_rois, spatial_scale, is_mask=True)

            mask_out = self.mask_head.get_prediction(mask_feat, bbox)
            fluid.layers.assign(input=mask_out, output=mask_pred)

        fluid.layers.cond(cond, noop, process_boxes)
        return mask_pred, bbox_pred

    def _input_check(self, require_fields, feed_vars):
        for var in require_fields:
            assert var in feed_vars, \
                "{} has no {} field".format(feed_vars, var)

    def _inputs_def(self, image_shape):
        im_shape = [None] + image_shape
        # yapf: disable
        inputs_def = {
            'image':    {'shape': im_shape,  'dtype': 'float32', 'lod_level': 0},
            'im_info':  {'shape': [None, 3], 'dtype': 'float32', 'lod_level': 0},
            'im_id':    {'shape': [None, 1], 'dtype': 'int64',   'lod_level': 0},
            'im_shape': {'shape': [None, 3], 'dtype': 'float32', 'lod_level': 0},
            'gt_bbox':  {'shape': [None, 4], 'dtype': 'float32', 'lod_level': 1},
            'gt_class': {'shape': [None, 1], 'dtype': 'int32',   'lod_level': 1},
            'is_crowd': {'shape': [None, 1], 'dtype': 'int32',   'lod_level': 1},
            'gt_mask':  {'shape': [None, 2], 'dtype': 'float32', 'lod_level': 3}, # polygon coordinates
            'is_difficult': {'shape': [None, 1], 'dtype': 'int32', 'lod_level': 1},
        }
        # yapf: enable
        return inputs_def

    def build_inputs(self,
                     image_shape=[3, None, None],
                     fields=[
                         'image', 'im_info', 'im_id', 'gt_bbox', 'gt_class',
                         'is_crowd', 'gt_mask'
                     ],
                     multi_scale=False,
                     num_scales=-1,
                     use_flip=None,
                     use_dataloader=True,
                     iterable=False,
                     mask_branch=False):
        inputs_def = self._inputs_def(image_shape)
        fields = copy.deepcopy(fields)
        if multi_scale:
            ms_def, ms_fields = multiscale_def(image_shape, num_scales,
                                               use_flip)
            inputs_def.update(ms_def)
            fields += ms_fields
            self.im_info_names = ['image', 'im_info'] + ms_fields
            if mask_branch:
                box_fields = ['bbox', 'bbox_flip'] if use_flip else ['bbox']
                for key in box_fields:
                    inputs_def[key] = {
                        'shape': [None, 6],
                        'dtype': 'float32',
                        'lod_level': 1
                    }
                fields += box_fields
        feed_vars = OrderedDict([(key, fluid.data(
            name=key,
            shape=inputs_def[key]['shape'],
            dtype=inputs_def[key]['dtype'],
            lod_level=inputs_def[key]['lod_level'])) for key in fields])
        use_dataloader = use_dataloader and not mask_branch
        loader = fluid.io.DataLoader.from_generator(
            feed_list=list(feed_vars.values()),
            capacity=16,
            use_double_buffer=True,
            iterable=iterable) if use_dataloader else None
        return feed_vars, loader

    def train(self, feed_vars):
        return self.build(feed_vars, 'train')

    def eval(self, feed_vars, multi_scale=None, mask_branch=False):
        if multi_scale:
            return self.build_multi_scale(feed_vars, mask_branch)
        return self.build(feed_vars, 'test')

    def test(self, feed_vars, exclude_nms=False):
        assert not exclude_nms, "exclude_nms for {} is not support currently".format(
            self.__class__.__name__)
        return self.build(feed_vars, 'test')
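For reference, the LoD behaviour described above can be reproduced outside the detector with a minimal sketch. The two-input case below is hypothetical; it only illustrates how concat stacks the per-head LoD segments instead of producing one segment per image:

import numpy as np
import paddle.fluid as fluid

# two hypothetical per-head roi inputs, each with lod_level=1
rois_a = fluid.data(name='rois_a', shape=[None, 4], dtype='float32', lod_level=1)
rois_b = fluid.data(name='rois_b', shape=[None, 4], dtype='float32', lod_level=1)
merged = fluid.layers.concat([rois_a, rois_b], axis=0)

place = fluid.CPUPlace()
exe = fluid.Executor(place)
exe.run(fluid.default_startup_program())

# a single image: 3 proposals from head A, 2 proposals from head B
a = fluid.create_lod_tensor(np.random.rand(3, 4).astype('float32'), [[3]], place)
b = fluid.create_lod_tensor(np.random.rand(2, 4).astype('float32'), [[2]], place)
out, = exe.run(feed={'rois_a': a, 'rois_b': b},
               fetch_list=[merged], return_numpy=False)
# expected to print two LoD segments (e.g. [[0, 3, 5]]), which BBoxAssigner
# would then interpret as two images rather than one
print(out.lod())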

Thanks for any guidance.

jerrywgz commented 3 years ago

You can refer to https://github.com/PaddlePaddle/PaddleDetection/blob/release/0.4/ppdet/modeling/anchor_heads/rpn_head.py#L470, where the rois from the different outputs are merged. That operation merges the rois belonging to the same batch together according to their scores; if you simply concat them, the LoD information no longer correctly describes the batch structure of the output.
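Following that suggestion, one possible way to merge the per-head rois is to replace the plain concat in build() with fluid.layers.collect_fpn_proposals. This is only a sketch: it assumes each head's get_proposals can also expose its proposal scores (scores_s1 ... scores_l2 are hypothetical names), and post_nms_top_n is an illustrative value:

# inside build(), replacing the plain concat of the five roi tensors
multi_rois = [rois_s1, rois_s2, rois_m, rois_l1, rois_l2]
multi_scores = [scores_s1, scores_s2, scores_m, scores_l1, scores_l2]  # assumed available

# collect_fpn_proposals re-ranks all proposals of the same image by score and
# rebuilds a single-level LoD, so generate_proposal_labels (BBoxAssigner) sees
# one LoD segment per image instead of one segment per head
rois = fluid.layers.collect_fpn_proposals(
    multi_rois=multi_rois,
    multi_scores=multi_scores,
    min_level=2,
    max_level=6,
    post_nms_top_n=2000)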

a2824256 commented 3 years ago

Solved, thanks.