CasiaFan / tensorflow_retinanet

RetinaNet with Focal Loss implemented by Tensorflow

KeyError: 8 #18

Open Rublins opened 4 years ago

Rublins commented 4 years ago

Help me to clear this KeyError: 8.

Traceback (most recent call last):
  File "train.py", line 184, in <module>
    tf.app.run()
  File "C:\Users\Sathiesh Kumar\AppData\Local\Continuum\anaconda3\envs\retina\lib\site-packages\tensorflow\python\platform\app.py", line 125, in run
    _sys.exit(main(argv))
  File "C:\Users\Sathiesh Kumar\AppData\Local\Continuum\anaconda3\envs\retina\lib\site-packages\tensorflow\python\util\deprecation.py", line 272, in new_func
    return func(*args, **kwargs)
  File "train.py", line 180, in main
    graph_hook_fn=graph_rewriter_fn)
  File "D:\RUBIN\RESEARCH\Tensorflow1\models-master\research\object_detection\legacy\trainer.py", line 292, in train
    clones = model_deploy.create_clones(deploy_config, model_fn, [input_queue])
  File "D:\RUBIN\RESEARCH\Tensorflow1\models-master\research\slim\deployment\model_deploy.py", line 194, in create_clones
    outputs = model_fn(*args, **kwargs)
  File "D:\RUBIN\RESEARCH\Tensorflow1\models-master\research\object_detection\legacy\trainer.py", line 205, in _create_losses
    prediction_dict = detection_model.predict(images, true_image_shapes)
  File "D:\RUBIN\RESEARCH\Tensorflow1\models-master\research\object_detection\meta_architectures\ssd_meta_arch.py", line 600, in predict
    preprocessed_inputs)
  File "D:\RUBIN\RESEARCH\Tensorflow1\models-master\research\object_detection\models\retinanet_feature_extractor.py", line 135, in extract_features
    return [image_features[x] for x in range(self._min_level, self._max_level+1)]
  File "D:\RUBIN\RESEARCH\Tensorflow1\models-master\research\object_detection\models\retinanet_feature_extractor.py", line 135, in <listcomp>
    return [image_features[x] for x in range(self._min_level, self._max_level+1)]
KeyError: 8

I have attached the config file used for training.

# SSD with Resnet 50 v1 FPN feature extractor, shared box predictor and focal
# loss (a.k.a Retinanet).
# See Lin et al, https://arxiv.org/abs/1708.02002
# Trained on COCO, initialized from Imagenet classification checkpoint
# Achieves 35.2 mAP on COCO14 minival dataset. Doubling the number of training
# steps to 50k gets 36.9 mAP
# This config is TPU compatible

model {
  ssd {
    inplace_batchnorm_update: true
    freeze_batchnorm: false
    num_classes: 10
    box_coder {
      faster_rcnn_box_coder {
        y_scale: 10.0
        x_scale: 10.0
        height_scale: 5.0
        width_scale: 5.0
      }
    }
    matcher {
      argmax_matcher {
        matched_threshold: 0.5
        unmatched_threshold: 0.5
        ignore_thresholds: false
        negatives_lower_than_unmatched: true
        force_match_for_each_row: true
        use_matmul_gather: true
      }
    }
    similarity_calculator {
      iou_similarity {
      }
    }
    encode_background_as_zeros: true
    anchor_generator {
      multiscale_anchor_generator {
        min_level: 3
        max_level: 7
        anchor_scale: 4.0
        aspect_ratios: [1.0, 2.0, 0.5]
        scales_per_octave: 2
      }
    }
    image_resizer {
      fixed_shape_resizer {
        height: 640
        width: 640
      }
    }
    box_predictor {
      weight_shared_convolutional_box_predictor {
        depth: 256
        class_prediction_bias_init: -4.6
        conv_hyperparams {
          activation: RELU_6,
          regularizer {
            l2_regularizer {
              weight: 0.0004
            }
          }
          initializer {
            random_normal_initializer {
              stddev: 0.01
              mean: 0.0
            }
          }
          batch_norm {
            scale: true,
            decay: 0.997,
            epsilon: 0.001,
          }
        }
        num_layers_before_predictor: 4
        kernel_size: 3
      }
    }
    feature_extractor {
      type: 'retinanet_50'
      min_depth: 0
      depth_multiplier: 1.0
      pad_to_multiple: 32
      override_base_feature_extractor_hyperparams: true
    }
    loss {
      classification_loss {
        weighted_sigmoid_focal {
          alpha: 0.25
          gamma: 2.0
        }
      }
      localization_loss {
        weighted_smooth_l1 {
        }
      }
      classification_weight: 1.0
      localization_weight: 1.0
    }
    normalize_loss_by_num_matches: true
    normalize_loc_loss_by_codesize: true
    post_processing {
      batch_non_max_suppression {
        score_threshold: 1e-8
        iou_threshold: 0.6
        max_detections_per_class: 100
        max_total_detections: 100
      }
      score_converter: SIGMOID
    }
  }
}

train_config: {
  fine_tune_checkpoint: "D:/RUBIN/RESEARCH/Tensorflow1/models/research/object_detection/ssd_resnet50_v1_fpn_shared_box_predictor_640x640_coco14_sync_2018_07_03/model.ckpt"
  batch_size: 1
  sync_replicas: false
  startup_delay_steps: 0
  replicas_to_aggregate: 2
  num_steps: 5000
  data_augmentation_options {
    random_horizontal_flip {
    }
  }
  data_augmentation_options {
    random_crop_image {
      min_object_covered: 0.0
      min_aspect_ratio: 0.75
      max_aspect_ratio: 3.0
      min_area: 0.75
      max_area: 1.0
      overlap_thresh: 0.0
    }
  }
  optimizer {
    momentum_optimizer: {
      learning_rate: {
        cosine_decay_learning_rate {
          learning_rate_base: .04
          total_steps: 5000
          warmup_learning_rate: .013333
          warmup_steps: 2000
        }
      }
      momentum_optimizer_value: 0.9
    }
    use_moving_average: false
  }
  max_number_of_boxes: 100
  unpad_groundtruth_tensors: false
}

train_input_reader: {
  tf_record_input_reader {
    input_path: "D:/RUBIN/RESEARCH/Tensorflow1/models/research/object_detection/ORI_Image_300/train1.record"
  }
  label_map_path: "D:/RUBIN/RESEARCH/Tensorflow1/models-master/research/object_detection/Retina/Ret_Training/labelmap.pbtxt"
}

eval_config: {
  metrics_set: "coco_detection_metrics"
  num_examples: 1424
  num_visualizations: 1424
  max_evals: 1
}

eval_input_reader: {
  tf_record_input_reader {
    input_path: "D:/RUBIN/RESEARCH/Tensorflow1/models/research/object_detection/ORI_Image_300/test1.record"
  }
  label_map_path: "D:/RUBIN/RESEARCH/Tensorflow1/models-master/research/object_detection/Retina/Ret_Training/labelmap.pbtxt"
  shuffle: false
  num_readers: 1
}

Thank You.

CasiaFan commented 4 years ago

Could you try running model_main.py instead of trainer.py first and see the results? I'm not sure about the cause, but it seems that max_level is set to 8. @Rublins
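
For reference, a minimal standalone sketch (not the repository code itself) of the lookup shown in the traceback: retinanet_fpn produces feature maps only for levels 3-7, so a multiscale_anchor_generator max_level above 7 asks for a level that was never built.

# Dummy stand-in for the dict returned by retinanet_fpn (levels 3..7 only).
image_features = {3: "P3", 4: "P4", 5: "P5", 6: "P6", 7: "P7"}

min_level, max_level = 3, 8  # max_level: 8 reproduces the reported error
try:
    feature_maps = [image_features[x] for x in range(min_level, max_level + 1)]
except KeyError as err:
    print("KeyError:", err)  # KeyError: 8 -- no level-8 feature map exists
# With max_level: 7 the comprehension matches the available levels and succeeds.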

Rublins commented 4 years ago

@CasiaFan Thank you, I solved the above issue. Now I get a ValueError. Kindly help me sort out this issue.

python model_main.py --logtostderr --train_dir=Retina/Ret_Training --pipeline_config_path=Retina/Ret_Training/retinanet_50_train1.config

WARNING:tensorflow:Forced number of epochs for all eval validations to be 1.
WARNING:tensorflow:Expected number of evaluation epochs is 1, but instead encountered eval_on_train_input_config.num_epochs = 0. Overwriting num_epochs to 1.
WARNING:tensorflow:Using temporary folder as model directory: C:\Users\SATHIE~1\AppData\Local\Temp\tmp_pb1_uxg
WARNING:tensorflow:Estimator's model_fn (<function create_model_fn..model_fn at 0x000001656F97F2F0>) includes params argument, but params are not passed to Estimator.
WARNING:tensorflow:num_readers has been reduced to 1 to match input file shards.
WARNING:tensorflow:From D:\RUBIN\RESEARCH\Tensorflow1\models-master\research\object_detection\builders\dataset_builder.py:86: parallel_interleave (from tensorflow.contrib.data.python.ops.interleave_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.data.experimental.parallel_interleave(...).
WARNING:tensorflow:From C:\Users\Sathiesh Kumar\AppData\Local\Continuum\anaconda3\envs\retina\lib\site-packages\tensorflow\python\ops\sparse_ops.py:1165: sparse_to_dense (from tensorflow.python.ops.sparse_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Create a tf.sparse.SparseTensor and use tf.sparse.to_dense instead.
WARNING:tensorflow:From D:\RUBIN\RESEARCH\Tensorflow1\models-master\research\object_detection\builders\dataset_builder.py:158: batch_and_drop_remainder (from tensorflow.contrib.data.python.ops.batching) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.data.Dataset.batch(..., drop_remainder=True).
p2 shape: (4, 80, 80, 256)
p3 shape: (4, 40, 40, 512)
p4 shape: (4, 20, 20, 1024)
p5 shape: (4, 10, 10, 2048)
l3 shape: (4, 40, 40, 256)
l4 shape: (4, 20, 20, 256)
l5 shape: (4, 10, 10, 256)
p3 shape: (4, 40, 40, 256)
p4 shape: (4, 20, 20, 256)
Traceback (most recent call last):
  File "model_main.py", line 109, in <module>
    tf.app.run()
  File "C:\Users\Sathiesh Kumar\AppData\Local\Continuum\anaconda3\envs\retina\lib\site-packages\tensorflow\python\platform\app.py", line 125, in run
    _sys.exit(main(argv))
  File "model_main.py", line 105, in main
    tf.estimator.train_and_evaluate(estimator, train_spec, eval_specs[0])
  File "C:\Users\Sathiesh Kumar\AppData\Local\Continuum\anaconda3\envs\retina\lib\site-packages\tensorflow\python\estimator\training.py", line 471, in train_and_evaluate
    return executor.run()
  File "C:\Users\Sathiesh Kumar\AppData\Local\Continuum\anaconda3\envs\retina\lib\site-packages\tensorflow\python\estimator\training.py", line 610, in run
    return self.run_local()
  File "C:\Users\Sathiesh Kumar\AppData\Local\Continuum\anaconda3\envs\retina\lib\site-packages\tensorflow\python\estimator\training.py", line 711, in run_local
    saving_listeners=saving_listeners)
  File "C:\Users\Sathiesh Kumar\AppData\Local\Continuum\anaconda3\envs\retina\lib\site-packages\tensorflow\python\estimator\estimator.py", line 354, in train
    loss = self._train_model(input_fn, hooks, saving_listeners)
  File "C:\Users\Sathiesh Kumar\AppData\Local\Continuum\anaconda3\envs\retina\lib\site-packages\tensorflow\python\estimator\estimator.py", line 1207, in _train_model
    return self._train_model_default(input_fn, hooks, saving_listeners)
  File "C:\Users\Sathiesh Kumar\AppData\Local\Continuum\anaconda3\envs\retina\lib\site-packages\tensorflow\python\estimator\estimator.py", line 1237, in _train_model_default
    features, labels, model_fn_lib.ModeKeys.TRAIN, self.config)
  File "C:\Users\Sathiesh Kumar\AppData\Local\Continuum\anaconda3\envs\retina\lib\site-packages\tensorflow\python\estimator\estimator.py", line 1195, in _call_model_fn
    model_fn_results = self._model_fn(features=features, **kwargs)
  File "D:\RUBIN\RESEARCH\Tensorflow1\models-master\research\object_detection\model_lib.py", line 308, in model_fn
    features[fields.InputDataFields.true_image_shape])
  File "D:\RUBIN\RESEARCH\Tensorflow1\models-master\research\object_detection\meta_architectures\ssd_meta_arch.py", line 609, in predict
    im_width=image_shape[2])
  File "D:\RUBIN\RESEARCH\Tensorflow1\models-master\research\object_detection\core\anchor_generator.py", line 105, in generate
    raise ValueError('Number of feature maps is expected to equal the length '
ValueError: Number of feature maps is expected to equal the length of num_anchors_per_location.

Retinanet.py

import tensorflow as tf
import math
from object_detection.utils.shape_utils import combined_static_and_dynamic_shape

# tf.enable_eager_execution()
BN_PARAMS = {"bn_decay": 0.997,
             "bn_epsilon": 1e-4}

# define number of layers of each block for different architecture
RESNET_ARCH_BLOCK = {"resnet50": [3, 4, 6, 3],
                     "resnet101": [3, 4, 23, 3]}

def nearest_neighbor_upsampling(input_tensor, scale):
    """Nearest neighbor upsampling implementation.
    NOTE: See TensorFlow Object Detection API utils.ops
    Args:
        input_tensor: A float32 tensor of size [batch, height_in, width_in, channels].
        scale: An integer multiple to scale resolution of input data.
    Returns:
        upsample_input: A float32 tensor of size [batch, height_in*scale, width_in*scale, channels].
    """
    with tf.name_scope('nearest_neighbor_upsampling'):
        (batch_size, h, w, c) = combined_static_and_dynamic_shape(input_tensor)
        output_tensor = tf.reshape(input_tensor, [batch_size, h, 1, w, 1, c]) * tf.ones(
                [1, 1, scale, 1, scale, 1], dtype=input_tensor.dtype)
        return tf.reshape(output_tensor, [batch_size, h*scale, w*scale, c])

def conv2d_same(inputs, depth, kernel_size, strides, scope=None):
    with tf.name_scope(scope, None):
        if strides == 1:
            return tf.layers.conv2d(inputs, depth, kernel_size, padding='SAME')
        else:
            pad_total = kernel_size - 1
            pad_beg = pad_total // 2
            pad_end = pad_total - pad_beg
            inputs = tf.pad(inputs, [[0, 0], [pad_beg, pad_end], [pad_beg, pad_end], [0, 0]])
            return tf.layers.conv2d(inputs,
                                    depth,
                                    kernel_size,
                                    strides=strides,
                                    padding='VALID',
                                    use_bias=False,
                                    kernel_initializer=tf.variance_scaling_initializer())

def bn_with_relu(inputs, is_training, relu=True, init_zero=False, name=None):
    if not init_zero:
        gamma_init = tf.ones_initializer()
    else:
        gamma_init = tf.zeros_initializer()
    inputs = tf.layers.batch_normalization(inputs,
                                           training=is_training,
                                           momentum=BN_PARAMS["bn_decay"],
                                           epsilon=BN_PARAMS["bn_epsilon"],
                                           scale=True,
                                           fused=True,
                                           gamma_initializer=gamma_init,
                                           name=name)
    if relu:
        inputs = tf.nn.relu(inputs)
    return inputs

def bottleneck(inputs, depth, strides, is_training, projection=False, scope=None):
    """Bottleneck residual unit variant with BN after convolutions
       When putting together two consecutive ResNet blocks that use this unit,
       one should use stride = 2 in the last unit of the first block.

    Args:
        inputs: A tensor of size [batchsize, height, width, channels] (after BN)
        depth: The depth of the block unit output
        strides: the ResNet unit's stride. Determines the amount of downsampling of
            the units output compared to its input
        is_training: indicate training state for BN layer
        projection: if this block will use a projection. True for first block in block groups
        scope: Optional variable scope

    Returns:
        The ResNet unit output
    """
    with tf.variable_scope(scope, 'bottleneck', [inputs]) as sc:
        # shortcut connection
        shortcut = inputs
        depth_out = depth * 4
        if projection:
            shortcut = conv2d_same(shortcut, depth_out, kernel_size=1, strides=strides, scope='shortcut')
            shortcut = bn_with_relu(shortcut, is_training, relu=False)
        # layer1
        residual = conv2d_same(inputs, depth, kernel_size=1, strides=1, scope='conv1')
        residual = bn_with_relu(residual, is_training)
        # layer 2
        residual = conv2d_same(residual, depth, kernel_size=3, strides=strides, scope='conv2')
        residual = bn_with_relu(residual, is_training)
        # layer 3
        residual = conv2d_same(residual, depth_out, kernel_size=1, strides=1, scope='conv3')
        residual = bn_with_relu(residual, is_training, relu=False, init_zero=True)
        output = shortcut + residual
        return tf.nn.relu(output)

def stack_bottleneck(inputs, layers, depth, strides, is_training, scope=None):
    """ Stack bottleneck planes

    This function creates scopes for the ResNet in the form of 'block_name/plane_1, block_name/plane_2', etc.
    Args:
        layers: number of layers in this block
    """
    with tf.variable_scope(scope, 'block', [inputs]) as sc:
        inputs = bottleneck(inputs, depth, strides=strides, is_training=is_training, projection=True)
        for i in range(1, layers):
            layer_scope = "unit_{}".format(i)
            inputs = bottleneck(inputs, depth, strides=1, is_training=is_training, scope=layer_scope)
    return inputs

def retinanet_fpn(inputs,
                  block_layers,
                  depth=256,
                  is_training=True,
                  scope=None):
    """
    Generator for RetinaNet FPN models. A small modification of initial FPN model for returning layers
        {P3, P4, P5, P6, P7}. See paper Focal Loss for Dense Object Detection. arxiv: 1708.02002

        P2 is discarded and P6 is obtained via 3x3 stride-2 conv on c5; P7 is computed by applying ReLU followed by
        3x3 stride-2 conv on P6. P7 is to improve large object detection

    Returns:
        5 feature map tensors: {P3, P4, P5, P6, P7}
    """
    with tf.variable_scope(scope, 'retinanet_fpn', [inputs]) as sc:
        net = conv2d_same(inputs, 64, kernel_size=7, strides=2, scope='conv1')
        net = bn_with_relu(net, is_training)
        net = tf.layers.max_pooling2d(net, pool_size=3, strides=2, padding='SAME', name='pool1')
        # Bottom up
        # block 1, down-sampling is done in conv3_1, conv4_1, conv5_1
        p2 = stack_bottleneck(net, layers=block_layers[0], depth=64, strides=1, is_training=is_training)
        # block 2
        p3 = stack_bottleneck(p2, layers=block_layers[1], depth=128, strides=2, is_training=is_training)
        # block 3
        p4 = stack_bottleneck(p3, layers=block_layers[2], depth=256, strides=2, is_training=is_training)
        # block 4
        p5 = stack_bottleneck(p4, layers=block_layers[3], depth=512, strides=2, is_training=is_training)
        # lateral layer
        l3 = tf.layers.conv2d(p3, filters=depth, kernel_size=1, strides=1, name='l3', padding='SAME')
        l4 = tf.layers.conv2d(p4, filters=depth, kernel_size=1, strides=1, name='l4', padding='SAME')
        l5 = tf.layers.conv2d(p5, filters=depth, kernel_size=1, strides=1, name='l5', padding='SAME')
        print ('p2 shape:', p2.get_shape())
        print ('p3 shape:', p3.get_shape())
        print ('p4 shape:', p4.get_shape())
        print ('p5 shape:', p5.get_shape())
        print ('l3 shape:', l3.get_shape())
        print ('l4 shape:', l4.get_shape())
        print ('l5 shape:', l5.get_shape())
        # Top down
        p4 = nearest_neighbor_upsampling(l5, 2) + l4
        p3 = nearest_neighbor_upsampling(p4, 2) + l3
        print ('p3 shape:', p3.get_shape())
        print ('p4 shape:', p4.get_shape())
        # add post-hoc conv layers
        p3 = tf.layers.conv2d(p3, filters=depth, kernel_size=3, strides=1, padding='SAME', name='post-hoc-d3')
        p4 = tf.layers.conv2d(p4, filters=depth, kernel_size=3, strides=1, padding='SAME', name='post-hoc-d4')
        p5 = tf.layers.conv2d(l5, filters=depth, kernel_size=3, strides=1, padding='SAME', name='post-hoc-d5')
        # coarse layer: 6, 7
        # p6
        p6 = tf.layers.conv2d(p5, filters=depth, kernel_size=3, strides=2, name='conv6', padding='SAME')
        p6 = tf.nn.relu(p6)
        # P7
        p7 = tf.layers.conv2d(p6, filters=depth, kernel_size=3, strides=2, name='conv7', padding='SAME')
        # add normalization to each layer
        features = {3: p3,
                    4: p4,
                    5: l5,
                    6: p6,
                    7: p7}
        for layer in features:
            features[layer] = tf.layers.batch_normalization(features[layer],
                                                            training=is_training,
                                                            momentum=BN_PARAMS["bn_decay"],
                                                            epsilon=BN_PARAMS["bn_epsilon"],
                                                            center=True,
                                                            scale=True,
                                                            fused=True,
                                                            name='p{}-bn'.format(layer))
        return features

def share_weight_class_net(inputs, level, num_classes, num_anchors_per_loc, num_layers_before_predictor=4, is_training=True):
    """
    net for predicting class labels
    NOTE: Shares the same weights when called more than once on different feature maps
    Args:
        inputs: feature map with shape (batch_size, h, w, channel)
        level: which feature map
        num_classes: number of predicted classes
        num_anchors_per_loc: number of anchors at each spatial location in feature map
        num_layers_before_predictor: number of the additional conv layers before the predictor.
        is_training: is in training or not
    returns:
        feature with shape (batch_size, h, w, num_classes*num_anchors)
    """
    for i in range(num_layers_before_predictor):
        inputs = tf.layers.conv2d(inputs, filters=256, kernel_size=3, strides=1,
                                  kernel_initializer=tf.random_normal_initializer(stddev=0.01),
                                  bias_initializer=tf.zeros_initializer(),
                                  padding="SAME",
                                  name='class_{}'.format(i))
        inputs = bn_with_relu(inputs, is_training, relu=True, init_zero=False, name="class_{}_bn_level_{}".format(i, level))
    outputs = tf.layers.conv2d(inputs,
                               filters=num_classes*num_anchors_per_loc,
                               kernel_size=3,
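                               # bias init: -log((1 - pi) / pi) with pi = 0.01, i.e. ~ -4.6,
                               # matching class_prediction_bias_init in the config (focal loss prior)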
                               bias_initializer=tf.constant_initializer(-math.log((1 - 0.01) / 0.01)),
                               kernel_initializer=tf.random_normal_initializer(stddev=0.01),
                               padding="SAME",
                               name="class_pred")
    return outputs

def share_weight_box_net(inputs, level, num_anchors_per_loc, num_layers_before_predictor=4, is_training=True):
    """
    Similar to class_net with output feature shape (batch_size, h, w, num_anchors*4)
    """
    for i in range(num_layers_before_predictor):
        inputs = tf.layers.conv2d(inputs, filters=256, kernel_size=3, strides=1,
                                  bias_initializer=tf.zeros_initializer(),
                                  kernel_initializer=tf.random_normal_initializer(stddev=0.01),
                                  padding="SAME",
                                  name='box_{}'.format(i))
        inputs = bn_with_relu(inputs, is_training, relu=True, init_zero=False, name="box_{}_bn_level_{}".format(i, level))
    outputs = tf.layers.conv2d(inputs,
                               filters=4*num_anchors_per_loc,
                               kernel_size=3,
                               kernel_initializer=tf.random_normal_initializer(stddev=0.01),
                               padding="SAME",
                               name="box_pred")
    return outputs

def retinanet(images, num_classes, num_anchors_per_loc, resnet_arch='resnet50', is_training=True):
    """
    Get box prediction features and class prediction features from given images
    Args:
        images: input batch of images with shape (batch_size, h, w, 3)
        num_classes: number of classes for prediction
        num_anchors_per_loc: number of anchors at each feature map spatial location
        resnet_arch: name of which resnet architecture used
        is_training: indicate training or not
    return:
        prediction dict: holding the following items:
            box_predictions tensor from each feature map with shape (batch_size, num_anchors, 4)
            class_predictions_with_bg tensor from each feature map with shape (batch_size, num_anchors, num_class+1)
            feature_maps: list of tensor of feature map
    """
    assert resnet_arch in list(RESNET_ARCH_BLOCK.keys()), "resnet architecture not defined"
    with tf.variable_scope('retinanet'):
        batch_size = combined_static_and_dynamic_shape(images)[0]
        features = retinanet_fpn(images, block_layers=RESNET_ARCH_BLOCK[resnet_arch], is_training=is_training)
        class_pred = []
        box_pred = []
        feature_map_list = []
        num_slots = num_classes + 1
        with tf.variable_scope('class_net', reuse=tf.AUTO_REUSE):
            for level in features.keys():
                class_outputs = share_weight_class_net(features[level], level,
                                                       num_slots,
                                                       num_anchors_per_loc,
                                                       is_training=is_training)
                class_outputs = tf.reshape(class_outputs, shape=[batch_size, -1, num_slots])
                class_pred.append(class_outputs)
                feature_map_list.append(features[level])
        with tf.variable_scope('box_net', reuse=tf.AUTO_REUSE):
            for level in features.keys():
                box_outputs = share_weight_box_net(features[level], level, num_anchors_per_loc, is_training=is_training)
                box_outputs = tf.reshape(box_outputs, shape=[batch_size, -1, 4])
                box_pred.append(box_outputs)
        return dict(box_pred=tf.concat(box_pred, axis=1),
                    cls_pred=tf.concat(class_pred, axis=1),
                    feature_map_list=feature_map_list)
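
For reference, a hedged usage sketch of the retinanet() entry point above (not part of the original post): the module name, placeholder shape, and batch size are assumptions, while num_classes and num_anchors_per_loc follow from the config below (10 classes, 3 aspect ratios x 2 scales per octave = 6 anchors per location).

import tensorflow as tf
from retinanet import retinanet  # assuming the file above is saved as retinanet.py

# Batch of preprocessed images; 640x640 matches the fixed_shape_resizer in the config.
images = tf.placeholder(tf.float32, shape=[1, 640, 640, 3])
predictions = retinanet(images,
                        num_classes=10,
                        num_anchors_per_loc=6,
                        resnet_arch='resnet50',
                        is_training=True)
# Expected keys: 'box_pred' (batch, num_anchors, 4),
# 'cls_pred' (batch, num_anchors, num_classes + 1), and 'feature_map_list'.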

Config file

# SSD with Resnet 50 v1 FPN feature extractor, shared box predictor and focal
# loss (a.k.a Retinanet).
# See Lin et al, https://arxiv.org/abs/1708.02002
# Trained on COCO, initialized from Imagenet classification checkpoint

# Achieves 35.2 mAP on COCO14 minival dataset. Doubling the number of training
# steps to 50k gets 36.9 mAP

# This config is TPU compatible

model {
  ssd {
    inplace_batchnorm_update: true
    freeze_batchnorm: false
    num_classes: 10
    box_coder {
      faster_rcnn_box_coder {
        y_scale: 10.0
        x_scale: 10.0
        height_scale: 5.0
        width_scale: 5.0
      }
    }
    matcher {
      argmax_matcher {
        matched_threshold: 0.5
        unmatched_threshold: 0.5
        ignore_thresholds: false
        negatives_lower_than_unmatched: true
        force_match_for_each_row: true
        use_matmul_gather: true
      }
    }
    similarity_calculator {
      iou_similarity {
      }
    }
    encode_background_as_zeros: true
    anchor_generator {
      multiscale_anchor_generator {
        min_level: 3
        max_level: 7
        anchor_scale: 4.0
        aspect_ratios: [1.0, 2.0, 0.5]
        scales_per_octave: 2
      }
    }
    image_resizer {
      fixed_shape_resizer {
        height: 640
        width: 640
      }
    }
    box_predictor {
      weight_shared_convolutional_box_predictor {
        depth: 256
        class_prediction_bias_init: -4.6
        conv_hyperparams {
          activation: RELU_6,
          regularizer {
            l2_regularizer {
              weight: 0.0004
            }
          }
          initializer {
            random_normal_initializer {
              stddev: 0.01
              mean: 0.0
            }
          }
          batch_norm {
            scale: true,
            decay: 0.997,
            epsilon: 0.001,
          }
        }
        num_layers_before_predictor: 4
        kernel_size: 3
      }
    }
    feature_extractor {
      type: 'retinanet_50'
      min_depth: 0
      depth_multiplier: 1.0
      pad_to_multiple: 4
      override_base_feature_extractor_hyperparams: true
    }
    loss {
      classification_loss {
        weighted_sigmoid_focal {
          alpha: 0.25
          gamma: 2.0
        }
      }
      localization_loss {
        weighted_smooth_l1 {
        }
      }
      classification_weight: 1.0
      localization_weight: 1.0
    }
    normalize_loss_by_num_matches: true
    normalize_loc_loss_by_codesize: true
    post_processing {
      batch_non_max_suppression {
        score_threshold: 1e-8
        iou_threshold: 0.6
        max_detections_per_class: 100
        max_total_detections: 100
      }
      score_converter: SIGMOID
    }
  }
}

train_config: {
  #fine_tune_checkpoint: "D:/RUBIN/RESEARCH/Tensorflow1/models/research/object_detection/ssd_resnet50_v1_fpn_shared_box_predictor_640x640_coco14_sync_2018_07_03/model.ckpt"
  batch_size: 4
  sync_replicas: false
  startup_delay_steps: 0
  replicas_to_aggregate: 2
  num_steps: 5000
  data_augmentation_options {
    random_horizontal_flip {
    }
  }
  data_augmentation_options {
    random_crop_image {
      min_object_covered: 0.0
      min_aspect_ratio: 0.75
      max_aspect_ratio: 3.0
      min_area: 0.75
      max_area: 1.0
      overlap_thresh: 0.0
    }
  }
  optimizer {
    momentum_optimizer: {
      learning_rate: {
        cosine_decay_learning_rate {
          learning_rate_base: .04
          total_steps: 5000
          warmup_learning_rate: .013333
          warmup_steps: 2000
        }
      }
      momentum_optimizer_value: 0.9
    }
    use_moving_average: false
  }
  max_number_of_boxes: 100
  unpad_groundtruth_tensors: false
}

train_input_reader: {
  tf_record_input_reader {
    input_path:  "D:/RUBIN/RESEARCH/Tensorflow1/models/research/object_detection/ORI_Image_300/train1.record"
  }
  label_map_path: "D:/RUBIN/RESEARCH/Tensorflow1/models-master/research/object_detection/Retina/Ret_Training/labelmap.pbtxt"
}

eval_config: {
  metrics_set: "coco_detection_metrics"
  num_examples: 1424
  num_visualizations:1424
  max_evals: 1
}

eval_input_reader: {
  tf_record_input_reader {
    input_path:  "D:/RUBIN/RESEARCH/Tensorflow1/models/research/object_detection/ORI_Image_300/test1.record"
  }
  label_map_path: "D:/RUBIN/RESEARCH/Tensorflow1/models-master/research/object_detection/Retina/Ret_Training/labelmap.pbtxt"
  shuffle: false
  num_readers: 1
}

Thank you.

CasiaFan commented 4 years ago

I tested this code with TF 1.14 and it ran okay. This error should be a mismatch between the number of feature maps (5 layers) and the number of anchor grid settings (which is max_level - min_level + 1). So first, could you check the TF object detection API version? Also, I see your input image size is 640, but in that case the p2 layer shape should be 160x160, i.e. 1/4 of the input width and height, while your result is 80. I am not sure what the cause of this problem is.

@Rublins
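
As a rough sanity check of the relation described above (a standalone sketch, not the actual Object Detection API call), the snippet below compares the two counts the ValueError refers to and prints the grid sizes one would expect for a 640x640 input if pyramid level l has stride 2**l:

min_level, max_level = 3, 7
num_feature_maps = 5  # retinanet_fpn returns levels 3..7
assert num_feature_maps == max_level - min_level + 1  # 5 anchor grids expected

for level in range(2, 8):
    size = 640 // 2 ** level
    print("P{}: {}x{}".format(level, size, size))
# P2 -> 160x160 expected, but the log above shows 80x80, i.e. the backbone output
# is a factor of 2 smaller than expected -- consistent with the mismatch noted above.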

Rublins commented 4 years ago

Thank you @CasiaFan. I changed the input size to 320 x 320 and now it works fine for ResNet-50.

Rublins commented 4 years ago

@CasiaFan Can you suggest any suitable pre-trained model weights for RetinaNet-101 (ResNet-101 as the feature extractor)? Thank you.

CasiaFan commented 4 years ago

@Rublins If you are using my custom code for training, I'm afraid there are no pre-trained ResNet-101 weights due to the node name mismatch. But if you are using the official TF Object Detection API, the model weights trained on the Open Images Dataset should meet your need.