Faster RCNN resnet101 model classifies objects correctly but bounding box plotting is incorrect.

System configuration:

OS: Linux Ubuntu 18.04
CUDA/cuDNN version: CUDA 10.1, Cuda compilation tool v9.1.85, Driver Version 430.50
Graphics Card: Quadro M2000M 4GB
TensorFlow version: 2.0.0

Reference Used for Object Detection - EdjeElectronics

Data used for training: I have around 13,000 images with only 2 classes, Person and Vehicle. I split this data into train and test sets in a 75:25 ratio.
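A minimal sketch of a 75:25 split like the one described above. The file names, the `split_train_test` helper, and the fixed seed are hypothetical and only illustrate the ratio; the split actually used for this dataset may have been done differently.

```python
import random


def split_train_test(items, train_fraction=0.75, seed=0):
    """Shuffle a copy of items and cut it into train and test lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical file names standing in for the ~13,000 images.
image_names = [f"img_{i:05d}.jpg" for i in range(13000)]
train, test = split_train_test(image_names)
print(len(train), len(test))  # 9750 3250
```

With 13,000 images this gives 9,750 training and 3,250 test images.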

Models tested (downloaded from model_zoo):

faster_rcnn_inception_v2_coco
faster_rcnn_resnet101_coco

Here is the config file:

# Faster R-CNN with Inception v2, configured for Oxford-IIIT Pets Dataset.
# Users should configure the fine_tune_checkpoint field in the train config as
# well as the label_map_path and input_path fields in the train_input_reader and
# eval_input_reader. Search for "PATH_TO_BE_CONFIGURED" to find the fields that
# should be configured.

model {
  faster_rcnn {
    num_classes: 2
    image_resizer {
      keep_aspect_ratio_resizer {
        min_dimension: 600
        max_dimension: 1024
      }
    }
    feature_extractor {
      type: 'faster_rcnn_inception_v2'
      first_stage_features_stride: 16
    }
    first_stage_anchor_generator {
      grid_anchor_generator {
        scales: [0.25, 0.5, 1.0, 2.0]
        aspect_ratios: [0.5, 1.0, 2.0]
        height_stride: 16
        width_stride: 16
      }
    }
    first_stage_box_predictor_conv_hyperparams {
      op: CONV
      regularizer {
        l2_regularizer {
          weight: 0.0
        }
      }
      initializer {
        truncated_normal_initializer {
          stddev: 0.01
        }
      }
    }
    first_stage_nms_score_threshold: 0.0
    first_stage_nms_iou_threshold: 0.7
    first_stage_max_proposals: 300
    first_stage_localization_loss_weight: 2.0
    first_stage_objectness_loss_weight: 1.0
    initial_crop_size: 14
    maxpool_kernel_size: 2
    maxpool_stride: 2
    second_stage_box_predictor {
      mask_rcnn_box_predictor {
        use_dropout: false
        dropout_keep_probability: 1.0
        fc_hyperparams {
          op: FC
          regularizer {
            l2_regularizer {
              weight: 0.0
            }
          }
          initializer {
            variance_scaling_initializer {
              factor: 1.0
              uniform: true
              mode: FAN_AVG
            }
          }
        }
      }
    }
    second_stage_post_processing {
      batch_non_max_suppression {
        score_threshold: 0.0
        iou_threshold: 0.6
        max_detections_per_class: 100
        max_total_detections: 300
      }
      score_converter: SOFTMAX
    }
    second_stage_localization_loss_weight: 2.0
    second_stage_classification_loss_weight: 1.0
  }
}

train_config: {
  batch_size: 1
  optimizer {
    momentum_optimizer: {
      learning_rate: {
        manual_step_learning_rate {
          initial_learning_rate: 0.0002
          schedule {
            step: 1
            learning_rate: .0002
          }
          schedule {
            step: 900000
            learning_rate: .00002
          }
          schedule {
            step: 1200000
            learning_rate: .000002
          }
        }
      }
      momentum_optimizer_value: 0.9
    }
    use_moving_average: false
  }
  gradient_clipping_by_norm: 10.0
  fine_tune_checkpoint: "../comp/faster_rcnn_resnet101_coco_2018_01_28/model.ckpt"
  from_detection_checkpoint: true
  # Note: The below line limits the training process to 200K steps, which we
  # empirically found to be sufficient enough to train the pets dataset. This
  # effectively bypasses the learning rate schedule (the learning rate will
  # never decay). Remove the below line to train indefinitely.
  num_steps: 200000
  data_augmentation_options {
    random_horizontal_flip {
    }
  }
}

train_input_reader: {
  tf_record_input_reader {
    input_path: "../res/record/train.record"
  }
  label_map_path: "../comp/labelmap.pbtxt"
}

eval_config: {
  num_examples: 67
  # Note: The below line limits the evaluation process to 10 evaluations.
  # Remove the below line to evaluate indefinitely.
  max_evals: 10
}

eval_input_reader: {
  tf_record_input_reader {
    input_path: "../res/record/test.record"
  }
  label_map_path: "../comp/labelmap.pbtxt"
  shuffle: false
  num_readers: 1
}
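To make the `image_resizer` and `grid_anchor_generator` settings above concrete, here is a rough sketch of the sizes they imply. It assumes the anchor generator's default 256x256 base anchor (the config does not set one) and approximates the resizer logic rather than reproducing the library code; `resized_shape` and `anchor_sizes` are hypothetical helper names.

```python
import math


def resized_shape(height, width, min_dim=600, max_dim=1024):
    """Approximate keep_aspect_ratio_resizer: scale the short side to min_dim
    unless that would push the long side past max_dim."""
    scale = min_dim / min(height, width)
    if round(max(height, width) * scale) > max_dim:
        scale = max_dim / max(height, width)
    return round(height * scale), round(width * scale)


def anchor_sizes(scales, aspect_ratios, base=256):
    """(height, width) in pixels for every scale/aspect-ratio pair.
    base=256 is an assumed default, not a value from this config."""
    sizes = []
    for s in scales:
        for r in aspect_ratios:
            h = s / math.sqrt(r) * base
            w = s * math.sqrt(r) * base
            sizes.append((round(h), round(w)))
    return sizes


print(resized_shape(1080, 1920))  # (576, 1024)
for size in anchor_sizes([0.25, 0.5, 1.0, 2.0], [0.5, 1.0, 2.0]):
    print(size)
```

Under these assumptions the 12 anchors range from roughly 64 pixels to roughly 724 pixels on a side, measured on the resized image, so objects much smaller or larger than that in the resized frame may be harder to fit.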

Note: I trained each of these models separately for 200,000 iterations (2 lakh).
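As the comment in the config points out, stopping at 200,000 steps means the learning rate never decays. A small sketch of the `manual_step_learning_rate` schedule from the config shows this; `learning_rate_at` is a hypothetical helper that mirrors the step boundaries, not the library implementation.

```python
def learning_rate_at(step,
                     initial=0.0002,
                     schedule=((1, 0.0002), (900000, 0.00002), (1200000, 0.000002))):
    """Return the rate of the last schedule entry whose step has been reached."""
    rate = initial
    for boundary, value in schedule:
        if step >= boundary:
            rate = value
    return rate


for step in (0, 100000, 200000, 900000, 1200000):
    print(step, learning_rate_at(step))
```

Every step up to 200,000 gets 0.0002; the first drop to 0.00002 would only happen at step 900,000.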

Problem: The bounding boxes detected on a new set of images are inaccurate. For example, if an image contains 2 vehicles and 1 person, the number of boxes is correct but the box sizes are drastically wrong.
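When the box count is right but the sizes are off, one thing worth ruling out is the plotting step itself. To my knowledge, the Object Detection API returns `detection_boxes` as normalized `[ymin, xmin, ymax, xmax]` values in the range 0 to 1, so drawing them requires scaling by the image height and width in that order. The sketch below assumes that layout; `to_pixel_box` is a hypothetical helper, not part of the tutorial code.

```python
def to_pixel_box(box, image_height, image_width):
    """Convert a normalized [ymin, xmin, ymax, xmax] box to pixel
    (left, top, right, bottom) coordinates."""
    ymin, xmin, ymax, xmax = box
    left = int(round(xmin * image_width))
    top = int(round(ymin * image_height))
    right = int(round(xmax * image_width))
    bottom = int(round(ymax * image_height))
    return left, top, right, bottom


# Example values only: a box covering the middle band of a 640x480 image.
print(to_pixel_box((0.25, 0.1, 0.75, 0.5), image_height=480, image_width=640))
# (64, 120, 320, 360)
```

Swapping the x and y entries, or scaling by the resized dimensions instead of the original image dimensions, would produce boxes in roughly the right place but with the wrong size.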

Things I tried:

I first suspected overfitting, so I went through the TensorFlow community and some Stack Overflow posts to understand what overfitting is. I concluded that overfitting basically comes down to:
- A large number of parameters. In my case, though, I have only the bounding box coordinates as input parameters.
- The model detecting the training set accurately but failing on new data.
- This is not the case here, because the model does not detect accurately on the training data either.
I tried changing the regularizer values for my data, based on the playground exercise.
I tried changing the anchor box values, referring here.

None of these improved my results.
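To put a number on how far off the boxes are, on training images as well as new ones, a predicted box can be compared against its ground-truth box with intersection over union (IoU). This is a generic sketch with hypothetical example boxes in pixel `(left, top, right, bottom)` form, not output from the models above.

```python
def iou(box_a, box_b):
    """Intersection over union of two (left, top, right, bottom) boxes."""
    left = max(box_a[0], box_b[0])
    top = max(box_a[1], box_b[1])
    right = min(box_a[2], box_b[2])
    bottom = min(box_a[3], box_b[3])
    # Clamp to zero so disjoint boxes give an empty intersection.
    intersection = max(0, right - left) * max(0, bottom - top)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


ground_truth = (100, 100, 200, 300)  # hypothetical labeled box
predicted = (100, 100, 150, 200)     # hypothetical detection, too small
print(iou(ground_truth, predicted))  # 0.25
```

A consistently low IoU on images the model was trained on would point away from overfitting and toward the data, the config, or the drawing code.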

EdjeElectronics / TensorFlow-Object-Detection-API-Tutorial-Train-Multiple-Objects-Windows-10
