EdjeElectronics / TensorFlow-Object-Detection-API-Tutorial-Train-Multiple-Objects-Windows-10

How to train a TensorFlow Object Detection Classifier for multiple object detection on Windows
Apache License 2.0

Faster RCNN resnet101 model classifies objects correctly but bounding box plotting is incorrect. #487

Closed Amol0296 closed 4 years ago

Amol0296 commented 4 years ago

System Configurations:

  1. OS: Linux Ubuntu 18.04
  2. CUDA/cuDNN version: CUDA 10.1, Cuda compilation tool v9.1.85, Driver Version 430.50
  3. Graphics Card: Quadro M2000M 4GB
  4. TensorFlow Version: 2.0.0

Reference Used for Object Detection - EdjeElectronics

Data used for training: I have around 13,000 images with only 2 classes, Person and Vehicle. I split this data into training and test sets in a 75:25 ratio.
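
The 75:25 split described above can be sketched as follows. This is only a minimal illustration, not the script actually used: the file names, the `split_train_test` helper, and the fixed seed are hypothetical.

```python
import random

def split_train_test(items, train_fraction=0.75, seed=0):
    """Shuffle a copy of items and split it into (train, test) lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# Hypothetical file names standing in for the real image set.
image_names = [f"img_{i:05d}.jpg" for i in range(13000)]
train, test = split_train_test(image_names)
print(len(train), len(test))  # 9750 3250
```

With roughly 13,000 images, a 75:25 split leaves about 3,250 test images, which may be worth comparing against the `num_examples` value in the `eval_config` of the config file.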

Models tested (downloaded from model_zoo):

  1. faster_rcnn_inception_v2_coco
  2. faster_rcnn_resnet101_coco

Here is the config file:

# Faster R-CNN with Inception v2, configured for Oxford-IIIT Pets Dataset.
# Users should configure the fine_tune_checkpoint field in the train config as
# well as the label_map_path and input_path fields in the train_input_reader and
# eval_input_reader. Search for "PATH_TO_BE_CONFIGURED" to find the fields that
# should be configured.

model {
  faster_rcnn {
    num_classes: 2
    image_resizer {
      keep_aspect_ratio_resizer {
        min_dimension: 600
        max_dimension: 1024
      }
    }
    feature_extractor {
      type: 'faster_rcnn_inception_v2'
      first_stage_features_stride: 16
    }
    first_stage_anchor_generator {
      grid_anchor_generator {
        scales: [0.25, 0.5, 1.0, 2.0]
        aspect_ratios: [0.5, 1.0, 2.0]
        height_stride: 16
        width_stride: 16
      }
    }
    first_stage_box_predictor_conv_hyperparams {
      op: CONV
      regularizer {
        l2_regularizer {
          weight: 0.0
        }
      }
      initializer {
        truncated_normal_initializer {
          stddev: 0.01
        }
      }
    }
    first_stage_nms_score_threshold: 0.0
    first_stage_nms_iou_threshold: 0.7
    first_stage_max_proposals: 300
    first_stage_localization_loss_weight: 2.0
    first_stage_objectness_loss_weight: 1.0
    initial_crop_size: 14
    maxpool_kernel_size: 2
    maxpool_stride: 2
    second_stage_box_predictor {
      mask_rcnn_box_predictor {
        use_dropout: false
        dropout_keep_probability: 1.0
        fc_hyperparams {
          op: FC
          regularizer {
            l2_regularizer {
              weight: 0.0
            }
          }
          initializer {
            variance_scaling_initializer {
              factor: 1.0
              uniform: true
              mode: FAN_AVG
            }
          }
        }
      }
    }
    second_stage_post_processing {
      batch_non_max_suppression {
        score_threshold: 0.0
        iou_threshold: 0.6
        max_detections_per_class: 100
        max_total_detections: 300
      }
      score_converter: SOFTMAX
    }
    second_stage_localization_loss_weight: 2.0
    second_stage_classification_loss_weight: 1.0
  }
}

train_config: {
  batch_size: 1
  optimizer {
    momentum_optimizer: {
      learning_rate: {
        manual_step_learning_rate {
          initial_learning_rate: 0.0002
          schedule {
            step: 1
            learning_rate: .0002
          }
          schedule {
            step: 900000
            learning_rate: .00002
          }
          schedule {
            step: 1200000
            learning_rate: .000002
          }
        }
      }
      momentum_optimizer_value: 0.9
    }
    use_moving_average: false
  }
  gradient_clipping_by_norm: 10.0
  fine_tune_checkpoint: "../comp/faster_rcnn_resnet101_coco_2018_01_28/model.ckpt"
  from_detection_checkpoint: true
  # Note: The below line limits the training process to 200K steps, which we
  # empirically found to be sufficient enough to train the pets dataset. This
  # effectively bypasses the learning rate schedule (the learning rate will
  # never decay). Remove the below line to train indefinitely.
  num_steps: 200000
  data_augmentation_options {
    random_horizontal_flip {
    }
  }
}

train_input_reader: {
  tf_record_input_reader {
    input_path: "../res/record/train.record"
  }
  label_map_path: "../comp/labelmap.pbtxt"
}

eval_config: {
  num_examples: 67
  # Note: The below line limits the evaluation process to 10 evaluations.
  # Remove the below line to evaluate indefinitely.
  max_evals: 10
}

eval_input_reader: {
  tf_record_input_reader {
    input_path: "../res/record/test.record"
  }
  label_map_path: "../comp/labelmap.pbtxt"
  shuffle: false
  num_readers: 1
}
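
The comment above `num_steps` says the step limit bypasses the learning rate schedule. A small sketch of how the `manual_step_learning_rate` values in this config play out, assuming the rate switches exactly at each listed `step` (the boundary handling is an assumption, and `learning_rate_at` is a hypothetical helper, not part of the Object Detection API):

```python
from bisect import bisect_right

# Values copied from the train_config above.
BOUNDARIES = [1, 900000, 1200000]            # schedule.step entries
RATES = [0.0002, 0.0002, 0.00002, 0.000002]  # initial rate, then one rate per schedule entry
NUM_STEPS = 200000

def learning_rate_at(step):
    """Return the scheduled learning rate for a given global step."""
    # Assumption: a boundary step already uses the new rate.
    return RATES[bisect_right(BOUNDARIES, step)]

for step in (0, 1, NUM_STEPS, 900000, 1200000):
    print(step, learning_rate_at(step))
```

Under that assumption, training stops at step 200,000 while the rate is still 0.0002, so neither decay step is ever reached.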

Things to notice: I trained each of these models separately for 200,000 (2 lakh) iterations.

Problem: The bounding boxes detected on a new set of images are not accurate. For example, if an image contains 2 vehicles and 1 person, the number of boxes is correct but the size of each box is drastically wrong.
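
Since the box count is right but the sizes are off, one thing that may be worth ruling out is the coordinate conversion at plotting time. The Object Detection API returns `detection_boxes` as normalized `[ymin, xmin, ymax, xmax]` values, so they have to be scaled by the image height and width, in that order, before drawing. A minimal sketch, where `to_pixel_box` and the sample numbers are hypothetical:

```python
def to_pixel_box(box, image_width, image_height):
    """Convert a normalized [ymin, xmin, ymax, xmax] box to pixel (left, top, right, bottom)."""
    ymin, xmin, ymax, xmax = box
    left = int(round(xmin * image_width))    # x values scale with the width
    right = int(round(xmax * image_width))
    top = int(round(ymin * image_height))    # y values scale with the height
    bottom = int(round(ymax * image_height))
    return left, top, right, bottom

# Hypothetical detection on a 1024x600 image.
print(to_pixel_box([0.25, 0.1, 0.75, 0.5], image_width=1024, image_height=600))
# (102, 150, 512, 450)
```

Swapping x and y, or scaling by the resized dimensions instead of the original image size, would produce boxes in roughly the right place but with the wrong extent.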

Things I tried out

  1. I first thought the model was overfitting, so I explored the TensorFlow community and some Stack Overflow posts to understand what overfitting is. I concluded that overfitting basically occurs when:

    • The model has a large number of parameters, but I have only the bounding box coordinates as input parameters.
    • The model detects the training set accurately but fails on new data.
    • This is not the case here, because the model does not detect correctly on the training data either.
  2. Tried changing the regularizer values for my data, based on the playground exercise.

  3. Tried changing the anchor box values, referring here

None of these improved my results.
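
For the anchor box values in item 3, it can help to see which anchor shapes the `scales` and `aspect_ratios` in the config actually produce. The sketch below assumes a base anchor size of 256 pixels (not set in the config above, so treated here as an assumed default) and the convention that aspect ratio means width divided by height; `anchor_shapes` is a hypothetical helper.

```python
import math

BASE_ANCHOR_SIZE = 256           # assumption: not specified in the config above
SCALES = [0.25, 0.5, 1.0, 2.0]   # from grid_anchor_generator
ASPECT_RATIOS = [0.5, 1.0, 2.0]  # from grid_anchor_generator

def anchor_shapes(scales, aspect_ratios, base=BASE_ANCHOR_SIZE):
    """Return (scale, ratio, width, height) for every scale/ratio pair."""
    shapes = []
    for scale in scales:
        for ratio in aspect_ratios:
            root = math.sqrt(ratio)
            width = scale * root * base   # wider for ratio > 1
            height = scale / root * base  # taller for ratio < 1
            shapes.append((scale, ratio, width, height))
    return shapes

for scale, ratio, width, height in anchor_shapes(SCALES, ASPECT_RATIOS):
    print(f"scale={scale:<4} ratio={ratio:<3} -> {width:6.1f} x {height:6.1f} px")
```

Under these assumptions the anchors range from 64 to 512 pixels on a side at ratio 1.0, measured on the resized image, which can be compared against the typical person and vehicle box sizes in the dataset.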