Data used for training:
I've around 13,000 images for training with only 2 classes - Person and Vehicle.
I've split this data in train-test ratio of 75:25.
# Faster R-CNN with Inception v2, configured for Oxford-IIIT Pets Dataset.
# Users should configure the fine_tune_checkpoint field in the train config as
# well as the label_map_path and input_path fields in the train_input_reader and
# eval_input_reader. Search for "PATH_TO_BE_CONFIGURED" to find the fields that
# should be configured.
model {
faster_rcnn {
num_classes: 2
image_resizer {
keep_aspect_ratio_resizer {
min_dimension: 600
max_dimension: 1024
}
}
feature_extractor {
type: 'faster_rcnn_inception_v2'
first_stage_features_stride: 16
}
first_stage_anchor_generator {
grid_anchor_generator {
scales: [0.25, 0.5, 1.0, 2.0]
aspect_ratios: [0.5, 1.0, 2.0]
height_stride: 16
width_stride: 16
}
}
first_stage_box_predictor_conv_hyperparams {
op: CONV
regularizer {
l2_regularizer {
weight: 0.0
}
}
initializer {
truncated_normal_initializer {
stddev: 0.01
}
}
}
first_stage_nms_score_threshold: 0.0
first_stage_nms_iou_threshold: 0.7
first_stage_max_proposals: 300
first_stage_localization_loss_weight: 2.0
first_stage_objectness_loss_weight: 1.0
initial_crop_size: 14
maxpool_kernel_size: 2
maxpool_stride: 2
second_stage_box_predictor {
mask_rcnn_box_predictor {
use_dropout: false
dropout_keep_probability: 1.0
fc_hyperparams {
op: FC
regularizer {
l2_regularizer {
weight: 0.0
}
}
initializer {
variance_scaling_initializer {
factor: 1.0
uniform: true
mode: FAN_AVG
}
}
}
}
}
second_stage_post_processing {
batch_non_max_suppression {
score_threshold: 0.0
iou_threshold: 0.6
max_detections_per_class: 100
max_total_detections: 300
}
score_converter: SOFTMAX
}
second_stage_localization_loss_weight: 2.0
second_stage_classification_loss_weight: 1.0
}
}
train_config: {
batch_size: 1
optimizer {
momentum_optimizer: {
learning_rate: {
manual_step_learning_rate {
initial_learning_rate: 0.0002
schedule {
step: 1
learning_rate: .0002
}
schedule {
step: 900000
learning_rate: .00002
}
schedule {
step: 1200000
learning_rate: .000002
}
}
}
momentum_optimizer_value: 0.9
}
use_moving_average: false
}
gradient_clipping_by_norm: 10.0
fine_tune_checkpoint: "../comp/faster_rcnn_resnet101_coco_2018_01_28/model.ckpt"
from_detection_checkpoint: true
# Note: The below line limits the training process to 200K steps, which we
# empirically found to be sufficient enough to train the pets dataset. This
# effectively bypasses the learning rate schedule (the learning rate will
# never decay). Remove the below line to train indefinitely.
num_steps: 200000
data_augmentation_options {
random_horizontal_flip {
}
}
}
train_input_reader: {
tf_record_input_reader {
input_path: "../res/record/train.record"
}
label_map_path: "../comp/labelmap.pbtxt"
}
eval_config: {
num_examples: 67
# Note: The below line limits the evaluation process to 10 evaluations.
# Remove the below line to evaluate indefinitely.
max_evals: 10
}
eval_input_reader: {
tf_record_input_reader {
input_path: "../res/record/test.record"
}
label_map_path: "../comp/labelmap.pbtxt"
shuffle: false
num_readers: 1
}
Things to notice:
I did the training process for 2 lakh iterations with both these models separately.
Problem:
The bounding boxes detected on new set of images are not of good accuracy. Let's say if i have 2 vehicles and 1 person in the image, the no. of boxes is correct but the size of the box is drastically wrong.
Things I tried out
I first thought the model went into Overfitting, so I explored tensroflow community and some stack overflow sites to understand what ovefitting is. I concluded that overfitting occurs basically due to
Large number of parameters, but I have only the bounding box co-ordinates as input parameters.
Model detects the training set accurately but does not detect new data.
This is not the case here because the model doesn't detect on the training data as well.
Tried changing the regularizer values based on the playground exercise for my data
System Configurations:
Reference Used for Object Detection - EdjeElectronics
Data used for training: I've around 13,000 images for training with only 2 classes - Person and Vehicle. I've split this data in train-test ratio of 75:25.
Model tested (downloaded from model_zoo):
Here is the config file:
Things to notice: I did the training process for 2 lakh iterations with both these models separately.
Problem: The bounding boxes detected on new set of images are not of good accuracy. Let's say if i have 2 vehicles and 1 person in the image, the no. of boxes is correct but the size of the box is drastically wrong.
Things I tried out
I first thought the model went into Overfitting, so I explored tensroflow community and some stack overflow sites to understand what ovefitting is. I concluded that overfitting occurs basically due to
Tried changing the regularizer values based on the playground exercise for my data
Tried changing anchor box values referring here
None of these improved my results.