EdjeElectronics / TensorFlow-Object-Detection-API-Tutorial-Train-Multiple-Objects-Windows-10

How to train a TensorFlow Object Detection Classifier for multiple object detection on Windows
Apache License 2.0

Error while training custom model with ssd_inception_v2_coco #441

Open shABanty opened 4 years ago

shABanty commented 4 years ago

(tensorflow_cpu)C:\Users\AB\Tensorflow\workplace\training_demo>python train.py --logtostderr --train_dir=training/ --pipeline_config_path=training/ssd_inception_v2_coco.config

WARNING:tensorflow:From C:\Users\AB\Anaconda3\envs\tensorflow_cpu\lib\site-packages\tensorflow\python\platform\app.py:125: main (from main) is deprecated and will be removed in a future version.
Instructions for updating:
Use object_detection/model_main.py.
WARNING:tensorflow:From C:\Users\AB\Anaconda3\envs\tensorflow_cpu\lib\site-packages\object_detection\legacy\trainer.py:266: create_global_step (from tensorflow.contrib.framework.python.ops.variables) is deprecated and will be removed in a future version.
Instructions for updating:
Please switch to tf.train.create_global_step
WARNING:tensorflow:num_readers has been reduced to 1 to match input file shards.
WARNING:tensorflow:From C:\Users\AB\Anaconda3\envs\tensorflow_cpu\lib\site-packages\object_detection\core\preprocessor.py:1240: calling squeeze (from tensorflow.python.ops.array_ops) with squeeze_dims is deprecated and will be removed in a future version.
Instructions for updating:
Use the axis argument instead
INFO:tensorflow:depth of additional conv before box predictor: 0
INFO:tensorflow:depth of additional conv before box predictor: 0
INFO:tensorflow:depth of additional conv before box predictor: 0
INFO:tensorflow:depth of additional conv before box predictor: 0
INFO:tensorflow:depth of additional conv before box predictor: 0
INFO:tensorflow:depth of additional conv before box predictor: 0
WARNING:tensorflow:From C:\Users\AB\Anaconda3\envs\tensorflow_cpu\lib\site-packages\tensorflow\contrib\slim\python\slim\learning.py:737: Supervisor.init (from tensorflow.python.training.supervisor) is deprecated and will be removed in a future version.
Instructions for updating:
Please switch to tf.train.MonitoredTrainingSession
2020-02-07 01:57:41.478395: I T:\src\github\tensorflow\tensorflow\core\platform\cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
INFO:tensorflow:Restoring parameters from C:\Users\AB\Tensorflow\workplace\training_demo\pre-trained-model\model.ckpt
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Starting Session.
INFO:tensorflow:Saving checkpoint to path training/model.ckpt
INFO:tensorflow:Starting Queues.
INFO:tensorflow:global_step/sec: 0
2020-02-07 01:58:16.645162: I T:\src\github\tensorflow\tensorflow\core\kernels\data\shuffle_dataset_op.cc:95] Filling up shuffle buffer (this may take a while): 480 of 2048
2020-02-07 01:58:26.795941: I T:\src\github\tensorflow\tensorflow\core\kernels\data\shuffle_dataset_op.cc:95] Filling up shuffle buffer (this may take a while): 633 of 2048
2020-02-07 0

shABanty commented 4 years ago

I'm using the model here: https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/detection_model_zoo.md

and config here: https://github.com/tensorflow/models/blob/master/research/object_detection/samples/configs/ssd_inception_v2_coco.config

shABanty commented 4 years ago

config looks like this:

model {
  ssd {
    num_classes: 10
    box_coder {
      faster_rcnn_box_coder {
        y_scale: 10.0
        x_scale: 10.0
        height_scale: 5.0
        width_scale: 5.0
      }
    }
    matcher {
      argmax_matcher {
        matched_threshold: 0.5
        unmatched_threshold: 0.5
        ignore_thresholds: false
        negatives_lower_than_unmatched: true
        force_match_for_each_row: true
      }
    }
    similarity_calculator {
      iou_similarity {
      }
    }
    anchor_generator {
      ssd_anchor_generator {
        num_layers: 6
        min_scale: 0.2
        max_scale: 0.95
        aspect_ratios: 1.0
        aspect_ratios: 2.0
        aspect_ratios: 0.5
        aspect_ratios: 3.0
        aspect_ratios: 0.3333
        reduce_boxes_in_lowest_layer: true
      }
    }
    image_resizer {
      fixed_shape_resizer {
        height: 300
        width: 300
      }
    }
    box_predictor {
      convolutional_box_predictor {
        min_depth: 0
        max_depth: 0
        num_layers_before_predictor: 0
        use_dropout: false
        dropout_keep_probability: 0.8
        kernel_size: 3
        box_code_size: 4
        apply_sigmoid_to_scores: false
        conv_hyperparams {
          activation: RELU_6,
          regularizer {
            l2_regularizer {
              weight: 0.00004
            }
          }
          initializer {
            truncated_normal_initializer {
              stddev: 0.03
              mean: 0.0
            }
          }
        }
      }
    }
    feature_extractor {
      type: 'ssd_inception_v2'
      min_depth: 16
      depth_multiplier: 1.0
      conv_hyperparams {
        activation: RELU_6,
        regularizer {
          l2_regularizer {
            weight: 0.00004
          }
        }
        initializer {
          truncated_normal_initializer {
            stddev: 0.03
            mean: 0.0
          }
        }
        batch_norm {
          train: true,
          scale: true,
          center: true,
          decay: 0.9997,
          epsilon: 0.001,
        }
      }
      override_base_feature_extractor_hyperparams: true
    }
    loss {
      classification_loss {
        weighted_sigmoid {
        }
      }
      localization_loss {
        weighted_smooth_l1 {
        }
      }
      hard_example_miner {
        num_hard_examples: 3000
        iou_threshold: 0.99
        loss_type: CLASSIFICATION
        max_negatives_per_positive: 3
        min_negatives_per_image: 0
      }
      classification_weight: 1.0
      localization_weight: 1.0
    }
    normalize_loss_by_num_matches: true
    post_processing {
      batch_non_max_suppression {
        score_threshold: 1e-8
        iou_threshold: 0.6
        max_detections_per_class: 100
        max_total_detections: 100
      }
      score_converter: SIGMOID
    }
  }
}

train_config: {
  batch_size: 10
  optimizer {
    rms_prop_optimizer: {
      learning_rate: {
        exponential_decay_learning_rate {
          initial_learning_rate: 0.004
          decay_steps: 800720
          decay_factor: 0.95
        }
      }
      momentum_optimizer_value: 0.9
      decay: 0.9
      epsilon: 1.0
    }
  }
  fine_tune_checkpoint: "C:\Users\AB\Tensorflow\workplace\training_demo\pre-trained-model\model.ckpt"
  from_detection_checkpoint: true
  # Note: The below line limits the training process to 200K steps, which we
  # empirically found to be sufficient enough to train the pets dataset. This
  # effectively bypasses the learning rate schedule (the learning rate will
  # never decay). Remove the below line to train indefinitely.
  num_steps: 200000
  fine_tune_checkpoint:"C:\\Users\\AB\\Tensorflow\\workplace\\training_demo\\pre-trained-model\\model.ckpt"
  fine_tune_checkpoint_type: "detection"
  data_augmentation_options {
    random_horizontal_flip {
    }
  }
  data_augmentation_options {
    ssd_random_crop {
    }
  }
}

train_input_reader: {
  tf_record_input_reader {
    input_path: "C:\Users\AB\Tensorflow\workplace\training_demo\annotations\train.record"
  }
  label_map_path: "C:\Users\AB\Tensorflow\workplace\training_demo\annotations\label_map.pbtxt"
}

eval_config: {
  num_examples: 8000
  # Note: The below line limits the evaluation process to 10 evaluations.
  # Remove the below line to evaluate indefinitely.
  max_evals: 10
}

eval_input_reader: {
  tf_record_input_reader {
    input_path: "C:\Users\AB\Tensorflow\workplace\training_demo\annotations\test.record"
  }
  label_map_path: "C:\Users\AB\Tensorflow\workplace\training_demo\annotations\label_map.pbtxt"
  shuffle: false
  num_readers: 1
}

Ahmadzia307 commented 4 years ago

First, change the paths in your config to use forward slashes, from: C:\Users\AB\Tensorflow\workplace\training_demo\annotations\train.record to: C:/Users/AB/Tensorflow/workplace/training_demo/annotations/train.record
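The conversion suggested above can also be done programmatically rather than by hand. The sketch below uses Python's standard pathlib module; the helper name to_forward_slashes is hypothetical and only for illustration. One likely reason for the advice is that a backslash inside a quoted string in protobuf text format starts an escape sequence, so forward slashes sidestep that ambiguity.

```python
from pathlib import PureWindowsPath


def to_forward_slashes(windows_path):
    """Return the same Windows path written with forward slashes.

    PureWindowsPath only manipulates the string; it never touches the
    filesystem, so this also works when run on a non-Windows machine.
    """
    return PureWindowsPath(windows_path).as_posix()


if __name__ == "__main__":
    # Path copied from the config posted in this thread.
    original = r"C:\Users\AB\Tensorflow\workplace\training_demo\annotations\train.record"
    print(to_forward_slashes(original))
```

The converted string can then be pasted into the input_path, label_map_path and fine_tune_checkpoint fields of the config.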

shABanty commented 4 years ago

@Ahmadzia307 Thanks for your reply. I've changed the path format. I'm now using the pre-trained ssd inception v2 coco model instead, and I no longer get an error like the one above, but my training does not start and no loss value is shown, as follows.

(tensorflow_cpu) C:\Users\AB\Tensorflow\workplace\training_demo>python train.py \ --logtostderr \ --train_dir=train \ --pipeline_config_path=training\ssd_inception_v2_coco.config
WARNING:tensorflow:From C:\Users\AB\Anaconda3\envs\tensorflow_cpu\lib\site-packages\tensorflow\python\platform\app.py:125: main (from main) is deprecated and will be removed in a future version.
Instructions for updating:
Use object_detection/model_main.py.
WARNING:tensorflow:From C:\Users\AB\Anaconda3\envs\tensorflow_cpu\lib\site-packages\object_detection\legacy\trainer.py:266: create_global_step (from tensorflow.contrib.framework.python.ops.variables) is deprecated and will be removed in a future version.
Instructions for updating:
Please switch to tf.train.create_global_step
WARNING:tensorflow:num_readers has been reduced to 1 to match input file shards.
WARNING:tensorflow:From C:\Users\AB\Anaconda3\envs\tensorflow_cpu\lib\site-packages\object_detection\core\preprocessor.py:1240: calling squeeze (from tensorflow.python.ops.array_ops) with squeeze_dims is deprecated and will be removed in a future version.
Instructions for updating:
Use the axis argument instead
INFO:tensorflow:depth of additional conv before box predictor: 0
INFO:tensorflow:depth of additional conv before box predictor: 0
INFO:tensorflow:depth of additional conv before box predictor: 0
INFO:tensorflow:depth of additional conv before box predictor: 0
INFO:tensorflow:depth of additional conv before box predictor: 0
INFO:tensorflow:depth of additional conv before box predictor: 0
WARNING:tensorflow:From C:\Users\AB\Anaconda3\envs\tensorflow_cpu\lib\site-packages\tensorflow\contrib\slim\python\slim\learning.py:737: Supervisor.init (from tensorflow.python.training.supervisor) is deprecated and will be removed in a future version.
Instructions for updating:
Please switch to tf.train.MonitoredTrainingSession
2020-02-08 12:34:31.414876: I T:\src\github\tensorflow\tensorflow\core\platform\cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
INFO:tensorflow:Restoring parameters from C:/Users/AB/Tensorflow/workplace/training_demo/pre-trained-model/model.ckpt
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Starting Session.
INFO:tensorflow:Saving checkpoint to path train\model.ckpt
INFO:tensorflow:Starting Queues.
INFO:tensorflow:global_step/sec: 0
2020-02-08 12:35:02.919828: I T:\src\github\tensorflow\tensorflow\core\kernels\data\shuffle_dataset_op.cc:95] Filling up shuffle buffer (this may take a while): 480 of 2048
2020-02-08 12:35:12.815315: I T:\src\github\tensorflow\tensorflow\core\kernels\data\shuffle_dataset_op.cc:95] Filling up shuffle buffer (this may take a while): 673 of 2048
2020-02-08 12:35:22.678793: I T:\src\github\tensorflow\tensorflow\core\kernels\data\shuffle_dataset_op.cc:95] Filling up shuffle buffer (this may take a while): 800 of 2048
2020-02-08 12:35:32.710125: I T:\src\github\tensorflow\tensorflow\core\kernels\data\shuffle_dataset_op.cc:95] Filling up shuffle buffer (this may take a while): 1019 of 2048
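The log above stops while the input pipeline is still filling its shuffle buffer. A tf.data shuffle stage normally hands out its first element only once the buffer is full, so no training step, and therefore no loss line, would be expected before that point. As a rough way to tell a slow run from a stalled one, the sketch below parses the "Filling up shuffle buffer" lines and extrapolates the time left. The function names are illustrative and not part of TensorFlow, and the linear extrapolation is an assumption.

```python
import re
from datetime import datetime

# Matches lines such as:
# 2020-02-08 12:35:02.919828: I T:\...\shuffle_dataset_op.cc:95] Filling up
# shuffle buffer (this may take a while): 480 of 2048
FILL_RE = re.compile(
    r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+): \w \S+\] "
    r"Filling up shuffle buffer \(this may take a while\): (\d+) of (\d+)"
)


def parse_fill_progress(log_text):
    """Return a list of (timestamp, filled, capacity) tuples found in the log."""
    samples = []
    for stamp, filled, capacity in FILL_RE.findall(log_text):
        when = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S.%f")
        samples.append((when, int(filled), int(capacity)))
    return samples


def estimate_seconds_remaining(samples):
    """Linearly extrapolate from the first and last sample.

    Returns None when there is too little data or no progress was made.
    """
    if len(samples) < 2:
        return None
    (t0, n0, _), (t1, n1, capacity) = samples[0], samples[-1]
    elapsed = (t1 - t0).total_seconds()
    if elapsed <= 0 or n1 <= n0:
        return None
    rate = (n1 - n0) / elapsed  # records per second
    return (capacity - n1) / rate


if __name__ == "__main__":
    # First and last progress lines copied from the log in this thread.
    log = "\n".join([
        r"2020-02-08 12:35:02.919828: I T:\src\github\tensorflow\tensorflow\core\kernels\data\shuffle_dataset_op.cc:95] Filling up shuffle buffer (this may take a while): 480 of 2048",
        r"2020-02-08 12:35:32.710125: I T:\src\github\tensorflow\tensorflow\core\kernels\data\shuffle_dataset_op.cc:95] Filling up shuffle buffer (this may take a while): 1019 of 2048",
    ])
    print(estimate_seconds_remaining(parse_fill_progress(log)))
```

For the four progress lines posted here the buffer grows by about 18 records per second, which suggests the run was still making progress when the log was captured rather than being stuck.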

shABanty commented 4 years ago

config

model {
  ssd {
    num_classes: 10
    box_coder {
      faster_rcnn_box_coder {
        y_scale: 10.0
        x_scale: 10.0
        height_scale: 5.0
        width_scale: 5.0
      }
    }
    matcher {
      argmax_matcher {
        matched_threshold: 0.5
        unmatched_threshold: 0.5
        ignore_thresholds: false
        negatives_lower_than_unmatched: true
        force_match_for_each_row: true
      }
    }
    similarity_calculator {
      iou_similarity {
      }
    }
    anchor_generator {
      ssd_anchor_generator {
        num_layers: 6
        min_scale: 0.2
        max_scale: 0.95
        aspect_ratios: 1.0
        aspect_ratios: 2.0
        aspect_ratios: 0.5
        aspect_ratios: 3.0
        aspect_ratios: 0.3333
        reduce_boxes_in_lowest_layer: true
      }
    }
    image_resizer {
      fixed_shape_resizer {
        height: 300
        width: 300
      }
    }
    box_predictor {
      convolutional_box_predictor {
        min_depth: 0
        max_depth: 0
        num_layers_before_predictor: 0
        use_dropout: false
        dropout_keep_probability: 0.8
        kernel_size: 3
        box_code_size: 4
        apply_sigmoid_to_scores: false
        conv_hyperparams {
          activation: RELU_6,
          regularizer {
            l2_regularizer {
              weight: 0.00004
            }
          }
          initializer {
            truncated_normal_initializer {
              stddev: 0.03
              mean: 0.0
            }
          }
        }
      }
    }
    feature_extractor {
      type: 'ssd_inception_v2'
      min_depth: 16
      depth_multiplier: 1.0
      conv_hyperparams {
        activation: RELU_6,
        regularizer {
          l2_regularizer {
            weight: 0.00004
          }
        }
        initializer {
          truncated_normal_initializer {
            stddev: 0.03
            mean: 0.0
          }
        }
        batch_norm {
          train: true,
          scale: true,
          center: true,
          decay: 0.9997,
          epsilon: 0.001,
        }
      }
      override_base_feature_extractor_hyperparams: true
    }
    loss {
      classification_loss {
        weighted_sigmoid {
        }
      }
      localization_loss {
        weighted_smooth_l1 {
        }
      }
      hard_example_miner {
        num_hard_examples: 3000
        iou_threshold: 0.99
        loss_type: CLASSIFICATION
        max_negatives_per_positive: 3
        min_negatives_per_image: 0
      }
      classification_weight: 1.0
      localization_weight: 1.0
    }
    normalize_loss_by_num_matches: true
    post_processing {
      batch_non_max_suppression {
        score_threshold: 1e-8
        iou_threshold: 0.6
        max_detections_per_class: 100
        max_total_detections: 100
      }
      score_converter: SIGMOID
    }
  }
}

train_config: {
  batch_size: 1
  optimizer {
    rms_prop_optimizer: {
      learning_rate: {
        exponential_decay_learning_rate {
          initial_learning_rate: 0.004
          decay_steps: 800720
          decay_factor: 0.95
        }
      }
      momentum_optimizer_value: 0.9
      decay: 0.9
      epsilon: 1.0
    }
  }
  fine_tune_checkpoint: "C:/Users/AB/Tensorflow/workplace/training_demo/pre-trained-model/model.ckpt"
  from_detection_checkpoint: true
  # Note: The below line limits the training process to 200K steps, which we
  # empirically found to be sufficient enough to train the pets dataset. This
  # effectively bypasses the learning rate schedule (the learning rate will
  # never decay). Remove the below line to train indefinitely.
  num_steps: 200000
  fine_tune_checkpoint:"C:/Users/AB/Tensorflow/workplace/training_demo/pre-trained-model/model.ckpt"
  fine_tune_checkpoint_type: "detection"
  data_augmentation_options {
    random_horizontal_flip {
    }
  }
  data_augmentation_options {
    ssd_random_crop {
    }
  }
}

train_input_reader: {
  tf_record_input_reader {
    input_path: "C:/Users/AB/Tensorflow/workplace/training_demo/annotations/train.record"
  }
  label_map_path: "C:/Users/AB/Tensorflow/workplace/training_demo/annotations/label_map.pbtxt"
}

eval_config: {
  num_examples: 8000
  # Note: The below line limits the evaluation process to 10 evaluations.
  # Remove the below line to evaluate indefinitely.
  max_evals: 10
}

eval_input_reader: {
  tf_record_input_reader {
    input_path: "C:/Users/AB/Tensorflow/workplace/training_demo/annotations/test.record"
  }
  label_map_path: "C:/Users/AB/Tensorflow/workplace/training_demo/annotations/label_map.pbtxt"
  shuffle: false
  num_readers: 1
}
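Since several fields in this config point at files on disk, a quick sanity check before launching train.py may save a debugging round. The sketch below is plain Python with a regular expression, not the Object Detection API's own config parser. The helper names are hypothetical, and it assumes the checkpoint prefix is accompanied by a model.ckpt.index file, as is usual for TensorFlow checkpoints.

```python
import os
import re

# Fields in the posted config whose quoted values are filesystem paths.
PATH_FIELDS = ("fine_tune_checkpoint", "input_path", "label_map_path")
FIELD_RE = re.compile(r'\b(%s)\s*:\s*"([^"]*)"' % "|".join(PATH_FIELDS))


def extract_paths(config_text):
    """Return (field, value) pairs for every path-like field in the text."""
    return FIELD_RE.findall(config_text)


def check_paths(config_text, exists=os.path.exists):
    """Return human-readable problems found in the config's path fields."""
    problems = []
    for field, value in extract_paths(config_text):
        if "\\" in value:
            problems.append("%s uses backslashes: %s" % (field, value))
        # Assumption: a checkpoint value is a prefix, and the file actually
        # on disk is "<prefix>.index".
        target = value + ".index" if field == "fine_tune_checkpoint" else value
        if not exists(target):
            problems.append("%s not found: %s" % (field, target))
    return problems


if __name__ == "__main__":
    # Hypothetical location; substitute the file given to --pipeline_config_path.
    config_file = "training/ssd_inception_v2_coco.config"
    if os.path.exists(config_file):
        with open(config_file) as handle:
            for problem in check_paths(handle.read()):
                print(problem)
```

An empty result only means the referenced files exist; it says nothing about whether the records or the label map are valid.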

YuanTG commented 4 years ago

Hi, did you solve this problem? I have the same problem.

Petros626 commented 2 years ago

@shABanty please do people a favor and don't post the whole config file; if you give them the link, they can see it themselves. Did you finally find a solution for it?