google-coral / edgetpu

Coral issue tracker (and legacy Edge TPU API source)
https://coral.ai
Apache License 2.0

mAP decreases by 20% for object detection with SSD MobileNet V2 model on TPU #191

Closed · QING0304 closed this 2 years ago

QING0304 commented 4 years ago

Hello,

I trained an SSD MobileNet V2 model on my own dataset for object detection. The mAP of the .pb model evaluated on a PC is 72%. However, when I evaluated the .tflite model on Android and the compiled .tflite model on the TPU, the mAP dropped to about 51% in both cases.

First, I followed https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/tf1_training_and_evaluation.md to do quantization-aware training on the SSD MobileNet V2 model, with the pipeline config https://github.com/tensorflow/models/blob/master/research/object_detection/samples/configs/ssd_mobilenet_v2_quantized_300x300_coco.config and the pre-trained model http://download.tensorflow.org/models/object_detection/ssd_mobilenet_v2_quantized_300x300_coco_2019_01_03.tar.gz. The parameter quant_delay was set to 0, inspired by https://www.tensorflow.org/versions/r1.15/api_docs/python/tf/contrib/quantize/create_training_graph. The whole training process ran on an RTX 2080 Ti GPU with TensorFlow 1.15.
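
For context, a minimal sketch (assuming TensorFlow 1.15) of what that graph rewrite amounts to; the conv layer is a stand-in, since the Object Detection API builds the real SSD graph internally:

import tensorflow as tf  # TF 1.x

g = tf.Graph()
with g.as_default():
    # Stand-in model; the Object Detection API builds the actual SSD graph.
    x = tf.placeholder(tf.float32, [1, 300, 300, 3])
    net = tf.layers.conv2d(x, 8, 3, activation=tf.nn.relu6)
    # quant_delay=0 inserts fake-quantization ops that are active from the
    # first training step, matching the graph_rewriter's delay: 0 setting.
    tf.contrib.quantize.create_training_graph(input_graph=g, quant_delay=0)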

I converted the checkpoint to .tflite with the following commands:

python export_tflite_ssd_graph.py \
    --pipeline_config_path=xxx/pipeline.config \
    --trained_checkpoint_prefix=xxx/model.ckpt-40000 \
    --output_directory=xxx/tflite \
    --max_detections=100 \
    --add_postprocessing_op=true

tflite_convert \
    --output_file=xxx/yyy.tflite \
    --graph_def_file=xxx/tflite/tflite_graph.pb \
    --output_format=TFLITE \
    --input_shapes=1,300,300,3 \
    --input_arrays="normalized_input_image_tensor" \
    --output_arrays="TFLite_Detection_PostProcess","TFLite_Detection_PostProcess:1","TFLite_Detection_PostProcess:2","TFLite_Detection_PostProcess:3" \
    --inference_input_type=QUANTIZED_UINT8 \
    --inference_type=QUANTIZED_UINT8 \
    --std_dev_values=128 \
    --mean_values=128 \
    --change_concat_input_ranges=false \
    --default_ranges_min=0 \
    --default_ranges_max=6 \
    --allow_custom_ops
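
For reference, with --inference_type=QUANTIZED_UINT8 the converter maps quantized input bytes back to real values as real_value = (quantized_value - mean_values) / std_dev_values, so mean=128 and std_dev=128 assume inputs normalized to roughly [-1, 1].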

Finally, I compiled the .tflite model into an Edge TPU-compatible model following https://coral.ai/docs/edgetpu/compiler/#usage. The evaluation code was adapted from https://github.com/google-coral/edgetpu/blob/master/examples/object_detection.py.
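
For the evaluation side, a rough sketch (not the exact rewritten code) of loading the compiled model with tflite_runtime and the Edge TPU delegate; the file name is a placeholder:

from tflite_runtime.interpreter import Interpreter, load_delegate

interpreter = Interpreter(
    model_path='model_edgetpu.tflite',  # placeholder path
    experimental_delegates=[load_delegate('libedgetpu.so.1')])
interpreter.allocate_tensors()
# For a fully quantized model the input dtype should be uint8.
input_details = interpreter.get_input_details()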

I am really confused about why the mAP decreased by 20% on Android and the TPU. Is there any problem in the process described above?

DLMasterCat commented 4 years ago

Hi, may I ask which versions of edgetpu_compiler and the runtime you used? Thanks!

Namburger commented 4 years ago

@QING0304 If the mAP dropped from .pb -> .tflite, this seems like a tflite conversion issue rather than a compiler issue; the TensorFlow team would give you a more appropriate answer for this. That said, SSD MobileNet is well proven, so I'm very surprised to see this issue. May I know the full command you used to train the model (any changes to num_steps)?

QING0304 commented 4 years ago

Hi, may I ask which versions of edgetpu_compiler and the runtime you used? Thanks!

@DLMasterCat Hi,
edgetpu_compiler version: 14.1.317412892
edgetpu API version on the TPU: 2.14.1
tflite-runtime version on the TPU: 2.1.0.post1

Any suggestions would be appreciated!

QING0304 commented 4 years ago

@QING0304 If the mAP dropped from .pb -> .tflite, this seems like a tflite conversion issue rather than a compiler issue; the TensorFlow team would give you a more appropriate answer for this. That said, SSD MobileNet is well proven, so I'm very surprised to see this issue. May I know the full command you used to train the model (any changes to num_steps)?

@Namburger Hi, I used the script https://github.com/tensorflow/models/blob/master/research/object_detection/legacy/train.py to train the model, with --pipeline_config_path and --train_dir set to the corresponding paths.

The content of pipeline.config is as follows:

model {
  ssd {
    num_classes: 1
    box_coder {
      faster_rcnn_box_coder {
        y_scale: 10.0
        x_scale: 10.0
        height_scale: 5.0
        width_scale: 5.0
      }
    }
    matcher {
      argmax_matcher {
        matched_threshold: 0.5
        unmatched_threshold: 0.5
        ignore_thresholds: false
        negatives_lower_than_unmatched: true
        force_match_for_each_row: true
      }
    }
    similarity_calculator {
      iou_similarity {
      }
    }
    anchor_generator {
      ssd_anchor_generator {
        num_layers: 6
        min_scale: 0.2
        max_scale: 0.95
        aspect_ratios: 1.0
        aspect_ratios: 2.0
        aspect_ratios: 0.5
        aspect_ratios: 3.0
        aspect_ratios: 0.3333
      }
    }
    image_resizer {
      fixed_shape_resizer {
        height: 300
        width: 300
      }
    }
    box_predictor {
      convolutional_box_predictor {
        min_depth: 0
        max_depth: 0
        num_layers_before_predictor: 0
        use_dropout: false
        dropout_keep_probability: 0.8
        kernel_size: 1
        box_code_size: 4
        apply_sigmoid_to_scores: false
        conv_hyperparams {
          activation: RELU_6,
          regularizer {
            l2_regularizer {
              weight: 0.00004
            }
          }
          initializer {
            truncated_normal_initializer {
              stddev: 0.03
              mean: 0.0
            }
          }
          batch_norm {
            train: true,
            scale: true,
            center: true,
            decay: 0.9997,
            epsilon: 0.001,
          }
        }
      }
    }
    feature_extractor {
      type: 'ssd_mobilenet_v2'
      min_depth: 16
      depth_multiplier: 1.0
      conv_hyperparams {
        activation: RELU_6,
        regularizer {
          l2_regularizer {
            weight: 0.00004
          }
        }
        initializer {
          truncated_normal_initializer {
            stddev: 0.03
            mean: 0.0
          }
        }
        batch_norm {
          train: true,
          scale: true,
          center: true,
          decay: 0.9997,
          epsilon: 0.001,
        }
      }
    }
    loss {
      classification_loss {
        weighted_sigmoid {
        }
      }
      localization_loss {
        weighted_smooth_l1 {
        }
      }
      hard_example_miner {
        num_hard_examples: 3000
        iou_threshold: 0.99
        loss_type: CLASSIFICATION
        max_negatives_per_positive: 3
        min_negatives_per_image: 3
      }
      classification_weight: 1.0
      localization_weight: 1.0
    }
    normalize_loss_by_num_matches: true
    post_processing {
      batch_non_max_suppression {
        score_threshold: 1e-8
        iou_threshold: 0.6
        max_detections_per_class: 100
        max_total_detections: 100
      }
      score_converter: SIGMOID
    }
  }
}

train_config: {
  batch_size: 16
  data_augmentation_options {
    random_adjust_brightness {
    }
  }
  data_augmentation_options {
    random_pixel_value_scale {
    }
  }
  optimizer {
    rms_prop_optimizer: {
      learning_rate: {
        exponential_decay_learning_rate {
          initial_learning_rate: 0.01
          decay_steps: 4000
          decay_factor: 0.5
        }
      }
      momentum_optimizer_value: 0.9
      decay: 0.9
      epsilon: 1.0
    }
  }
  fine_tune_checkpoint: "/ext/ssd_mobilenet_v2_quantized_300x300_coco_2019_01_03/model.ckpt"
  from_detection_checkpoint: true
  num_steps: 40000
  load_all_detection_checkpoint_vars: true
}

train_input_reader {
  label_map_path: "/ext/SSD.pbtxt"
  tf_record_input_reader {
    input_path: "/ext/xxx.tfrecords"
  }
}

#eval_config: {
#  num_examples: 550
#}
#
#eval_input_reader: {
#  tf_record_input_reader {
#    input_path: ""
#  }
#  label_map_path: ""
#  shuffle: false
#  num_readers: 1
#}

graph_rewriter {
  quantization {
    delay: 0
    weight_bits: 8
    activation_bits: 8
  }
}

Namburger commented 4 years ago

Oh, could you try model_main.py instead? legacy/train.py is deprecated.

python3 object_detection/model_main.py --model_dir <path> --pipeline_config_path <path>

QING0304 commented 4 years ago

Oh, could you try model_main.py instead? legacy/train.py is deprecated.

python3 object_detection/model_main.py --model_dir <path> --pipeline_config_path <path>

Thanks for the suggestion! I will try it and share the result as soon as possible.

Namburger commented 4 years ago

Also, I just noticed these flags and wonder whether they were necessary:

    --default_ranges_min=0 \
    --default_ranges_max=6 \

Usually those are only needed if you trained without the quantization rewriter; I wonder what happens if you take them off?
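
For context, a sketch of the equivalent settings in the TF 1.x Python converter API (names follow the commands above): default_ranges_stats supplies a fallback (min, max) range for any op that has no recorded quantization range, which should only happen when the graph was trained without fake-quantization ops.

import tensorflow as tf  # TF 1.x

converter = tf.lite.TFLiteConverter.from_frozen_graph(
    'tflite_graph.pb',
    input_arrays=['normalized_input_image_tensor'],
    output_arrays=['TFLite_Detection_PostProcess',
                   'TFLite_Detection_PostProcess:1',
                   'TFLite_Detection_PostProcess:2',
                   'TFLite_Detection_PostProcess:3'],
    input_shapes={'normalized_input_image_tensor': [1, 300, 300, 3]})
converter.inference_type = tf.lite.constants.QUANTIZED_UINT8
converter.quantized_input_stats = {'normalized_input_image_tensor': (128, 128)}
converter.allow_custom_ops = True
# Fallback range for ops with no recorded min/max; after proper
# quantization-aware training this line should be unnecessary.
converter.default_ranges_stats = (0, 6)
tflite_model = converter.convert()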

QING0304 commented 4 years ago

@Namburger Thanks for reply again!

Following your suggestions, I tried using model_main.py to train the MobileNet V2 model with the same pipeline.config, and removed --default_ranges_min and --default_ranges_max when converting .pb to .tflite. The result is the same as before: the mAP still decreased by about 20% on Android and the TPU.

In addition, I used random_adjust_brightness and random_pixel_value_scale data augmentation during training because of the characteristics of my own dataset. Is it possible that something is wrong with that on the TPU?

Namburger commented 4 years ago

@QING0304 Well, you see, there are two steps to get your model from a TensorFlow graph file to Edge TPU compatible:

inference_graph.pb -> (1) tflite_converter -> model.tflite (cpu) -> (2) edgetpu_compiler -> model_edgetpu.tflite (edgetpu)

The way I understood the problem, the model's mAP decreased about 20% after step 1, correct? In that case you should be seeing the issue in model.tflite (cpu) too; could you try evaluating that model?
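
A minimal way to check step 1 in isolation is to run model.tflite through the stock TFLite interpreter on CPU and compare its detections against the .pb results; this sketch uses a random placeholder image where a real evaluation would feed preprocessed dataset images:

import numpy as np
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path='model.tflite')  # placeholder
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]

# Placeholder input; dtype is uint8 for a fully quantized model.
image = np.random.randint(0, 256, size=inp['shape'], dtype=np.uint8)
interpreter.set_tensor(inp['index'], image)
interpreter.invoke()
boxes = interpreter.get_tensor(interpreter.get_output_details()[0]['index'])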

YijinLiu commented 3 years ago

I am experiencing a similar issue. I trained an SSD MobileNet V2 model using research/object_detection/model_main.py. The tflite model works fine on the CPU; however, the edgetpu version produces much worse results. I used the following command to convert to tflite:

tflite_convert --graph_def_file=tflite_graph.pb --output_file=detect.tflite \
    --input_shapes=1,300,300,3 --input_arrays=normalized_input_image_tensor \
    --output_arrays='TFLite_Detection_PostProcess','TFLite_Detection_PostProcess:1','TFLite_Detection_PostProcess:2','TFLite_Detection_PostProcess:3' \
    --inference_type=FLOAT --allow_custom_ops

And the following script to convert to edgetpu model:

import tensorflow as tf
from tensorflow import lite
from tensorflow.lite.python import lite_constants

# `tfrecord` and `flags` are helpers/flags defined elsewhere in my code.
converter = lite.TFLiteConverter.from_frozen_graph(
    "tflite_graph.pb",
    ["normalized_input_image_tensor"],
    ["TFLite_Detection_PostProcess",
     "TFLite_Detection_PostProcess:1",
     "TFLite_Detection_PostProcess:2",
     "TFLite_Detection_PostProcess:3"],
    {"normalized_input_image_tensor": [1, 300, 300, 3]})
converter.allow_custom_ops = True
converter.optimizations = [lite.Optimize.DEFAULT]
converter.output_format = lite_constants.TFLITE

def _representative_data_gen():
    for example in tfrecord.parse(['train.tfrecord-00000-of-00100']):
        encoded = example["image/encoded"].numpy()
        image = tf.io.decode_jpeg(encoded, channels=3)
        image = tf.image.resize(image, [flags.width, flags.height])
        image = tf.cast(image, tf.float32)
        image = image / 255.
        image = tf.expand_dims(image, 0)
        yield [image]

converter.representative_dataset = _representative_data_gen
tflite_model_quant = converter.convert()

YijinLiu commented 3 years ago

OK, I found the issue. In the representative dataset generator,

image = image / 255.

should be

image = image / 255. - 0.5

so that the calibration inputs match the normalization the model actually sees at training time.

hjonnala commented 3 years ago

@QING0304 are you still having any issues here?

google-coral-bot[bot] commented 2 years ago

This issue has been automatically marked as stale because it has no recent activity. It will be closed if no further activity occurs. Thank you.

google-coral-bot[bot] commented 2 years ago

Closing as stale. Please reopen if you'd like to work on this further.

google-coral-bot[bot] commented 2 years ago

Are you satisfied with the resolution of your issue?