CasiaFan / SSD_EfficientNet

SSD using TensorFlow object detection API with EfficientNet backbone

Got min_feature_level error when training the model #5

Open PythonImageDeveloper opened 5 years ago

PythonImageDeveloper commented 5 years ago

Hi @CasiaFan @wulungching, my TensorFlow version is 1.14.0 (binary install).

1 - I modified model_builder.py to:

SSD_FEATURE_EXTRACTOR_CLASS_MAP = {
    'ssd_efficientnet': SSDEfficientNetFeatureExtractor,
    'ssd_efficientnet_fpn': SSDEfficientNetFPNFeatureExtractor,
......

2 - Put efficientnet.py and efficient_feature_extractor.py under the object_detection/models directory.

3 - Replace your ssd.proto with the original ssd.proto.

4.1 - When I run protoc object_detection/protos/ssd.proto --python_out=. I get this output:

object_detection/protos/ssd.proto:164:3: Expected "required", "optional", or "repeated".
object_detection/protos/ssd.proto:164:12: Expected field name.

4.2 - When I run ./bin/protoc object_detection/protos/*.proto --python_out=. it succeeds and prints nothing.

5 - I modified the original ssd_resnet50_v1_fpn_shared_box_predictor_640x640_coco14_sync.config according to your modification in https://github.com/CasiaFan/SSD_EfficientNet/issues/2

6 - When I run python3 model_main.py --alsologtostderr, I get this error:

    self._MergeField(tokenizer, sub_message)
  File "/home/mm/.venv/lib/python3.5/site-packages/google/protobuf/text_format.py", line 730, in _MergeField
    (message_descriptor.full_name, name))
google.protobuf.text_format.ParseError: 12:7 : Message type "object_detection.protos.SsdFeatureExtractor" has no field named "network_version".

Then I commented out network_version in the config file and ran python3 model_main.py --alsologtostderr again, and got this error:

  File "/home/mm/.venv/lib/python3.5/site-packages/google/protobuf/text_format.py", line 730, in _MergeField
    (message_descriptor.full_name, name))
google.protobuf.text_format.ParseError: 13:7 : Message type "object_detection.protos.SsdFeatureExtractor" has no field named "min_feature_level".

When I run python object_detection/builders/model_builder_test.py as a test, I get no errors.
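A "has no field named ..." ParseError means the compiled SsdFeatureExtractor message that Python imports does not define that field. One possible cause (an assumption, not confirmed in this thread) is that the imported ssd_pb2.py was generated from a different ssd.proto than the one that was edited. The sketch below checks which fields the compiled message actually knows about. The helper names missing_fields and check_ssd_feature_extractor are made up for illustration; DESCRIPTOR.fields_by_name is the standard protobuf descriptor mapping.

```python
def missing_fields(message_class, field_names):
    """Return the names in field_names that the compiled message does not define.

    message_class is any generated protobuf message class; its DESCRIPTOR
    exposes the compiled fields through the fields_by_name mapping.
    """
    defined = message_class.DESCRIPTOR.fields_by_name
    return [name for name in field_names if name not in defined]


def check_ssd_feature_extractor():
    """Report which of the two custom fields are absent from the compiled proto.

    Assumes the object_detection package is importable, for example when
    run from models/research/ after compiling the protos.
    """
    # Imported here so that missing_fields stays usable without the package.
    from object_detection.protos import ssd_pb2
    return missing_fields(ssd_pb2.SsdFeatureExtractor,
                          ["network_version", "min_feature_level"])
```

If check_ssd_feature_extractor() returns a non-empty list, the generated ssd_pb2.py on the import path predates the custom fields; printing ssd_pb2.__file__ shows which copy was loaded.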

wulungching commented 5 years ago

I think your protoc version is 2.6. You could follow this.

CasiaFan commented 5 years ago

@PythonImageDeveloper My protoc version is 3.5.1. Note protoc 2 and protoc 3 differ a lot.
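To confirm which protoc is in use, the output of protoc --version (which prints a line such as "libprotoc 3.5.1") can be parsed. This is a minimal sketch; the function names are illustrative and the default binary path is an assumption.

```python
import re
import subprocess


def parse_protoc_version(version_output):
    """Parse the text printed by `protoc --version`, e.g. 'libprotoc 3.5.1'."""
    match = re.search(r"libprotoc\s+(\d+)\.(\d+)(?:\.(\d+))?", version_output)
    if match is None:
        raise ValueError("unrecognized protoc output: %r" % version_output)
    # A missing patch component is treated as 0.
    return tuple(int(part) if part is not None else 0 for part in match.groups())


def is_protoc3(version_output):
    """Return True when the reported major version is 3 or newer."""
    return parse_protoc_version(version_output)[0] >= 3


def installed_protoc_version(protoc_path="protoc"):
    """Run the given protoc binary and return its version tuple.

    protoc_path is an assumption; pass "./bin/protoc" to check a manually
    unpacked compiler instead of the system-wide one.
    """
    output = subprocess.check_output([protoc_path, "--version"]).decode()
    return parse_protoc_version(output)
```

Calling installed_protoc_version() and installed_protoc_version("./bin/protoc") side by side shows whether the system protoc and the downloaded one differ.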

PythonImageDeveloper commented 5 years ago

@wulungching @CasiaFan I followed the commands below. In your opinion, should I change the protoc version to 3.5.1?

Manual protobuf-compiler installation and usage

If you are on linux:

Download and install the 3.0 release of protoc, then unzip the file.

# From tensorflow/models/research/
wget -O protobuf.zip https://github.com/google/protobuf/releases/download/v3.0.0/protoc-3.0.0-linux-x86_64.zip
unzip protobuf.zip

Run the compilation process again, but use the downloaded version of protoc:

# From tensorflow/models/research/
./bin/protoc object_detection/protos/*.proto --python_out=.

CasiaFan commented 5 years ago

@PythonImageDeveloper Yes, protoc 3.5.1 should work as well.

PythonImageDeveloper commented 5 years ago

@CasiaFan Why do you use TensorFlow 1.4? Does it work with a newer version? Should I git clone the 1.4 branch of the TensorFlow models repository or the new master branch? And is it possible to extend this configuration to ssdlite_efficientnet?

CasiaFan commented 5 years ago

@PythonImageDeveloper TF 1.4 is the latest stable version, and I installed it with pip. Since the only difference between ssdlite and vanilla ssd is the prediction head, integrating the ssdlite predictor configuration into the config file should also work.

PythonImageDeveloper commented 5 years ago

@CasiaFan The latest stable version is 1.14 (see Tensorflow Versions). If I use ssdlite_mobilenet_v2_coco.config as the basis for ssdlite_efficientnet.config, should it work? In your opinion, is it possible to use the pre-trained EfficientNet weights with the ssd_efficientnet config here? If so, how? And does it work well with protoc version 3.6.1?

PythonImageDeveloper commented 5 years ago

@CasiaFan My framework versions: tensorflow 1.14.0, cuda 10.0, protoc 3.5.1

I followed your commands and changed the files. When I run python3 model_main.py --alsologtostderr in the object_detection directory, I get this error:

  File "/home/mm/.venv/lib/python3.5/site-packages/tensorflow/python/keras/layers/convolutional.py", line 192, in build
    self.rank + 2))
  File "/home/mm/.venv/lib/python3.5/site-packages/tensorflow/python/ops/nn_ops.py", line 1050, in __init__
    filter_shape[num_spatial_dims]))
ValueError: number of input channels does not match corresponding dimension of filter, 8 != 32

CasiaFan commented 5 years ago

@PythonImageDeveloper Any protoc 3.x version should meet our prerequisite. If you have pre-trained weights for the efficientnet backbone, modify these lines in your config file:

fine_tune_checkpoint: "/path/to/pretrained_ckpt"
from_detection_checkpoint: false

But if you have complete pre-trained detection weights that include the prediction head, set from_detection_checkpoint to true.
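When switching between the two cases often, the two fields can be rewritten in the config text with a small helper. This is only a sketch that treats the config as plain text; the function name set_fine_tune_fields and the example path are hypothetical, and it replaces existing occurrences only (a field that is absent is not added).

```python
import re


def set_fine_tune_fields(config_text, checkpoint_path, from_detection):
    """Rewrite fine_tune_checkpoint and from_detection_checkpoint in config text.

    Only existing occurrences are replaced; missing fields are left missing.
    """
    flag = "true" if from_detection else "false"
    # A function replacement avoids backslash handling in the checkpoint path.
    config_text = re.sub(
        r'fine_tune_checkpoint:\s*"[^"]*"',
        lambda _match: 'fine_tune_checkpoint: "%s"' % checkpoint_path,
        config_text)
    config_text = re.sub(
        r'from_detection_checkpoint:\s*(?:true|false)',
        "from_detection_checkpoint: " + flag,
        config_text)
    return config_text


example = (
    'train_config {\n'
    '  fine_tune_checkpoint: "PATH_TO_BE_CONFIGURED"\n'
    '  from_detection_checkpoint: true\n'
    '}\n'
)
# Backbone-only weights: point at the checkpoint and turn the flag off.
print(set_fine_tune_fields(example, "/path/to/pretrained_ckpt", False))
```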

As for your issue, it seems to be related to an inconsistency between the feature input dimension and the filter. Could you provide a more detailed log, including the traces that show which line in our custom script produces it?

PythonImageDeveloper commented 5 years ago

@CasiaFan

(.venv) mm@mm:~/API-TF2/models/research/object_detection$ python3 model_main.py –alsologtostderr
WARNING: Logging before flag parsing goes to stderr.
W0718 18:36:15.213086 140712742450944 lazy_loader.py:50] 
The TensorFlow contrib module will not be included in TensorFlow 2.0.
For more information, please see:
  * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md
  * https://github.com/tensorflow/addons
  * https://github.com/tensorflow/io (for I/O related ops)
If you depend on functionality not listed there, please file an issue.

W0718 18:36:15.229685 140712742450944 deprecation_wrapper.py:119] From /home/mm/API-TF/models/research/slim/nets/inception_resnet_v2.py:373: The name tf.GraphKeys is deprecated. Please use tf.compat.v1.GraphKeys instead.

W0718 18:36:15.235659 140712742450944 deprecation_wrapper.py:119] From /home/mm/API-TF/models/research/slim/nets/mobilenet/mobilenet.py:397: The name tf.nn.avg_pool is deprecated. Please use tf.nn.avg_pool2d instead.

W0718 18:36:15.245254 140712742450944 deprecation_wrapper.py:119] From model_main.py:116: The name tf.app.run is deprecated. Please use tf.compat.v1.app.run instead.

/home/mm/.venv/lib/python3.5/site-packages/absl/flags/_validators.py:358: UserWarning: Flag --model_dir has a non-None default value; therefore, mark_flag_as_required will pass even if flag is not specified in the command line!
  'command line!' % flag_name)
/home/mm/.venv/lib/python3.5/site-packages/absl/flags/_validators.py:358: UserWarning: Flag --pipeline_config_path has a non-None default value; therefore, mark_flag_as_required will pass even if flag is not specified in the command line!
  'command line!' % flag_name)
W0718 18:36:15.245936 140712742450944 deprecation_wrapper.py:119] From /home/mm/API-TF/models/research/object_detection/utils/config_util.py:96: The name tf.gfile.GFile is deprecated. Please use tf.io.gfile.GFile instead.

W0718 18:36:15.248813 140712742450944 deprecation_wrapper.py:119] From /home/mm/API-TF/models/research/object_detection/model_lib.py:597: The name tf.logging.warning is deprecated. Please use tf.compat.v1.logging.warning instead.

W0718 18:36:15.248920 140712742450944 model_lib.py:598] Forced number of epochs for all eval validations to be 1.
W0718 18:36:15.249017 140712742450944 deprecation_wrapper.py:119] From /home/mm/API-TF/models/research/object_detection/utils/config_util.py:482: The name tf.logging.info is deprecated. Please use tf.compat.v1.logging.info instead.

I0718 18:36:15.249068 140712742450944 config_util.py:482] Maybe overwriting use_bfloat16: False
I0718 18:36:15.249139 140712742450944 config_util.py:482] Maybe overwriting eval_num_epochs: 1
I0718 18:36:15.249200 140712742450944 config_util.py:482] Maybe overwriting sample_1_of_n_eval_examples: 1
I0718 18:36:15.249252 140712742450944 config_util.py:482] Maybe overwriting load_pretrained: True
I0718 18:36:15.249303 140712742450944 config_util.py:492] Ignoring config override key: load_pretrained
I0718 18:36:15.249358 140712742450944 config_util.py:482] Maybe overwriting train_steps: 200
W0718 18:36:15.249432 140712742450944 model_lib.py:614] Expected number of evaluation epochs is 1, but instead encountered `eval_on_train_input_config.num_epochs` = 0. Overwriting `num_epochs` to 1.
I0718 18:36:15.249499 140712742450944 model_lib.py:649] create_estimator_and_inputs: use_tpu False, export_to_tpu False
I0718 18:36:15.249855 140712742450944 estimator.py:209] Using config: {'_master': '', '_log_step_count_steps': 100, '_global_id_in_cluster': 0, '_experimental_distribute': None, '_device_fn': None, '_task_type': 'worker', '_is_chief': True, '_save_summary_steps': 100, '_save_checkpoints_secs': 600, '_protocol': None, '_train_distribute': None, '_keep_checkpoint_every_n_hours': 10000, '_service': None, '_evaluation_master': '', '_save_checkpoints_steps': None, '_session_config': allow_soft_placement: true
graph_options {
  rewrite_options {
    meta_optimizer_iterations: ONE
  }
}
, '_model_dir': './train_ssd_effecientnet', '_keep_checkpoint_max': 5, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7ff9c6d16f60>, '_tf_random_seed': None, '_experimental_max_worker_delay_secs': None, '_eval_distribute': None, '_task_id': 0, '_num_worker_replicas': 1, '_num_ps_replicas': 0}
W0718 18:36:15.250040 140712742450944 model_fn.py:630] Estimator's model_fn (<function create_model_fn.<locals>.model_fn at 0x7ff9c6d36f28>) includes params argument, but params are not passed to Estimator.
I0718 18:36:15.250505 140712742450944 estimator_training.py:186] Not using Distribute Coordinator.
I0718 18:36:15.250636 140712742450944 training.py:612] Running training and evaluation locally (non-distributed).
I0718 18:36:15.250807 140712742450944 training.py:700] Start train and evaluate loop. The evaluate will happen after every checkpoint. Checkpoint frequency is determined based on RunConfig arguments: save_checkpoints_steps None or save_checkpoints_secs 600.
W0718 18:36:15.255248 140712742450944 deprecation.py:323] From /home/mm/.venv/lib/python3.5/site-packages/tensorflow/python/training/training_util.py:236: Variable.initialized_value (from tensorflow.python.ops.variables) is deprecated and will be removed in a future version.
Instructions for updating:
Use Variable.read_value. Variables in 2.X are initialized automatically both in eager and graph (inside tf.defun) contexts.
W0718 18:36:15.262588 140712742450944 deprecation_wrapper.py:119] From /home/mm/API-TF/models/research/object_detection/data_decoders/tf_example_decoder.py:170: The name tf.FixedLenFeature is deprecated. Please use tf.io.FixedLenFeature instead.

W0718 18:36:15.262764 140712742450944 deprecation_wrapper.py:119] From /home/mm/API-TF/models/research/object_detection/data_decoders/tf_example_decoder.py:185: The name tf.VarLenFeature is deprecated. Please use tf.io.VarLenFeature instead.

W0718 18:36:15.272913 140712742450944 deprecation_wrapper.py:119] From /home/mm/API-TF/models/research/object_detection/builders/dataset_builder.py:61: The name tf.gfile.Glob is deprecated. Please use tf.io.gfile.glob instead.

W0718 18:36:15.273869 140712742450944 dataset_builder.py:66] num_readers has been reduced to 1 to match input file shards.
W0718 18:36:15.278006 140712742450944 deprecation.py:323] From /home/mm/API-TF/models/research/object_detection/builders/dataset_builder.py:80: parallel_interleave (from tensorflow.contrib.data.python.ops.interleave_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.experimental.parallel_interleave(...)`.
W0718 18:36:15.278145 140712742450944 deprecation.py:323] From /home/mm/.venv/lib/python3.5/site-packages/tensorflow/contrib/data/python/ops/interleave_ops.py:77: parallel_interleave (from tensorflow.python.data.experimental.ops.interleave_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.Dataset.interleave(map_func, cycle_length, block_length, num_parallel_calls=tf.data.experimental.AUTOTUNE)` instead. If sloppy execution is desired, use `tf.data.Options.experimental_determinstic`.
W0718 18:36:15.296915 140712742450944 deprecation.py:323] From /home/mm/API-TF/models/research/object_detection/builders/dataset_builder.py:149: DatasetV1.map_with_legacy_function (from tensorflow.python.data.ops.dataset_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.Dataset.map()
W0718 18:36:15.435589 140712742450944 deprecation_wrapper.py:119] From /home/mm/API-TF/models/research/object_detection/utils/ops.py:472: The name tf.is_nan is deprecated. Please use tf.math.is_nan instead.

W0718 18:36:15.436193 140712742450944 deprecation.py:323] From /home/mm/API-TF/models/research/object_detection/utils/ops.py:472: to_int32 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.cast` instead.
W0718 18:36:15.438794 140712742450944 deprecation.py:323] From /home/mm/API-TF/models/research/object_detection/utils/ops.py:474: add_dispatch_support.<locals>.wrapper (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
W0718 18:36:15.472958 140712742450944 deprecation.py:323] From /home/mm/API-TF/models/research/object_detection/inputs.py:320: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.cast` instead.
W0718 18:36:15.475012 140712742450944 deprecation_wrapper.py:119] From /home/mm/API-TF/models/research/object_detection/core/preprocessor.py:512: The name tf.random_uniform is deprecated. Please use tf.random.uniform instead.

W0718 18:36:15.516844 140712742450944 deprecation.py:323] From /home/mm/API-TF/models/research/object_detection/core/preprocessor.py:188: sample_distorted_bounding_box (from tensorflow.python.ops.image_ops_impl) is deprecated and will be removed in a future version.
Instructions for updating:
`seed2` arg is deprecated.Use sample_distorted_bounding_box_v2 instead.
W0718 18:36:15.518922 140712742450944 deprecation.py:506] From /home/mm/.venv/lib/python3.5/site-packages/tensorflow/python/util/dispatch.py:180: calling squeeze (from tensorflow.python.ops.array_ops) with squeeze_dims is deprecated and will be removed in a future version.
Instructions for updating:
Use the `axis` argument instead
W0718 18:36:16.135120 140712742450944 deprecation_wrapper.py:119] From /home/mm/API-TF/models/research/object_detection/core/preprocessor.py:2421: The name tf.image.resize_images is deprecated. Please use tf.image.resize instead.

W0718 18:36:16.516294 140712742450944 deprecation.py:323] From /home/mm/API-TF/models/research/object_detection/builders/dataset_builder.py:152: batch_and_drop_remainder (from tensorflow.contrib.data.python.ops.batching) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.Dataset.batch(..., drop_remainder=True)`.
I0718 18:36:16.525623 140712742450944 estimator.py:1145] Calling model_fn.
I0718 18:36:16.634358 140712742450944 efficientnet.py:635] global_params= GlobalParams(batch_norm_momentum=0.99, batch_norm_epsilon=0.001, dropout_rate=0.2, data_format='channels_last', num_classes=1000, width_coefficient=1.0, depth_coefficient=1.0, depth_divisor=8, min_depth=None, drop_connect_rate=0.2)
I0718 18:36:16.634685 140712742450944 efficientnet.py:636] blocks_args= [BlockArgs(kernel_size=3, num_repeat=1, input_filters=32, output_filters=16, expand_ratio=1, id_skip=True, strides=[1, 1], se_ratio=0.25), BlockArgs(kernel_size=3, num_repeat=2, input_filters=16, output_filters=24, expand_ratio=6, id_skip=True, strides=[2, 2], se_ratio=0.25), BlockArgs(kernel_size=5, num_repeat=2, input_filters=24, output_filters=40, expand_ratio=6, id_skip=True, strides=[2, 2], se_ratio=0.25), BlockArgs(kernel_size=3, num_repeat=3, input_filters=40, output_filters=80, expand_ratio=6, id_skip=True, strides=[2, 2], se_ratio=0.25), BlockArgs(kernel_size=5, num_repeat=3, input_filters=80, output_filters=112, expand_ratio=6, id_skip=True, strides=[1, 1], se_ratio=0.25), BlockArgs(kernel_size=5, num_repeat=4, input_filters=112, output_filters=192, expand_ratio=6, id_skip=True, strides=[2, 2], se_ratio=0.25), BlockArgs(kernel_size=3, num_repeat=1, input_filters=192, output_filters=320, expand_ratio=6, id_skip=True, strides=[1, 1], se_ratio=0.25)]
I0718 18:36:16.636757 140712742450944 efficientnet.py:128] round_filter input=32 output=32
I0718 18:36:16.636873 140712742450944 efficientnet.py:128] round_filter input=16 output=16
W0718 18:36:16.637104 140712742450944 deprecation.py:506] From /home/mm/.venv/lib/python3.5/site-packages/tensorflow/python/ops/init_ops.py:1251: calling VarianceScaling.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.
Instructions for updating:
Call initializer instance with the dtype argument instead of passing it to the constructor
I0718 18:36:16.640686 140712742450944 efficientnet.py:128] round_filter input=16 output=16
I0718 18:36:16.640795 140712742450944 efficientnet.py:128] round_filter input=24 output=24
I0718 18:36:16.647423 140712742450944 efficientnet.py:128] round_filter input=24 output=24
I0718 18:36:16.647552 140712742450944 efficientnet.py:128] round_filter input=40 output=40
I0718 18:36:16.654196 140712742450944 efficientnet.py:128] round_filter input=40 output=40
I0718 18:36:16.654305 140712742450944 efficientnet.py:128] round_filter input=80 output=80
I0718 18:36:16.663950 140712742450944 efficientnet.py:128] round_filter input=80 output=80
I0718 18:36:16.664065 140712742450944 efficientnet.py:128] round_filter input=112 output=112
I0718 18:36:16.674213 140712742450944 efficientnet.py:128] round_filter input=112 output=112
I0718 18:36:16.674326 140712742450944 efficientnet.py:128] round_filter input=192 output=192
I0718 18:36:16.687278 140712742450944 efficientnet.py:128] round_filter input=192 output=192
I0718 18:36:16.687392 140712742450944 efficientnet.py:128] round_filter input=320 output=320
I0718 18:36:16.690702 140712742450944 efficientnet.py:128] round_filter input=32 output=32
I0718 18:36:16.692492 140712742450944 efficientnet.py:128] round_filter input=1280 output=1280
I0718 18:36:16.735306 140712742450944 efficientnet.py:475] Built stem layers with output shape: (?, 150, 150, 32)
I0718 18:36:16.735529 140712742450944 efficientnet.py:490] block_0 drop_connect_rate: 0.0
I0718 18:36:16.735614 140712742450944 efficientnet.py:272] Block input: None/efficientnet-b0/stem/lambda/swish_f32:0 shape: (?, 150, 150, 32)
I0718 18:36:16.735670 140712742450944 efficientnet.py:277] Expand: None/efficientnet-b0/stem/lambda/swish_f32:0 shape: (?, 150, 150, 32)
I0718 18:36:16.768238 140712742450944 efficientnet.py:280] DWConv: None/efficientnet-b0/blocks_0/lambda_1/swish_f32:0 shape: (?, 150, 150, 32)
Traceback (most recent call last):
  File "model_main.py", line 116, in <module>
    tf.app.run()
  File "/home/mm/.venv/lib/python3.5/site-packages/tensorflow/python/platform/app.py", line 40, in run
    _run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
  File "/home/mm/.venv/lib/python3.5/site-packages/absl/app.py", line 300, in run
    _run_main(main, args)
  File "/home/mm/.venv/lib/python3.5/site-packages/absl/app.py", line 251, in _run_main
    sys.exit(main(argv))
  File "model_main.py", line 112, in main
    tf.estimator.train_and_evaluate(estimator, train_spec, eval_specs[0])
  File "/home/mm/.venv/lib/python3.5/site-packages/tensorflow_estimator/python/estimator/training.py", line 473, in train_and_evaluate
    return executor.run()
  File "/home/mm/.venv/lib/python3.5/site-packages/tensorflow_estimator/python/estimator/training.py", line 613, in run
    return self.run_local()
  File "/home/mm/.venv/lib/python3.5/site-packages/tensorflow_estimator/python/estimator/training.py", line 714, in run_local
    saving_listeners=saving_listeners)
  File "/home/mm/.venv/lib/python3.5/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 367, in train
    loss = self._train_model(input_fn, hooks, saving_listeners)
  File "/home/mm/.venv/lib/python3.5/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1158, in _train_model
    return self._train_model_default(input_fn, hooks, saving_listeners)
  File "/home/mm/.venv/lib/python3.5/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1188, in _train_model_default
    features, labels, ModeKeys.TRAIN, self.config)
  File "/home/mm/.venv/lib/python3.5/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1146, in _call_model_fn
    model_fn_results = self._model_fn(features=features, **kwargs)
  File "/home/mm/API-TF/models/research/object_detection/model_lib.py", line 288, in model_fn
    features[fields.InputDataFields.true_image_shape])
  File "/home/mm/API-TF/models/research/object_detection/meta_architectures/ssd_meta_arch.py", line 559, in predict
    feature_maps = self._feature_extractor(preprocessed_inputs)
  File "/home/mm/.venv/lib/python3.5/site-packages/tensorflow/python/keras/engine/base_layer.py", line 591, in __call__
    self._maybe_build(inputs)
  File "/home/mm/.venv/lib/python3.5/site-packages/tensorflow/python/keras/engine/base_layer.py", line 1881, in _maybe_build
    self.build(input_shapes)
  File "/home/mm/API-TF/models/research/object_detection/builders/efficientnet_feature_extractor.py", line 278, in build
    model = build_model_base_keras_model(input_shape[1:], self._network_name, self._is_training)
  File "/home/mm/API-TF/models/research/object_detection/builders/efficientnet.py", line 735, in build_model_base_keras_model
    net = model.call_model(inputs, training=training, features_only=True)
  File "/home/mm/API-TF/models/research/object_detection/builders/efficientnet.py", line 491, in call_model
    outputs = block.call(outputs, training=training, output_layer_name='block_%s'%idx)
  File "/home/mm/API-TF/models/research/object_detection/builders/efficientnet.py", line 284, in call
    x = tf.keras.layers.Lambda(self._call_se)(x)
  File "/home/mm/.venv/lib/python3.5/site-packages/tensorflow/python/keras/engine/base_layer.py", line 634, in __call__
    outputs = call_fn(inputs, *args, **kwargs)
  File "/home/mm/.venv/lib/python3.5/site-packages/tensorflow/python/keras/layers/core.py", line 785, in call
    return self.function(inputs, **arguments)
  File "/home/mm/API-TF/models/research/object_detection/builders/efficientnet.py", line 256, in _call_se
    se_tensor = self._se_expand(relu_fn(self._se_reduce(se_tensor)))
  File "/home/mm/.venv/lib/python3.5/site-packages/tensorflow/python/keras/engine/base_layer.py", line 591, in __call__
    self._maybe_build(inputs)
  File "/home/mm/.venv/lib/python3.5/site-packages/tensorflow/python/keras/engine/base_layer.py", line 1881, in _maybe_build
    self.build(input_shapes)
  File "/home/mm/.venv/lib/python3.5/site-packages/tensorflow/python/keras/layers/convolutional.py", line 192, in build
    self.rank + 2))
  File "/home/mm/.venv/lib/python3.5/site-packages/tensorflow/python/ops/nn_ops.py", line 1050, in __init__
    filter_shape[num_spatial_dims]))
ValueError: number of input channels does not match corresponding dimension of filter, 8 != 32

CasiaFan commented 5 years ago

See #6. I think it's an internal bug in tf 1.14, where the keras Conv2D operation behaves strangely on the se expand operation in the se block. You can roll back to tf 1.13.1 to work around this bug for now.
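The two numbers in the error (8 != 32) match the squeeze-and-excite widths of block_0. The arithmetic below is a sketch that assumes the repo's efficientnet.py follows the reference EfficientNet formula for the squeeze width; the function name is illustrative.

```python
def se_channel_counts(input_filters, expand_ratio, se_ratio):
    """Return (reduced, expanded) channel counts for one squeeze-and-excite block.

    Assumption: the expanded width is input_filters * expand_ratio and the
    squeeze width is max(1, int(input_filters * se_ratio)), as in the
    reference EfficientNet implementation.
    """
    expanded = input_filters * expand_ratio
    reduced = max(1, int(input_filters * se_ratio))
    return reduced, expanded


# (input_filters, expand_ratio, se_ratio) for the first two blocks of
# efficientnet-b0, copied from the blocks_args line in the log above.
for args in [(32, 1, 0.25), (16, 6, 0.25)]:
    reduced, expanded = se_channel_counts(*args)
    print("se_reduce -> %d channels, se_expand -> %d channels" % (reduced, expanded))
```

For block_0 this gives 8 and 32: se_reduce produces 8 channels and se_expand should map those 8 back to 32. A convolution that receives 8 input channels while its filter expects 32 is consistent with the se_expand kernel having been built against the wrong input shape.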

PythonImageDeveloper commented 5 years ago

I rolled back to tf 1.13.1, and now I get this error:

(.venv) mm@mm:~/API-TF2/models/research/object_detection$ python3 model_main.py –alsologtostderr

WARNING: The TensorFlow contrib module will not be included in TensorFlow 2.0.
For more information, please see:
  * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md
  * https://github.com/tensorflow/addons
If you depend on functionality not listed there, please file an issue.

/home/mm/.venv/lib/python3.5/site-packages/absl/flags/_validators.py:358: UserWarning: Flag --model_dir has a non-None default value; therefore, mark_flag_as_required will pass even if flag is not specified in the command line!
  'command line!' % flag_name)
/home/mm/.venv/lib/python3.5/site-packages/absl/flags/_validators.py:358: UserWarning: Flag --pipeline_config_path has a non-None default value; therefore, mark_flag_as_required will pass even if flag is not specified in the command line!
  'command line!' % flag_name)
WARNING:tensorflow:Forced number of epochs for all eval validations to be 1.
WARNING:tensorflow:Expected number of evaluation epochs is 1, but instead encountered `eval_on_train_input_config.num_epochs` = 0. Overwriting `num_epochs` to 1.
WARNING:tensorflow:Estimator's model_fn (<function create_model_fn.<locals>.model_fn at 0x7fe4c9e3a6a8>) includes params argument, but params are not passed to Estimator.
WARNING:tensorflow:From /home/mm/.venv/lib/python3.5/site-packages/tensorflow/python/framework/op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
WARNING:tensorflow:num_readers has been reduced to 1 to match input file shards.
WARNING:tensorflow:From /home/mm/API-TF/models/research/object_detection/builders/dataset_builder.py:80: parallel_interleave (from tensorflow.contrib.data.python.ops.interleave_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.experimental.parallel_interleave(...)`.
WARNING:tensorflow:From /home/mm/API-TF/models/research/object_detection/utils/ops.py:472: to_int32 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.cast instead.
WARNING:tensorflow:From /home/mm/API-TF/models/research/object_detection/inputs.py:320: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.cast instead.
WARNING:tensorflow:From /home/mm/API-TF/models/research/object_detection/core/preprocessor.py:188: sample_distorted_bounding_box (from tensorflow.python.ops.image_ops_impl) is deprecated and will be removed in a future version.
Instructions for updating:
`seed2` arg is deprecated.Use sample_distorted_bounding_box_v2 instead.
WARNING:tensorflow:From /home/mm/API-TF/models/research/object_detection/core/preprocessor.py:1240: calling squeeze (from tensorflow.python.ops.array_ops) with squeeze_dims is deprecated and will be removed in a future version.
Instructions for updating:
Use the `axis` argument instead
WARNING:tensorflow:From /home/mm/API-TF/models/research/object_detection/builders/dataset_builder.py:152: batch_and_drop_remainder (from tensorflow.contrib.data.python.ops.batching) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.Dataset.batch(..., drop_remainder=True)`.
WARNING:tensorflow:From /home/mm/.venv/lib/python3.5/site-packages/tensorflow/python/framework/function.py:1007: calling Graph.create_op (from tensorflow.python.framework.ops) with compute_shapes is deprecated and will be removed in a future version.
Instructions for updating:
Shapes are always computed; don't use the compute_shapes as it has no effect.
Traceback (most recent call last):
  File "model_main.py", line 116, in <module>
    tf.app.run()
  File "/home/mm/.venv/lib/python3.5/site-packages/tensorflow/python/platform/app.py", line 125, in run
    _sys.exit(main(argv))
  File "model_main.py", line 112, in main
    tf.estimator.train_and_evaluate(estimator, train_spec, eval_specs[0])
  File "/home/mm/.venv/lib/python3.5/site-packages/tensorflow_estimator/python/estimator/training.py", line 471, in train_and_evaluate
    return executor.run()
  File "/home/mm/.venv/lib/python3.5/site-packages/tensorflow_estimator/python/estimator/training.py", line 611, in run
    return self.run_local()
  File "/home/mm/.venv/lib/python3.5/site-packages/tensorflow_estimator/python/estimator/training.py", line 712, in run_local
    saving_listeners=saving_listeners)
  File "/home/mm/.venv/lib/python3.5/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 358, in train
    loss = self._train_model(input_fn, hooks, saving_listeners)
  File "/home/mm/.venv/lib/python3.5/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1124, in _train_model
    return self._train_model_default(input_fn, hooks, saving_listeners)
  File "/home/mm/.venv/lib/python3.5/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1154, in _train_model_default
    features, labels, model_fn_lib.ModeKeys.TRAIN, self.config)
  File "/home/mm/.venv/lib/python3.5/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1112, in _call_model_fn
    model_fn_results = self._model_fn(features=features, **kwargs)
  File "/home/mm/API-TF/models/research/object_detection/model_lib.py", line 288, in model_fn
    features[fields.InputDataFields.true_image_shape])
  File "/home/mm/API-TF/models/research/object_detection/meta_architectures/ssd_meta_arch.py", line 578, in predict
    im_width=image_shape[2]))
  File "/home/mm/API-TF/models/research/object_detection/core/anchor_generator.py", line 100, in generate
    raise ValueError('Number of feature maps is expected to equal the length '
ValueError: Number of feature maps is expected to equal the length of `num_anchors_per_location`.

CasiaFan commented 5 years ago

@PythonImageDeveloper It's caused by a behavior difference between python2 and python3 that affects ssd_meta_arch.py: in python3, dict.values() returns a view object rather than a list. Convert the returned feature_maps.values() in efficientnet_feature_extractor to a list explicitly: return list(feature_maps.values())
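The difference can be seen in isolation with a plain dictionary. The keys and values here are placeholders rather than the extractor's real tensors, and the helper name is illustrative.

```python
from collections import OrderedDict

# Placeholder stand-in for the feature map dictionary.
feature_maps = OrderedDict([("block_3", "map_a"), ("block_5", "map_b")])

values_view = feature_maps.values()     # a view object in python3
as_list = list(feature_maps.values())   # an actual list


def supports_indexing(obj):
    """Return True if obj[0] works, False if it raises TypeError."""
    try:
        obj[0]
        return True
    except TypeError:
        return False


print(isinstance(values_view, list), supports_indexing(values_view))
print(isinstance(as_list, list), supports_indexing(as_list))
```

Under python3 the view is neither a list nor indexable, so any downstream code that expects list behavior needs the explicit list(...) conversion.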

PythonImageDeveloper commented 5 years ago

@CasiaFan I modified this part of efficientnet_feature_extractor as shown below, but I got the same error:

    def _extract_features(self, preprocessed_inputs):
        """Extract features from preprocessed inputs"""        
        preprocessed_inputs = shape_utils.check_min_image_dim(33, preprocessed_inputs)
        image_features = self.net(ops.pad_to_multiple(preprocessed_inputs, self._pad_to_multiple))
        layouts = {self._used_nodes[i]: image_features[i] for i, x in enumerate(self._used_nodes) if x}
        feature_maps = self._feature_map_generator(layouts)
        if self._additional_layer_depth:
            final_feature_map = []
            for idx, feature in enumerate(feature_maps.values()):
                feature = l.Conv2D(filters=self._additional_layer_depth,
                                    kernel_size=1,
                                    strides=[1, 1],
                                    use_bias=True,
                                    data_format=self._data_format,
                                    name='conv1x1_'+str(idx))(feature)
                feature = l.BatchNormalization()(feature, training=self._is_training)
                feature = l.ReLU(max_value=6)(feature)
                final_feature_map.append(feature)
            return final_feature_map
        else:
            return list(feature_maps.values())

CasiaFan commented 5 years ago

@PythonImageDeveloper Did you make any changes to the config file other than the training file path?
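The config matters here because the ValueError in the traceback above is a length check: the number of feature maps returned by the feature extractor must equal the number of entries in num_anchors_per_location. As far as I can tell, for ssd_anchor_generator that list has one entry per layer, so it follows num_layers in the config. The sketch below mimics that check outside TensorFlow; the function name and the example shapes are hypothetical.

```python
def check_feature_map_count(feature_map_shapes, num_layers):
    """Mimic the length check that raised the ValueError in the traceback.

    feature_map_shapes: list of (height, width) pairs, one per feature map
        returned by the feature extractor.
    num_layers: value of ssd_anchor_generator.num_layers in the config.
    """
    if len(feature_map_shapes) != num_layers:
        raise ValueError(
            "feature extractor returned %d feature maps but the anchor "
            "generator is configured for %d layers"
            % (len(feature_map_shapes), num_layers))
    return True


# Hypothetical shapes: three feature maps checked against num_layers = 3.
print(check_feature_map_count([(38, 38), (19, 19), (10, 10)], 3))
```

If the extractor yields a different number of maps than num_layers, either the anchor generator setting or the extractor's feature level settings would need to change so the two counts agree.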

PythonImageDeveloper commented 5 years ago
# SSD with Mobilenet v2 configuration for MSCOCO Dataset.
# Users should configure the fine_tune_checkpoint field in the train config as
# well as the label_map_path and input_path fields in the train_input_reader and
# eval_input_reader. Search for "PATH_TO_BE_CONFIGURED" to find the fields that
# should be configured.

model {
  ssd {
    num_classes: 1
    box_coder {
      faster_rcnn_box_coder {
        y_scale: 10.0
        x_scale: 10.0
        height_scale: 5.0
        width_scale: 5.0
      }
    }
    matcher {
      argmax_matcher {
        matched_threshold: 0.5
        unmatched_threshold: 0.5
        ignore_thresholds: false
        negatives_lower_than_unmatched: true
        force_match_for_each_row: true
      }
    }
    similarity_calculator {
      iou_similarity {
      }
    }
    anchor_generator {
      ssd_anchor_generator {
        num_layers: 6
        min_scale: 0.2
        max_scale: 0.95
        aspect_ratios: 1.0
        aspect_ratios: 2.0
        aspect_ratios: 0.5
        aspect_ratios: 3.0
        aspect_ratios: 0.3333
      }
    }
    image_resizer {
      fixed_shape_resizer {
        height: 300
        width: 300
      }
    }
    box_predictor {
      convolutional_box_predictor {
        min_depth: 0
        max_depth: 0
        num_layers_before_predictor: 0
        use_dropout: false
        dropout_keep_probability: 0.8
        kernel_size: 1
        box_code_size: 4
        apply_sigmoid_to_scores: false
        conv_hyperparams {
          activation: RELU_6,
          regularizer {
            l2_regularizer {
              weight: 0.00004
            }
          }
          initializer {
            truncated_normal_initializer {
              stddev: 0.03
              mean: 0.0
            }
          }
          batch_norm {
            train: true,
            scale: true,
            center: true,
            decay: 0.9997,
            epsilon: 0.001,
          }
        }
      }
    }
    feature_extractor {
      type: 'ssd_efficientnet'
      min_depth: 16
      depth_multiplier: 1.0
      conv_hyperparams {
        activation: RELU_6,
        regularizer {
          l2_regularizer {
            weight: 0.00004
          }
        }
        initializer {
          truncated_normal_initializer {
            stddev: 0.03
            mean: 0.0
          }
        }
        batch_norm {
          train: true,
          scale: true,
          center: true,
          decay: 0.9997,
          epsilon: 0.001,
        }
      }
    }
    loss {
      classification_loss {
        weighted_sigmoid {
        }
      }
      localization_loss {
        weighted_smooth_l1 {
        }
      }
      hard_example_miner {
        num_hard_examples: 3000
        iou_threshold: 0.99
        loss_type: CLASSIFICATION
        max_negatives_per_positive: 3
        min_negatives_per_image: 3
      }
      classification_weight: 1.0
      localization_weight: 1.0
    }
    normalize_loss_by_num_matches: true
    post_processing {
      batch_non_max_suppression {
        score_threshold: 0.01
        iou_threshold: 0.6
        max_detections_per_class: 100
        max_total_detections: 100
      }
      score_converter: SIGMOID
    }
  }
}

train_config: {
  batch_size: 8
  optimizer {
    rms_prop_optimizer: {
      learning_rate: {
        exponential_decay_learning_rate {
          initial_learning_rate: 0.004
          decay_steps: 800720
          decay_factor: 0.95
        }
      }
      momentum_optimizer_value: 0.9
      decay: 0.9
      epsilon: 1.0
    }
  }
  #fine_tune_checkpoint: "PATH_TO_BE_CONFIGURED/model.ckpt"
  #fine_tune_checkpoint_type:  "detection"
  # Note: The below line limits the training process to 200K steps, which we
  # empirically found to be sufficient enough to train the pets dataset. This
  # effectively bypasses the learning rate schedule (the learning rate will
  # never decay). Remove the below line to train indefinitely.
  num_steps: 200000
  data_augmentation_options {
    random_horizontal_flip {
    }
  }
  data_augmentation_options {
    ssd_random_crop {
    }
  }
}

train_input_reader: {
  tf_record_input_reader {
    input_path: "data/my_training.record"
  }
  label_map_path: "data/my.pbtxt"
}

eval_config: {
  num_examples: 8000
  # Note: The below line limits the evaluation process to 10 evaluations.
  # Remove the below line to evaluate indefinitely.
  max_evals: 10
}

eval_input_reader: {
  tf_record_input_reader {
    input_path: "data/my_testing.record"
  }
  label_map_path: "data/my.pbtxt"
  shuffle: false
  num_readers: 1
}
CasiaFan commented 5 years ago

@PythonImageDeveloper Aha, here it is. Your configuration generates anchors for 6 layers, so there must also be 6 feature map layers to match. But the feature extractor uses 5 layers by default (min level 3 to max level 7). To fix this, either change num_layers to 5 or define min_feature_level and max_feature_level in the feature_extractor section, like:

feature_extractor {
...
min_feature_level: 3
max_feature_level: 8
}
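
The layer-count arithmetic behind this fix can be sketched as follows. This is only an illustration, not code from the repository: the helper names are hypothetical, and the defaults of 3 and 7 are taken from the comment above.

```python
def num_feature_levels(min_feature_level, max_feature_level):
    """Number of pyramid levels in the inclusive range [min, max]."""
    return max_feature_level - min_feature_level + 1


def layers_match(num_layers, min_feature_level=3, max_feature_level=7):
    """True when the anchor generator and the extractor agree on the layer count."""
    return num_layers == num_feature_levels(min_feature_level, max_feature_level)


print(layers_match(6))                       # False: levels 3..7 give only 5 feature maps
print(layers_match(5))                       # True: num_layers lowered to 5
print(layers_match(6, max_feature_level=8))  # True: levels 3..8 give 6 feature maps
```

Either change makes the number of feature maps equal the length of num_anchors_per_location, which is the condition the ValueError checks.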
PythonImageDeveloper commented 5 years ago

@CasiaFan, has it run correctly for you so far? I followed your new modification on GitHub, but now I get this error:

(.venv) mm@mm:~/API-TF2/models/research/object_detection$ python3 model_main.py --alsologtostderr

WARNING: The TensorFlow contrib module will not be included in TensorFlow 2.0.
For more information, please see:
  * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md
  * https://github.com/tensorflow/addons
If you depend on functionality not listed there, please file an issue.

/home/mm/.venv/lib/python3.5/site-packages/absl/flags/_validators.py:358: UserWarning: Flag --model_dir has a non-None default value; therefore, mark_flag_as_required will pass even if flag is not specified in the command line!
  'command line!' % flag_name)
/home/mm/.venv/lib/python3.5/site-packages/absl/flags/_validators.py:358: UserWarning: Flag --pipeline_config_path has a non-None default value; therefore, mark_flag_as_required will pass even if flag is not specified in the command line!
  'command line!' % flag_name)
Traceback (most recent call last):
  File "model_main.py", line 116, in <module>
    tf.app.run()
  File "/home/mm/.venv/lib/python3.5/site-packages/tensorflow/python/platform/app.py", line 125, in run
    _sys.exit(main(argv))
  File "model_main.py", line 78, in main
    FLAGS.sample_1_of_n_eval_on_train_examples))
  File "/home/mm/API-TF/models/research/object_detection/model_lib.py", line 589, in create_estimator_and_inputs
    pipeline_config_path, config_override=config_override)
  File "/home/mm/API-TF/models/research/object_detection/utils/config_util.py", line 98, in get_configs_from_pipeline_file
    text_format.Merge(proto_str, pipeline_config)
  File "/home/mm/.venv/lib/python3.5/site-packages/google/protobuf/text_format.py", line 536, in Merge
    descriptor_pool=descriptor_pool)
  File "/home/mm/.venv/lib/python3.5/site-packages/google/protobuf/text_format.py", line 590, in MergeLines
    return parser.MergeLines(lines, message)
  File "/home/mm/.venv/lib/python3.5/site-packages/google/protobuf/text_format.py", line 623, in MergeLines
    self._ParseOrMerge(lines, message)
  File "/home/mm/.venv/lib/python3.5/site-packages/google/protobuf/text_format.py", line 638, in _ParseOrMerge
    self._MergeField(tokenizer, message)
  File "/home/mm/.venv/lib/python3.5/site-packages/google/protobuf/text_format.py", line 763, in _MergeField
    merger(tokenizer, message, field)
  File "/home/mm/.venv/lib/python3.5/site-packages/google/protobuf/text_format.py", line 837, in _MergeMessageField
    self._MergeField(tokenizer, sub_message)
  File "/home/mm/.venv/lib/python3.5/site-packages/google/protobuf/text_format.py", line 763, in _MergeField
    merger(tokenizer, message, field)
  File "/home/mm/.venv/lib/python3.5/site-packages/google/protobuf/text_format.py", line 837, in _MergeMessageField
    self._MergeField(tokenizer, sub_message)
  File "/home/mm/.venv/lib/python3.5/site-packages/google/protobuf/text_format.py", line 763, in _MergeField
    merger(tokenizer, message, field)
  File "/home/mm/.venv/lib/python3.5/site-packages/google/protobuf/text_format.py", line 837, in _MergeMessageField
    self._MergeField(tokenizer, sub_message)
  File "/home/mm/.venv/lib/python3.5/site-packages/google/protobuf/text_format.py", line 730, in _MergeField
    (message_descriptor.full_name, name))
google.protobuf.text_format.ParseError: 12:7 : Message type "object_detection.protos.SsdFeatureExtractor" has no field named "network_version".

When I comment out network_version in the config file, I get this error:

 File "/home/mm/.venv/lib/python3.5/site-packages/google/protobuf/text_format.py", line 730, in _MergeField
    (message_descriptor.full_name, name))
google.protobuf.text_format.ParseError: 13:7 : Message type "object_detection.protos.SsdFeatureExtractor" has no field named "min_feature_level".
CasiaFan commented 5 years ago

@PythonImageDeveloper Repeat step 3.

PythonImageDeveloper commented 5 years ago

Hi @CasiaFan, I again replaced your ssd.proto with the original ssd.proto and ran protoc object_detection/protos/ssd.proto --python_out=. but I got the same error again: google.protobuf.text_format.ParseError: 12:7 : Message type "object_detection.protos.SsdFeatureExtractor" has no field named "network_version".

My protoc --version is: libprotoc 3.5.1
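
A "has no field named" ParseError means the compiled message class that Python actually imports does not define that field. One way to check this, sketched under assumptions: missing_fields is a hypothetical helper, and the stand-in object below only mimics a compiled message class so the snippet runs without the Object Detection API installed. DESCRIPTOR.fields_by_name is the standard protobuf mapping from field names to field descriptors.

```python
from types import SimpleNamespace


def missing_fields(message_cls, required):
    """Return the names in `required` that the compiled message does not define."""
    defined = message_cls.DESCRIPTOR.fields_by_name
    return [name for name in required if name not in defined]


# Stand-in that mimics a stale SsdFeatureExtractor class (hypothetical field set).
stale = SimpleNamespace(
    DESCRIPTOR=SimpleNamespace(fields_by_name={"type": None, "min_depth": None})
)

print(missing_fields(stale, ["network_version", "min_feature_level"]))
# ['network_version', 'min_feature_level']
```

With the real API, the same helper could be called on ssd_pb2.SsdFeatureExtractor after from object_detection.protos import ssd_pb2, and printing ssd_pb2.__file__ shows which copy is loaded. That may matter here, since the traceback paths point to /home/mm/API-TF while the shell prompt is in ~/API-TF2, although the thread does not confirm this as the cause.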

dlawrences commented 4 years ago

Confirming it all works with the following changes: