CasiaFan / SSD_EfficientNet

SSD using TensorFlow object detection API with EfficientNet backbone

ValueError: name for name_scope must be a string. #6

Open maxadda opened 5 years ago

maxadda commented 5 years ago

Hello, thank you very much for open-sourcing your EfficientNet-SSD implementation. I configured it according to your instructions, and all the setup and compilation steps went fine. However, the following error occurred during training:

W0716 21:07:45.322222 139932370933568 model_lib.py:634] Expected number of evaluation epochs is 1, but instead encountered eval_on_train_input_config.num_epochs = 0. Overwriting num_epochs to 1.
I0716 21:07:45.322298 139932370933568 model_lib.py:669] create_estimator_and_inputs: use_tpu False, export_to_tpu False
I0716 21:07:45.322627 139932370933568 estimator.py:209] Using config: {'_model_dir': 'object_detection/ssd_efficient_model/training/', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 600, '_session_config': allow_soft_placement: true graph_options { rewrite_options { meta_optimizer_iterations: ONE } } , '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7f443b160390>, '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}
W0716 21:07:45.322852 139932370933568 model_fn.py:630] Estimator's model_fn (<function create_model_fn.<locals>.model_fn at 0x7f443b15ebf8>) includes params argument, but params are not passed to Estimator.
I0716 21:07:45.323519 139932370933568 estimator_training.py:186] Not using Distribute Coordinator.
I0716 21:07:45.323671 139932370933568 training.py:612] Running training and evaluation locally (non-distributed).
I0716 21:07:45.323882 139932370933568 training.py:700] Start train and evaluate loop. The evaluate will happen after every checkpoint. Checkpoint frequency is determined based on RunConfig arguments: save_checkpoints_steps None or save_checkpoints_secs 600.

raise ValueError("name for name_scope must be a string.")
ValueError: name for name_scope must be a string.

PythonImageDeveloper commented 5 years ago

@maxadda What are your TensorFlow and protoc versions? On the line where the error occurred, you can change ops.name_scope(name) to tf.name_scope(name) or ops.name_scope(str(name)).

maxadda commented 5 years ago

@CasiaFan tensorflow: 1.14, protoc: 3.4.0. There is no ops.name_scope in my code.

CasiaFan commented 5 years ago

@maxadda It probably comes from an improperly defined name in one of the name scopes. Could you provide a more detailed trace log?

maxadda commented 5 years ago

@CasiaFan
WARNING: Logging before flag parsing goes to stderr.
W0718 10:05:20.938903 139882210969408 lazy_loader.py:50] The TensorFlow contrib module will not be included in TensorFlow 2.0. For more information, please see:

W0718 10:05:20.959230 139882210969408 deprecation_wrapper.py:119] From /home/mm/ssd_efficientNet/models/research/slim/nets/inception_resnet_v2.py:373: The name tf.GraphKeys is deprecated. Please use tf.compat.v1.GraphKeys instead.

W0718 10:05:20.982215 139882210969408 deprecation_wrapper.py:119] From /home/mm/ssd_efficientNet/models/research/slim/nets/mobilenet/mobilenet.py:397: The name tf.nn.avg_pool is deprecated. Please use tf.nn.avg_pool2d instead.

W0718 10:05:21.005773 139882210969408 deprecation_wrapper.py:119] From object_detection/model_main.py:111: The name tf.app.run is deprecated. Please use tf.compat.v1.app.run instead.

W0718 10:05:21.006982 139882210969408 deprecation_wrapper.py:119] From /home/mm/ssd_efficientNet/models/research/object_detection/utils/config_util.py:98: The name tf.gfile.GFile is deprecated. Please use tf.io.gfile.GFile instead.

W0718 10:05:21.014207 139882210969408 deprecation_wrapper.py:119] From /home/mm/ssd_efficientNet/models/research/object_detection/model_lib.py:614: The name tf.logging.warning is deprecated. Please use tf.compat.v1.logging.warning instead.

W0718 10:05:21.014352 139882210969408 model_lib.py:615] Forced number of epochs for all eval validations to be 1.
W0718 10:05:21.014511 139882210969408 deprecation_wrapper.py:119] From /home/mm/ssd_efficientNet/models/research/object_detection/utils/config_util.py:484: The name tf.logging.info is deprecated. Please use tf.compat.v1.logging.info instead.

I0718 10:05:21.014616 139882210969408 config_util.py:484] Maybe overwriting train_steps: 50000
I0718 10:05:21.014730 139882210969408 config_util.py:484] Maybe overwriting sample_1_of_n_eval_examples: 1
I0718 10:05:21.014849 139882210969408 config_util.py:484] Maybe overwriting use_bfloat16: False
I0718 10:05:21.014959 139882210969408 config_util.py:484] Maybe overwriting eval_num_epochs: 1
I0718 10:05:21.015063 139882210969408 config_util.py:484] Maybe overwriting load_pretrained: True
I0718 10:05:21.015163 139882210969408 config_util.py:494] Ignoring config override key: load_pretrained
W0718 10:05:21.015912 139882210969408 model_lib.py:631] Expected number of evaluation epochs is 1, but instead encountered eval_on_train_input_config.num_epochs = 0. Overwriting num_epochs to 1.
I0718 10:05:21.016054 139882210969408 model_lib.py:666] create_estimator_and_inputs: use_tpu False, export_to_tpu False
I0718 10:05:21.016710 139882210969408 estimator.py:209] Using config: {'_model_dir': 'object_detection/ssd_efficient_model/training/', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 600, '_session_config': allow_soft_placement: true graph_options { rewrite_options { meta_optimizer_iterations: ONE } } , '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7f389a6a92e8>, '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}
W0718 10:05:21.016997 139882210969408 model_fn.py:630] Estimator's model_fn (<function create_model_fn.<locals>.model_fn at 0x7f389a6b4378>) includes params argument, but params are not passed to Estimator.
I0718 10:05:21.017848 139882210969408 estimator_training.py:186] Not using Distribute Coordinator.
I0718 10:05:21.018114 139882210969408 training.py:612] Running training and evaluation locally (non-distributed).
I0718 10:05:21.018487 139882210969408 training.py:700] Start train and evaluate loop. The evaluate will happen after every checkpoint. Checkpoint frequency is determined based on RunConfig arguments: save_checkpoints_steps None or save_checkpoints_secs 600.
W0718 10:05:21.031924 139882210969408 deprecation.py:323] From /home/mm/anaconda3/envs/ssd_1/lib/python3.6/site-packages/tensorflow/python/training/training_util.py:236: Variable.initialized_value (from tensorflow.python.ops.variables) is deprecated and will be removed in a future version. Instructions for updating: Use Variable.read_value. Variables in 2.X are initialized automatically both in eager and graph (inside tf.defun) contexts.
W0718 10:05:21.069810 139882210969408 deprecation_wrapper.py:119] From /home/mm/ssd_efficientNet/models/research/object_detection/data_decoders/tf_example_decoder.py:177: The name tf.FixedLenFeature is deprecated. Please use tf.io.FixedLenFeature instead.

W0718 10:05:21.070057 139882210969408 deprecation_wrapper.py:119] From /home/mm/ssd_efficientNet/models/research/object_detection/data_decoders/tf_example_decoder.py:192: The name tf.VarLenFeature is deprecated. Please use tf.io.VarLenFeature instead.

W0718 10:05:21.128521 139882210969408 deprecation_wrapper.py:119] From /home/mm/ssd_efficientNet/models/research/object_detection/builders/dataset_builder.py:64: The name tf.gfile.Glob is deprecated. Please use tf.io.gfile.glob instead.

W0718 10:05:21.137465 139882210969408 deprecation.py:323] From /home/mm/ssd_efficientNet/models/research/object_detection/builders/dataset_builder.py:86: parallel_interleave (from tensorflow.contrib.data.python.ops.interleave_ops) is deprecated and will be removed in a future version. Instructions for updating: Use tf.data.experimental.parallel_interleave(...).
W0718 10:05:21.137592 139882210969408 deprecation.py:323] From /home/mm/anaconda3/envs/ssd_1/lib/python3.6/site-packages/tensorflow/contrib/data/python/ops/interleave_ops.py:77: parallel_interleave (from tensorflow.python.data.experimental.ops.interleave_ops) is deprecated and will be removed in a future version. Instructions for updating: Use tf.data.Dataset.interleave(map_func, cycle_length, block_length, num_parallel_calls=tf.data.experimental.AUTOTUNE) instead. If sloppy execution is desired, use tf.data.Options.experimental_determinstic.
W0718 10:05:21.167593 139882210969408 deprecation.py:323] From /home/mm/ssd_efficientNet/models/research/object_detection/builders/dataset_builder.py:155: DatasetV1.map_with_legacy_function (from tensorflow.python.data.ops.dataset_ops) is deprecated and will be removed in a future version. Instructions for updating: Use `tf.data.Dataset.map()
W0718 10:05:21.376130 139882210969408 deprecation_wrapper.py:119] From /home/mm/ssd_efficientNet/models/research/object_detection/utils/ops.py:485: The name tf.is_nan is deprecated. Please use tf.math.is_nan instead.

W0718 10:05:21.381103 139882210969408 deprecation.py:323] From /home/mm/ssd_efficientNet/models/research/object_detection/utils/ops.py:487: add_dispatch_support.<locals>.wrapper (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version. Instructions for updating: Use tf.where in 2.0, which has the same broadcast rule as np.where
W0718 10:05:21.439073 139882210969408 deprecation_wrapper.py:119] From /home/mm/ssd_efficientNet/models/research/object_detection/core/preprocessor.py:512: The name tf.random_uniform is deprecated. Please use tf.random.uniform instead.

W0718 10:05:21.526682 139882210969408 deprecation_wrapper.py:119] From /home/mm/ssd_efficientNet/models/research/object_detection/core/preprocessor.py:2515: The name tf.image.resize_images is deprecated. Please use tf.image.resize instead.

W0718 10:05:22.005042 139882210969408 deprecation.py:323] From /home/mm/ssd_efficientNet/models/research/object_detection/builders/dataset_builder.py:158: batch_and_drop_remainder (from tensorflow.contrib.data.python.ops.batching) is deprecated and will be removed in a future version. Instructions for updating: Use tf.data.Dataset.batch(..., drop_remainder=True).
I0718 10:05:22.016522 139882210969408 estimator.py:1145] Calling model_fn.
None None
Traceback (most recent call last):
  File "object_detection/model_main.py", line 111, in <module>
    tf.app.run()
  File "/home/mm/anaconda3/envs/ssd_1/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 40, in run
    _run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
  File "/home/mm/anaconda3/envs/ssd_1/lib/python3.6/site-packages/absl/app.py", line 300, in run
    _run_main(main, args)
  File "/home/mm/anaconda3/envs/ssd_1/lib/python3.6/site-packages/absl/app.py", line 251, in _run_main
    sys.exit(main(argv))
  File "object_detection/model_main.py", line 107, in main
    tf.estimator.train_and_evaluate(estimator, train_spec, eval_specs[0])
  File "/home/mm/anaconda3/envs/ssd_1/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/training.py", line 473, in train_and_evaluate
    return executor.run()
  File "/home/mm/anaconda3/envs/ssd_1/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/training.py", line 613, in run
    return self.run_local()
  File "/home/mm/anaconda3/envs/ssd_1/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/training.py", line 714, in run_local
    saving_listeners=saving_listeners)
  File "/home/mm/anaconda3/envs/ssd_1/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 367, in train
    loss = self._train_model(input_fn, hooks, saving_listeners)
  File "/home/mm/anaconda3/envs/ssd_1/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1158, in _train_model
    return self._train_model_default(input_fn, hooks, saving_listeners)
  File "/home/mm/anaconda3/envs/ssd_1/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1188, in _train_model_default
    features, labels, ModeKeys.TRAIN, self.config)
  File "/home/mm/anaconda3/envs/ssd_1/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1146, in _call_model_fn
    model_fn_results = self._model_fn(features=features, **kwargs)
  File "/home/mm/ssd_efficientNet/models/research/object_detection/model_lib.py", line 302, in model_fn
    features[fields.InputDataFields.true_image_shape])
  File "/home/mm/ssd_efficientNet/models/research/object_detection/meta_architectures/ssd_meta_arch.py", line 560, in predict
    feature_maps = self._feature_extractor(preprocessed_inputs)
  File "/home/mm/anaconda3/envs/ssd_1/lib/python3.6/site-packages/tensorflow/python/keras/engine/base_layer.py", line 589, in __call__
    with graph.as_default(), backend.name_scope(self._name_scope()):
  File "/home/mm/anaconda3/envs/ssd_1/lib/python3.6/site-packages/tensorflow/python/keras/backend.py", line 736, in name_scope
    return ops.name_scope_v2(name)
  File "/home/mm/anaconda3/envs/ssd_1/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 6567, in __init__
    raise ValueError("name for name_scope must be a string.")
ValueError: name for name_scope must be a string.

aif2017 commented 5 years ago

I got this error too! I followed your instructions; the only thing I changed was the number of classes.

maxadda commented 5 years ago

@CasiaFan Yes, I am using this model to train on the COCO dataset.

CasiaFan commented 5 years ago

@maxadda @aif2017 Hmm, there are two errors here. The first one, which causes your problem, is that the feature extractor's name gets overridden by the self._name = name assignments at lines 77 and 245. Deleting those two lines should fix it.
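
The mechanism described above can be illustrated with a small, self-contained sketch. It does not use TensorFlow: `Layer`, `name_scope`, `BrokenExtractor` and `FixedExtractor` are hypothetical stand-ins that only mimic the idea that a base class generates a default string name, and that reassigning `self._name = name` afterwards replaces it with `None` when no name was passed.

```python
import itertools

_uid = itertools.count()


def name_scope(name):
    """Stand-in for a name-scope helper that only accepts strings."""
    if not isinstance(name, str):
        raise ValueError("name for name_scope must be a string.")
    return name


class Layer:
    """Toy base class: generates a default name when none is given."""

    def __init__(self, name=None):
        # The base class guarantees that _name is always a string.
        default = "{}_{}".format(type(self).__name__.lower(), next(_uid))
        self._name = name or default

    def __call__(self, inputs):
        name_scope(self._name)  # fails if a subclass replaced _name with None
        return inputs


class BrokenExtractor(Layer):
    def __init__(self, name=None):
        super().__init__(name=name)
        self._name = name  # overrides the generated default with None


class FixedExtractor(Layer):
    def __init__(self, name=None):
        super().__init__(name=name)  # no reassignment: the default name survives
```

Calling `BrokenExtractor()(x)` raises the same `ValueError` message as in the trace, while `FixedExtractor()(x)` goes through, which is consistent with the advice to delete the two assignments.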

As for the other problem you may run into, ValueError: number of input channels does not match corresponding dimension of filter, 8 != 32: I have found the cause. A weird kernel shape error occurs when tensorflow.python.keras.layers.convolutional assigns weights to the kernel at line 158, and that causes this mismatch. The simplest fix is to go back to TensorFlow 1.13.1, the version I have tested without this error.
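
For readers unfamiliar with that message, here is a rough sketch of the kind of shape check that produces it, assuming an NHWC input and an HWIO kernel layout. `check_conv2d_shapes` is a hypothetical helper written for illustration, and the shapes in the usage note are made up; the thread does not show the actual tensors involved.

```python
def check_conv2d_shapes(input_shape, kernel_shape):
    """Validate an NHWC input shape against an HWIO kernel shape."""
    in_channels = input_shape[-1]         # NHWC: channels are the last axis
    kernel_in_channels = kernel_shape[2]  # HWIO: third axis is input channels
    if in_channels != kernel_in_channels:
        raise ValueError(
            "number of input channels does not match corresponding "
            "dimension of filter, {} != {}".format(in_channels, kernel_in_channels)
        )
    # The number of output channels comes from the kernel's last axis.
    return kernel_shape[3]
```

For example, `check_conv2d_shapes((1, 64, 64, 8), (3, 3, 32, 16))` raises with `8 != 32`, because the input carries 8 channels while the kernel was built for 32.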

PythonImageDeveloper commented 5 years ago

@CasiaFan In your opinion, does it make sense to train ssd_efficientnet on the COCO dataset with only a single 1080 Ti GPU? As far as I have seen in the papers, they used 8 Titan X GPUs to train their models.

CasiaFan commented 5 years ago

@PythonImageDeveloper Limit your batch size and input image size. In that case, though, you may need to use group normalization instead of batch normalization to get better training convergence, and training will take much longer to finish. Be patient~

PythonImageDeveloper commented 5 years ago

@CasiaFan What is the difference between group normalization and batch normalization? Is group normalization a new layer? Does it always perform better than batch normalization?

maxadda commented 5 years ago

@PythonImageDeveloper @CasiaFan With this modification, "efficientnet_fpn" can train on the COCO dataset, but running "efficientnet" gives the following error:

WARNING:tensorflow:From /home/mm/ssd_efficientNet/models/research/object_detection/builders/dataset_builder.py:158: batch_and_drop_remainder (from tensorflow.contrib.data.python.ops.batching) is deprecated and will be removed in a future version. Instructions for updating: Use tf.data.Dataset.batch(..., drop_remainder=True).
Traceback (most recent call last):
  File "object_detection/model_main.py", line 111, in <module>
    tf.app.run()
  File "/home/mm/anaconda3/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 125, in run
    _sys.exit(main(argv))
  File "object_detection/model_main.py", line 107, in main
    tf.estimator.train_and_evaluate(estimator, train_spec, eval_specs[0])
  File "/home/mm/anaconda3/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/training.py", line 471, in train_and_evaluate
    return executor.run()
  File "/home/mm/anaconda3/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/training.py", line 611, in run
    return self.run_local()
  File "/home/mm/anaconda3/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/training.py", line 712, in run_local
    saving_listeners=saving_listeners)
  File "/home/mm/anaconda3/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 358, in train
    loss = self._train_model(input_fn, hooks, saving_listeners)
  File "/home/mm/anaconda3/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1124, in _train_model
    return self._train_model_default(input_fn, hooks, saving_listeners)
  File "/home/mm/anaconda3/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1154, in _train_model_default
    features, labels, model_fn_lib.ModeKeys.TRAIN, self.config)
  File "/home/mm/anaconda3/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1112, in _call_model_fn
    model_fn_results = self._model_fn(features=features, **kwargs)
  File "/home/mm/ssd_efficientNet/models/research/object_detection/model_lib.py", line 302, in model_fn
    features[fields.InputDataFields.true_image_shape])
  File "/home/mm/ssd_efficientNet/models/research/object_detection/meta_architectures/ssd_meta_arch.py", line 581, in predict
    predictor_results_dict = self._box_predictor(feature_maps)
  File "/home/mm/anaconda3/lib/python3.6/site-packages/tensorflow/python/keras/engine/base_layer.py", line 540, in __call__
    self._maybe_build(inputs)
  File "/home/mm/anaconda3/lib/python3.6/site-packages/tensorflow/python/keras/engine/base_layer.py", line 1605, in _maybe_build
    self.build(input_shapes)
  File "/home/mm/ssd_efficientNet/models/research/object_detection/predictors/convolutional_keras_box_predictor.py", line 376, in build
    for input_shape in input_shapes
TypeError: 'NoneType' object is not iterable

CasiaFan commented 5 years ago

@PythonImageDeveloper See this paper: https://arxiv.org/pdf/1803.08494.pdf. This method can give better training results when training with small batches.
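
To make the difference concrete, here is a minimal pure-Python sketch of the normalization step of group normalization for a single sample. It is only an illustration, not the paper's reference code or a TensorFlow layer: `group_norm` is a hypothetical helper, each channel is represented as a flat list of values, and the learnable scale and shift are left out. The point is that the mean and variance are computed over groups of channels inside one sample, so they do not depend on the batch size, whereas batch normalization computes them across the batch.

```python
import math


def group_norm(channels, num_groups, eps=1e-5):
    """Normalize one sample; `channels` is a list of per-channel value lists."""
    if len(channels) % num_groups != 0:
        raise ValueError("number of channels must be divisible by num_groups")
    per_group = len(channels) // num_groups
    out = []
    for g in range(num_groups):
        group = channels[g * per_group:(g + 1) * per_group]
        values = [v for ch in group for v in ch]
        mean = sum(values) / len(values)
        var = sum((v - mean) ** 2 for v in values) / len(values)
        scale = 1.0 / math.sqrt(var + eps)
        # Statistics come from this sample only, never from other batch items.
        out.extend([[(v - mean) * scale for v in ch] for ch in group])
    return out
```

With 4 channels and `num_groups=2`, channels 0 and 1 share one mean and variance and channels 2 and 3 share another, regardless of how many other images are in the batch.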

CasiaFan commented 5 years ago

@maxadda Hmm... I cannot reproduce this error. What is your TF version, and did you make any changes to the config file?

maxadda commented 5 years ago

@CasiaFan No changes, except for replacing ssd_efficientnet_fpn with ssd_efficientnet. TensorFlow version: 1.13.1

PythonImageDeveloper commented 5 years ago

@CasiaFan If I want to replace all BN layers with GN, do I have to implement the layer myself? Isn't it implemented in TensorFlow core? And in your opinion, does GN give better results than BN in all situations?

CasiaFan commented 5 years ago

Just have a try first.

aif2017 commented 5 years ago

thanks. I wish you a life with satisfaction at end!

PythonImageDeveloper commented 5 years ago

@aif2017 Were you able to train ssd_efficientnet correctly? Didn't you get any errors?

aif2017 commented 5 years ago

@aif2017 Were you able to train ssd_efficientnet correctly? Didn't you get any errors?

yyyyyy-eeeeeee-ssssssss

PythonImageDeveloper commented 5 years ago

@aif2017 Very good, thanks. Please write up your modifications and the steps you took to train the network correctly.

aif2017 commented 5 years ago

@aif2017 Very good, thanks. Please write up your modifications and the steps you took to train the network correctly.

I just followed the instructions in the repo. TF 1.13.1 is important. Do you know how I can load feature extractor weights into this model?

PythonImageDeveloper commented 5 years ago

@aif2017 What is your protoc --version? In my opinion, you should set fine_tune_checkpoint_type='classification' in the config file.

aif2017 commented 5 years ago

protoc

Standard TF API installation:
wget -O protobuf.zip https://github.com/google/protobuf/releases/download/v3.0.0/protoc-3.0.0-linux-x86_64.zip
unzip protobuf.zip

aif2017 commented 5 years ago

In my opinion, you should set fine_tune_checkpoint_type='classification' in the config file.

You mean this? Does from_detection_checkpoint have to be false?
train_config {
  fine_tune_checkpoint: "model.ckpt"
  fine_tune_checkpoint_type: "classification"
  from_detection_checkpoint: false
}

Have you ever done this?

PythonImageDeveloper commented 5 years ago

@aif2017 No, I don't have any experience with this. If you get it working, please share your experience.