Open maxadda opened 5 years ago
@maxadda What are your tensorflow and protoc versions? You can change ops.name_scope(name) to tf.name_scope(name) or ops.name_scope(str(name)) on the line where the error occurred.
@CasiaFan tensorflow: 1.14, protoc: 3.4.0. There is no ops.name_scope in my code.
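Since version mismatches come up repeatedly in this thread, here is a minimal sketch for collecting the versions being asked about. It assumes Python 3.8+ (for `importlib.metadata`) and that `protoc` is on PATH; the helper names `package_version` and `protoc_version` are made up for illustration and are not part of the repo.

```python
import shutil
import subprocess
from importlib import metadata


def package_version(name):
    """Return the installed version of a pip package, or None if it is absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def protoc_version():
    """Return the output of `protoc --version`, or None if protoc is not on PATH."""
    if shutil.which("protoc") is None:
        return None
    result = subprocess.run(
        ["protoc", "--version"], capture_output=True, text=True, check=False
    )
    return result.stdout.strip() or None


if __name__ == "__main__":
    # The GPU build of TF 1.x is published under a separate package name.
    for pkg in ("tensorflow", "tensorflow-gpu"):
        print(pkg, package_version(pkg))
    print("protoc", protoc_version())
```

Pasting this output together with the full traceback makes it easier to tell a version problem from a code problem.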
@maxadda It probably comes from an improperly defined name in some name scope. Could you provide a more detailed trace log?
@CasiaFan
WARNING: Logging before flag parsing goes to stderr.
W0718 10:05:20.938903 139882210969408 lazy_loader.py:50] The TensorFlow contrib module will not be included in TensorFlow 2.0. For more information, please see:
- https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md
- https://github.com/tensorflow/addons
- https://github.com/tensorflow/io (for I/O related ops)
If you depend on functionality not listed there, please file an issue.
W0718 10:05:20.959230 139882210969408 deprecation_wrapper.py:119] From /home/mm/ssd_efficientNet/models/research/slim/nets/inception_resnet_v2.py:373: The name tf.GraphKeys is deprecated. Please use tf.compat.v1.GraphKeys instead.
W0718 10:05:20.982215 139882210969408 deprecation_wrapper.py:119] From /home/mm/ssd_efficientNet/models/research/slim/nets/mobilenet/mobilenet.py:397: The name tf.nn.avg_pool is deprecated. Please use tf.nn.avg_pool2d instead.
W0718 10:05:21.005773 139882210969408 deprecation_wrapper.py:119] From object_detection/model_main.py:111: The name tf.app.run is deprecated. Please use tf.compat.v1.app.run instead.
W0718 10:05:21.006982 139882210969408 deprecation_wrapper.py:119] From /home/mm/ssd_efficientNet/models/research/object_detection/utils/config_util.py:98: The name tf.gfile.GFile is deprecated. Please use tf.io.gfile.GFile instead.
W0718 10:05:21.014207 139882210969408 deprecation_wrapper.py:119] From /home/mm/ssd_efficientNet/models/research/object_detection/model_lib.py:614: The name tf.logging.warning is deprecated. Please use tf.compat.v1.logging.warning instead.
W0718 10:05:21.014352 139882210969408 model_lib.py:615] Forced number of epochs for all eval validations to be 1.
W0718 10:05:21.014511 139882210969408 deprecation_wrapper.py:119] From /home/mm/ssd_efficientNet/models/research/object_detection/utils/config_util.py:484: The name tf.logging.info is deprecated. Please use tf.compat.v1.logging.info instead.
I0718 10:05:21.014616 139882210969408 config_util.py:484] Maybe overwriting train_steps: 50000
I0718 10:05:21.014730 139882210969408 config_util.py:484] Maybe overwriting sample_1_of_n_eval_examples: 1
I0718 10:05:21.014849 139882210969408 config_util.py:484] Maybe overwriting use_bfloat16: False
I0718 10:05:21.014959 139882210969408 config_util.py:484] Maybe overwriting eval_num_epochs: 1
I0718 10:05:21.015063 139882210969408 config_util.py:484] Maybe overwriting load_pretrained: True
I0718 10:05:21.015163 139882210969408 config_util.py:494] Ignoring config override key: load_pretrained
W0718 10:05:21.015912 139882210969408 model_lib.py:631] Expected number of evaluation epochs is 1, but instead encountered eval_on_train_input_config.num_epochs = 0. Overwriting num_epochs to 1.
I0718 10:05:21.016054 139882210969408 model_lib.py:666] create_estimator_and_inputs: use_tpu False, export_to_tpu False
I0718 10:05:21.016710 139882210969408 estimator.py:209] Using config: {'_model_dir': 'object_detection/ssd_efficient_model/training/', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 600, '_session_config': allow_soft_placement: true
graph_options {
rewrite_options {
meta_optimizer_iterations: ONE
}
}
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7f389a6a92e8>, '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}
W0718 10:05:21.016997 139882210969408 model_fn.py:630] Estimator's model_fn (<function create_model_fn.
W0718 10:05:21.070057 139882210969408 deprecation_wrapper.py:119] From /home/mm/ssd_efficientNet/models/research/object_detection/data_decoders/tf_example_decoder.py:192: The name tf.VarLenFeature is deprecated. Please use tf.io.VarLenFeature instead.
W0718 10:05:21.128521 139882210969408 deprecation_wrapper.py:119] From /home/mm/ssd_efficientNet/models/research/object_detection/builders/dataset_builder.py:64: The name tf.gfile.Glob is deprecated. Please use tf.io.gfile.glob instead.
W0718 10:05:21.137465 139882210969408 deprecation.py:323] From /home/mm/ssd_efficientNet/models/research/object_detection/builders/dataset_builder.py:86: parallel_interleave (from tensorflow.contrib.data.python.ops.interleave_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.data.experimental.parallel_interleave(...).
W0718 10:05:21.137592 139882210969408 deprecation.py:323] From /home/mm/anaconda3/envs/ssd_1/lib/python3.6/site-packages/tensorflow/contrib/data/python/ops/interleave_ops.py:77: parallel_interleave (from tensorflow.python.data.experimental.ops.interleave_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.data.Dataset.interleave(map_func, cycle_length, block_length, num_parallel_calls=tf.data.experimental.AUTOTUNE) instead. If sloppy execution is desired, use tf.data.Options.experimental_determinstic.
W0718 10:05:21.167593 139882210969408 deprecation.py:323] From /home/mm/ssd_efficientNet/models/research/object_detection/builders/dataset_builder.py:155: DatasetV1.map_with_legacy_function (from tensorflow.python.data.ops.dataset_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.Dataset.map()
W0718 10:05:21.376130 139882210969408 deprecation_wrapper.py:119] From /home/mm/ssd_efficientNet/models/research/object_detection/utils/ops.py:485: The name tf.is_nan is deprecated. Please use tf.math.is_nan instead.
W0718 10:05:21.381103 139882210969408 deprecation.py:323] From /home/mm/ssd_efficientNet/models/research/object_detection/utils/ops.py:487: add_dispatch_support.
W0718 10:05:21.526682 139882210969408 deprecation_wrapper.py:119] From /home/mm/ssd_efficientNet/models/research/object_detection/core/preprocessor.py:2515: The name tf.image.resize_images is deprecated. Please use tf.image.resize instead.
W0718 10:05:22.005042 139882210969408 deprecation.py:323] From /home/mm/ssd_efficientNet/models/research/object_detection/builders/dataset_builder.py:158: batch_and_drop_remainder (from tensorflow.contrib.data.python.ops.batching) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.data.Dataset.batch(..., drop_remainder=True).
I0718 10:05:22.016522 139882210969408 estimator.py:1145] Calling model_fn.
None
None
Traceback (most recent call last):
File "object_detection/model_main.py", line 111, in
I got this error too! I followed your instructions and changed only the number of classes.
@CasiaFan Yes, I use this structure to train coco datasets.
@maxadda @aif2017 Emm.. There are 2 errors here. The first one, which causes your problem, is that the feature extractor name gets overridden by the self._name = name assignments at lines 77 and 245. Deleting these 2 lines should fix it.
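To illustrate why an extra self._name = name assignment can break things, here is a minimal pure-Python sketch. It is not the repository's code: `BaseLayer`, `BrokenExtractor` and `FixedExtractor` are hypothetical stand-ins, and the assumption that `name` arrives as None is only suggested by the `None` lines printed before the traceback.

```python
class BaseLayer:
    """Hypothetical stand-in for a framework base class that owns `_name`."""

    def __init__(self, name=None):
        # Fall back to a generated name when the caller passes nothing.
        self._name = name if name is not None else type(self).__name__.lower()

    def scope(self):
        # Mimics a name scope that only accepts strings.
        if not isinstance(self._name, str):
            raise TypeError("scope name must be a string, got %r" % (self._name,))
        return self._name + "/"


class BrokenExtractor(BaseLayer):
    def __init__(self, name=None):
        super().__init__(name=name)
        self._name = name  # clobbers the generated name with None


class FixedExtractor(BaseLayer):
    def __init__(self, name=None):
        super().__init__(name=name)  # the base class keeps ownership of `_name`
```

With these classes, `FixedExtractor().scope()` returns a usable scope string, while `BrokenExtractor().scope()` raises a TypeError because the base class's generated name was replaced by None. Deleting the extra assignment is the same kind of fix as the one suggested above.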
As for the other problem you may hit, ValueError: number of input channels does not match corresponding dimension of filter, 8 != 32: I have found the cause. A weird kernel shape error occurs when tensorflow.python.keras.layers.convolutional assigns weights to the kernel at line 158, which causes this mismatch. The simplest fix is to go back to tensorflow 1.13.1, which I have tested without this error.
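For anyone unsure what that ValueError is checking: a regular 2-D convolution needs the input's channel count to equal the kernel's input-channel dimension. The sketch below is a plain-Python stand-in for that check, not TensorFlow's implementation. The function name `check_conv_shapes` and the example spatial sizes are made up; only the 8 and 32 come from the error message, and the NHWC/HWIO layouts are an assumption.

```python
def check_conv_shapes(input_shape, kernel_shape):
    """Validate an NHWC input shape against an HWIO kernel shape.

    Returns the number of output channels when the shapes are compatible.
    """
    in_channels = input_shape[-1]   # C in (N, H, W, C)
    kernel_in = kernel_shape[2]     # I in (H, W, I, O)
    if in_channels != kernel_in:
        raise ValueError(
            "number of input channels does not match corresponding "
            "dimension of filter, %d != %d" % (in_channels, kernel_in)
        )
    return kernel_shape[3]
```

Calling `check_conv_shapes((1, 64, 64, 8), (1, 1, 32, 16))` raises the same kind of message as in the report: the tensor reaching the layer has 8 channels while the kernel was built for 32.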
@CasiaFan In your opinion, does it make sense to train ssd_efficientnet on the COCO dataset with only a single 1080 Ti GPU? As I have seen in the papers, they used 8 Titan X GPUs to train their models.
@PythonImageDeveloper Limit your batch size and input image size. But in this case, you may need to use group normalization instead of batch normalization to get better training convergence, and training will take much longer to finish. Be patient~
@CasiaFan What's the difference between group normalization and batch normalization? Is group normalization a new layer? Does it always perform better than batch normalization?
@PythonImageDeveloper @CasiaFan With this modification, "efficientnet_fpn" can train on the COCO dataset, but "efficientnet" gets the following error:
WARNING:tensorflow:From /home/mm/ssd_efficientNet/models/research/object_detection/builders/dataset_builder.py:158: batch_and_drop_remainder (from tensorflow.contrib.data.python.ops.batching) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.data.Dataset.batch(..., drop_remainder=True).
Traceback (most recent call last):
File "object_detection/model_main.py", line 111, in
@PythonImageDeveloper See this paper: https://arxiv.org/pdf/1803.08494.pdf. This method can give better training results when training with small batches.
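As a rough illustration of the difference being asked about: batch normalization computes its statistics across the batch for each channel, so very small batches give noisy estimates, while group normalization computes them per sample over groups of channels and does not depend on batch size. Below is a minimal pure-Python sketch for a single sample. It omits the learnable scale and offset, and the function name and data layout are assumptions for illustration, not a TensorFlow layer.

```python
import math


def group_norm(channels, num_groups, eps=1e-5):
    """Normalize one sample's channels in groups.

    `channels` is a list of C channels, each a flat list of H*W floats.
    Statistics are computed per group of channels, never across a batch.
    """
    if len(channels) % num_groups != 0:
        raise ValueError("number of channels must be divisible by num_groups")
    group_size = len(channels) // num_groups
    out = []
    for g in range(num_groups):
        group = channels[g * group_size:(g + 1) * group_size]
        values = [v for ch in group for v in ch]
        mean = sum(values) / len(values)
        var = sum((v - mean) ** 2 for v in values) / len(values)
        scale = 1.0 / math.sqrt(var + eps)
        out.extend([[(v - mean) * scale for v in ch] for ch in group])
    return out
```

Because only one sample is involved, the result is the same whether the training batch size is 1 or 64, which is why this approach is attractive on a single GPU.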
@maxadda Emmm... I cannot reproduce this error. What's your tf version, and did you make any changes to the config file?
@CasiaFan No changes except replacing ssd_efficientnet_fpn with ssd_efficientnet. tensorflow version: 1.13.1
@CasiaFan If I want to replace all of the BN layers with GN, should I implement this layer myself? Isn't it implemented in tensorflow core? And in your opinion, can GN give better results than BN in all situations?
Just have a try first.
thanks. I wish you a life with satisfaction at end!
@aif2017 Can you train ssd_efficientnet correctly? Don't you get any errors?
yyyyyy-eeeeeee-ssssssss
@aif2017 Very good, thanks. Please write up your modifications and the steps for training the network correctly.
I just followed the instructions in the repo. tf 1.13.1 is important. Do you know how I can load feature extractor weights into this model?
@aif2017 What's your protoc --version?
In my opinion, you should set fine_tune_checkpoint_type='classification' in the config file.
protoc: standard tf api installation
wget -O protobuf.zip https://github.com/google/protobuf/releases/download/v3.0.0/protoc-3.0.0-linux-x86_64.zip
unzip protobuf.zip
You mean this? Must from_detection_checkpoint be false?
train_config {
  fine_tune_checkpoint: "model.ckpt"
  fine_tune_checkpoint_type: "classification"
  from_detection_checkpoint: false
}
have you ever done this?
@aif2017 No, I don't have any experience with this. If you get it working, please share your experience.
Hello, thank you very much for your open source efficientNet-ssd structure. I configured it according to your instructions. All steps and compilation went fine, but the following errors occurred during training:
W0716 21:07:45.322222 139932370933568 model_lib.py:634] Expected number of evaluation epochs is 1, but instead encountered eval_on_train_input_config.num_epochs = 0. Overwriting num_epochs to 1.
I0716 21:07:45.322298 139932370933568 model_lib.py:669] create_estimator_and_inputs: use_tpu False, export_to_tpu False
I0716 21:07:45.322627 139932370933568 estimator.py:209] Using config: {'_model_dir': 'object_detection/ssd_efficient_model/training/', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 600, '_session_config': allow_soft_placement: true graph_options { rewrite_options { meta_optimizer_iterations: ONE } } , '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7f443b160390>, '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}
W0716 21:07:45.322852 139932370933568 model_fn.py:630] Estimator's model_fn (<function create_model_fn.<locals>.model_fn at 0x7f443b15ebf8>) includes params argument, but params are not passed to Estimator.
I0716 21:07:45.323519 139932370933568 estimator_training.py:186] Not using Distribute Coordinator.
I0716 21:07:45.323671 139932370933568 training.py:612] Running training and evaluation locally (non-distributed).
I0716 21:07:45.323882 139932370933568 training.py:700] Start train and evaluate loop. The evaluate will happen after every checkpoint. Checkpoint frequency is determined based on RunConfig arguments: save_checkpoints_steps None or save_checkpoints_secs 600.