cloud-annotations / google-colab-training

A notebook for training an object detection model

Start training issue #7

Open Thomasbt opened 2 years ago

Thomasbt commented 2 years ago

When I run this command:

!python -m object_detection.model_main \
    --pipeline_config_path=$DATA_PATH/pipeline.config \
    --model_dir=$OUTPUT_PATH \
    --num_train_steps=$NUM_TRAIN_STEPS \
    --num_eval_steps=100

I get this error:

WARNING:tensorflow:Forced number of epochs for all eval validations to be 1.
W0331 07:05:09.392434 140537817827200 model_lib.py:801] Forced number of epochs for all eval validations to be 1.
INFO:tensorflow:Maybe overwriting train_steps: 500
I0331 07:05:09.392771 140537817827200 config_util.py:552] Maybe overwriting train_steps: 500
INFO:tensorflow:Maybe overwriting use_bfloat16: False
I0331 07:05:09.392954 140537817827200 config_util.py:552] Maybe overwriting use_bfloat16: False
INFO:tensorflow:Maybe overwriting sample_1_of_n_eval_examples: 1
I0331 07:05:09.393109 140537817827200 config_util.py:552] Maybe overwriting sample_1_of_n_eval_examples: 1
INFO:tensorflow:Maybe overwriting eval_num_epochs: 1
I0331 07:05:09.393283 140537817827200 config_util.py:552] Maybe overwriting eval_num_epochs: 1
WARNING:tensorflow:Expected number of evaluation epochs is 1, but instead encountered eval_on_train_input_config.num_epochs = 0. Overwriting num_epochs to 1.
W0331 07:05:09.393471 140537817827200 model_lib.py:817] Expected number of evaluation epochs is 1, but instead encountered eval_on_train_input_config.num_epochs = 0. Overwriting num_epochs to 1.
INFO:tensorflow:create_estimator_and_inputs: use_tpu False, export_to_tpu None
I0331 07:05:09.393648 140537817827200 model_lib.py:852] create_estimator_and_inputs: use_tpu False, export_to_tpu None
INFO:tensorflow:Using config: {'_model_dir': '/content/output', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 600, '_session_config': allow_soft_placement: true graph_options { rewrite_options { meta_optimizer_iterations: ONE } } , '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7fd10a9412d0>, '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}
I0331 07:05:09.394231 140537817827200 estimator.py:212] Using config: {'_model_dir': '/content/output', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 600, '_session_config': allow_soft_placement: true graph_options { rewrite_options { meta_optimizer_iterations: ONE } } , '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7fd10a9412d0>, '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}
WARNING:tensorflow:Estimator's model_fn (<function create_model_fn.<locals>.model_fn at 0x7fd10a92f7a0>) includes params argument, but params are not passed to Estimator.
W0331 07:05:09.394517 140537817827200 model_fn.py:630] Estimator's model_fn (<function create_model_fn.<locals>.model_fn at 0x7fd10a92f7a0>) includes params argument, but params are not passed to Estimator.
INFO:tensorflow:Not using Distribute Coordinator.
I0331 07:05:09.395075 140537817827200 estimator_training.py:186] Not using Distribute Coordinator.
INFO:tensorflow:Running training and evaluation locally (non-distributed).
I0331 07:05:09.395325 140537817827200 training.py:612] Running training and evaluation locally (non-distributed).
INFO:tensorflow:Start train and evaluate loop. The evaluate will happen after every checkpoint. Checkpoint frequency is determined based on RunConfig arguments: save_checkpoints_steps None or save_checkpoints_secs 600.
I0331 07:05:09.395752 140537817827200 training.py:700] Start train and evaluate loop. The evaluate will happen after every checkpoint. Checkpoint frequency is determined based on RunConfig arguments: save_checkpoints_steps None or save_checkpoints_secs 600.
WARNING:tensorflow:From /tensorflow-1.15.2/python3.7/tensorflow_core/python/training/training_util.py:236: Variable.initialized_value (from tensorflow.python.ops.variables) is deprecated and will be removed in a future version.
Instructions for updating:
Use Variable.read_value. Variables in 2.X are initialized automatically both in eager and graph (inside tf.defun) contexts.
W0331 07:05:09.401718 140537817827200 deprecation.py:323] From /tensorflow-1.15.2/python3.7/tensorflow_core/python/training/training_util.py:236: Variable.initialized_value (from tensorflow.python.ops.variables) is deprecated and will be removed in a future version.
Instructions for updating:
Use Variable.read_value. Variables in 2.X are initialized automatically both in eager and graph (inside tf.defun) contexts.
INFO:tensorflow:Reading unweighted datasets: ['/content/data/train.record']
I0331 07:05:09.431689 140537817827200 dataset_builder.py:148] Reading unweighted datasets: ['/content/data/train.record']
INFO:tensorflow:Reading record datasets for input file: ['/content/data/train.record']
I0331 07:05:09.432735 140537817827200 dataset_builder.py:77] Reading record datasets for input file: ['/content/data/train.record']
INFO:tensorflow:Number of filenames to read: 1
I0331 07:05:09.432889 140537817827200 dataset_builder.py:78] Number of filenames to read: 1
WARNING:tensorflow:num_readers has been reduced to 1 to match input file shards.
W0331 07:05:09.433032 140537817827200 dataset_builder.py:86] num_readers has been reduced to 1 to match input file shards.
WARNING:tensorflow:From /content/models/research/object_detection/builders/dataset_builder.py:103: parallel_interleave (from tensorflow.python.data.experimental.ops.interleave_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.data.Dataset.interleave(map_func, cycle_length, block_length, num_parallel_calls=tf.data.experimental.AUTOTUNE) instead. If sloppy execution is desired, use tf.data.Options.experimental_determinstic.
W0331 07:05:09.439119 140537817827200 deprecation.py:323] From /content/models/research/object_detection/builders/dataset_builder.py:103: parallel_interleave (from tensorflow.python.data.experimental.ops.interleave_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.data.Dataset.interleave(map_func, cycle_length, block_length, num_parallel_calls=tf.data.experimental.AUTOTUNE) instead. If sloppy execution is desired, use tf.data.Options.experimental_determinstic.
WARNING:tensorflow:From /content/models/research/object_detection/builders/dataset_builder.py:222: DatasetV1.map_with_legacy_function (from tensorflow.python.data.ops.dataset_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.data.Dataset.map()
W0331 07:05:09.460939 140537817827200 deprecation.py:323] From /content/models/research/object_detection/builders/dataset_builder.py:222: DatasetV1.map_with_legacy_function (from tensorflow.python.data.ops.dataset_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.data.Dataset.map()
WARNING:tensorflow:Entity <bound method TfExampleDecoder.decode of <object_detection.data_decoders.tf_example_decoder.TfExampleDecoder object at 0x7fd10a8d9050>> could not be transformed and will be executed as-is. Please report this to the AutoGraph team. When filing the bug, set the verbosity to 10 (on Linux, export AUTOGRAPH_VERBOSITY=10) and attach the full output. Cause: module 'gast' has no attribute 'Index'
W0331 07:05:09.496219 140537817827200 ag_logging.py:146] Entity <bound method TfExampleDecoder.decode of <object_detection.data_decoders.tf_example_decoder.TfExampleDecoder object at 0x7fd10a8d9050>> could not be transformed and will be executed as-is. Please report this to the AutoGraph team. When filing the bug, set the verbosity to 10 (on Linux, export AUTOGRAPH_VERBOSITY=10) and attach the full output. Cause: module 'gast' has no attribute 'Index'
Traceback (most recent call last):
  File "/usr/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/content/models/research/object_detection/model_main.py", line 108, in <module>
    tf.app.run()
  File "/tensorflow-1.15.2/python3.7/tensorflow_core/python/platform/app.py", line 40, in run
    _run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
  File "/usr/local/lib/python3.7/dist-packages/absl/app.py", line 312, in run
    _run_main(main, args)
  File "/usr/local/lib/python3.7/dist-packages/absl/app.py", line 258, in _run_main
    sys.exit(main(argv))
  File "/content/models/research/object_detection/model_main.py", line 104, in main
    tf.estimator.train_and_evaluate(estimator, train_spec, eval_specs[0])
  File "/tensorflow-1.15.2/python3.7/tensorflow_estimator/python/estimator/training.py", line 473, in train_and_evaluate
    return executor.run()
  File "/tensorflow-1.15.2/python3.7/tensorflow_estimator/python/estimator/training.py", line 613, in run
    return self.run_local()
  File "/tensorflow-1.15.2/python3.7/tensorflow_estimator/python/estimator/training.py", line 714, in run_local
    saving_listeners=saving_listeners)
  File "/tensorflow-1.15.2/python3.7/tensorflow_estimator/python/estimator/estimator.py", line 370, in train
    loss = self._train_model(input_fn, hooks, saving_listeners)
  File "/tensorflow-1.15.2/python3.7/tensorflow_estimator/python/estimator/estimator.py", line 1161, in _train_model
    return self._train_model_default(input_fn, hooks, saving_listeners)
  File "/tensorflow-1.15.2/python3.7/tensorflow_estimator/python/estimator/estimator.py", line 1188, in _train_model_default
    input_fn, ModeKeys.TRAIN))
  File "/tensorflow-1.15.2/python3.7/tensorflow_estimator/python/estimator/estimator.py", line 1025, in _get_features_and_labels_from_input_fn
    self._call_input_fn(input_fn, mode))
  File "/tensorflow-1.15.2/python3.7/tensorflow_estimator/python/estimator/estimator.py", line 1116, in _call_input_fn
    return input_fn(**kwargs)
  File "/content/models/research/object_detection/inputs.py", line 711, in _train_input_fn
    params=params)
  File "/content/models/research/object_detection/inputs.py", line 851, in train_input
    reduce_to_frame_fn=reduce_to_frame_fn)
  File "/content/models/research/object_detection/builders/dataset_builder.py", line 237, in build
    input_reader_config)
  File "/content/models/research/object_detection/builders/dataset_builder.py", line 222, in dataset_map_fn
    fn_to_map, num_parallel_calls=num_parallel_calls)
  File "/tensorflow-1.15.2/python3.7/tensorflow_core/python/util/deprecation.py", line 324, in new_func
    return func(*args, **kwargs)
  File "/tensorflow-1.15.2/python3.7/tensorflow_core/python/data/ops/dataset_ops.py", line 1950, in map_with_legacy_function
    use_legacy_function=True))
  File "/tensorflow-1.15.2/python3.7/tensorflow_core/python/data/ops/dataset_ops.py", line 3472, in __init__
    use_legacy_function=use_legacy_function)
  File "/tensorflow-1.15.2/python3.7/tensorflow_core/python/data/ops/dataset_ops.py", line 2689, in __init__
    self._function.add_to_graph(ops.get_default_graph())
  File "/tensorflow-1.15.2/python3.7/tensorflow_core/python/framework/function.py", line 545, in add_to_graph
    self._create_definition_if_needed()
  File "/tensorflow-1.15.2/python3.7/tensorflow_core/python/framework/function.py", line 377, in _create_definition_if_needed
    self._create_definition_if_needed_impl()
  File "/tensorflow-1.15.2/python3.7/tensorflow_core/python/framework/function.py", line 408, in _create_definition_if_needed_impl
    capture_resource_var_by_value=self._capture_resource_var_by_value)
  File "/tensorflow-1.15.2/python3.7/tensorflow_core/python/framework/function.py", line 944, in func_graph_from_py_func
    outputs = func(*func_graph.inputs)
  File "/tensorflow-1.15.2/python3.7/tensorflow_core/python/data/ops/dataset_ops.py", line 2681, in wrapper_fn
    ret = _wrapper_helper(*args)
  File "/tensorflow-1.15.2/python3.7/tensorflow_core/python/data/ops/dataset_ops.py", line 2652, in _wrapper_helper
    ret = autograph.tf_convert(func, ag_ctx)(*nested_args)
  File "/tensorflow-1.15.2/python3.7/tensorflow_core/python/autograph/impl/api.py", line 237, in wrapper
    raise e.ag_error_metadata.to_exception(e)
NotImplementedError: in converted code:

    /content/models/research/object_detection/data_decoders/tf_example_decoder.py:509 decode
        default_groundtruth_weights)
    /tensorflow-1.15.2/python3.7/tensorflow_core/python/util/deprecation.py:507 new_func
        return func(*args, **kwargs)
    /tensorflow-1.15.2/python3.7/tensorflow_core/python/ops/control_flow_ops.py:1235 cond
        orig_res_f, res_f = context_f.BuildCondBranch(false_fn)
    /tensorflow-1.15.2/python3.7/tensorflow_core/python/ops/control_flow_ops.py:1061 BuildCondBranch
        original_result = fn()
    /content/models/research/object_detection/data_decoders/tf_example_decoder.py:502 default_groundtruth_weights
        dtype=tf.float32)
    /tensorflow-1.15.2/python3.7/tensorflow_core/python/ops/array_ops.py:2560 ones
        output = _constant_if_small(one, shape, dtype, name)
    /tensorflow-1.15.2/python3.7/tensorflow_core/python/ops/array_ops.py:2295 _constant_if_small
        if np.prod(shape) < 1000:
    <__array_function__ internals>:6 prod
    /usr/local/lib/python3.7/dist-packages/numpy/core/fromnumeric.py:3052 prod
        keepdims=keepdims, initial=initial, where=where)
    /usr/local/lib/python3.7/dist-packages/numpy/core/fromnumeric.py:86 _wrapreduction
        return ufunc.reduce(obj, axis, dtype, out, **passkwargs)
    /tensorflow-1.15.2/python3.7/tensorflow_core/python/framework/ops.py:736 __array__
        " array.".format(self.name))

    NotImplementedError: Cannot convert a symbolic Tensor (cond_2/strided_slice:0) to a numpy array.

lsgcv commented 2 years ago

Hi, did you find a solution for that issue?

Thomasbt commented 2 years ago

Hi, no, I have not.
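
For anyone landing on the same traceback: its last frames show `np.prod(shape)` in TensorFlow 1.15.2's `array_ops.py` being handed a symbolic tensor, which then raises in `Tensor.__array__`. This error is commonly reported when TensorFlow 1.15 runs against NumPy 1.20 or newer, but the log above does not show the installed NumPy version, so that cause is an assumption here, not something the thread confirms. A minimal sketch for checking it in the notebook follows; the helper names `parse_major_minor` and `numpy_may_conflict` and the `(1, 20)` threshold are illustrative, not part of any library.

```python
import re


def parse_major_minor(version):
    """Return (major, minor) parsed from a version string such as '1.19.5'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError("unrecognized version string: %r" % (version,))
    return int(match.group(1)), int(match.group(2))


def numpy_may_conflict(version, first_suspect=(1, 20)):
    """True if the NumPy version is at or above the assumed threshold.

    The (1, 20) default is an assumption based on commonly reported
    cases of this error; it is not taken from the log in this thread.
    """
    return parse_major_minor(version) >= first_suspect


if __name__ == "__main__":
    try:
        import numpy
    except ImportError:
        print("NumPy is not installed in this environment.")
    else:
        if numpy_may_conflict(numpy.__version__):
            print("NumPy %s may be too new for TensorFlow 1.15." % numpy.__version__)
        else:
            print("NumPy %s is below the assumed threshold." % numpy.__version__)
```

If the check reports a newer NumPy, one thing to try is pinning an older release in the notebook (for example `!pip install "numpy<1.20"`) and restarting the runtime before rerunning the training cell. Whether that resolves this particular run is untested here.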