EdjeElectronics / TensorFlow-Object-Detection-API-Tutorial-Train-Multiple-Objects-Windows-10

How to train a TensorFlow Object Detection Classifier for multiple object detection on Windows
Apache License 2.0

Error "First step cannot be zero" when running train.py #51

Open blockhunts opened 6 years ago

blockhunts commented 6 years ago

I tried to use the same (card) images that are provided. I just deleted all the processed files (csv, dll) and followed all the steps. When I ran python train.py, I got this error:

Traceback (most recent call last):
  File "train.py", line 184, in <module>
    tf.app.run()
  File "C:\Users\MRCPP-Fablab\AppData\Local\conda\conda\envs\tensorflow\lib\site-packages\tensorflow\python\platform\app.py", line 126, in run
    _sys.exit(main(argv))
  File "train.py", line 180, in main
    graph_hook_fn=graph_rewriter_fn)
  File "E:\tensor\models\research\object_detection\trainer.py", line 288, in train
    train_config.optimizer)
  File "E:\tensor\models\research\object_detection\builders\optimizer_builder.py", line 50, in build
    learning_rate = _create_learning_rate(config.learning_rate)
  File "E:\tensor\models\research\object_detection\builders\optimizer_builder.py", line 109, in _create_learning_rate
    learning_rate_sequence, config.warmup)
  File "E:\tensor\models\research\object_detection\utils\learning_schedules.py", line 156, in manual_stepping
    raise ValueError('First step cannot be zero.')
ValueError: First step cannot be zero.

Any clues why this happens?

Surasi-Jui commented 6 years ago

I have the same error. Did you find out how to solve it?

blockhunts commented 6 years ago

Yes, edit this in your config file in ...\models\research\object_detection\training:

train_config: {
  batch_size: 1
  optimizer {
    momentum_optimizer: {
      learning_rate: {
        manual_step_learning_rate {
          initial_learning_rate: 0.0002
          schedule {
            step: 900000
            learning_rate: .00002
          }
          schedule {
            step: 1200000
            learning_rate: .000002
          }
        }
      }
      momentum_optimizer_value: 0.9
    }
    use_moving_average: false
  }
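
For context: the config that triggers this error presumably starts its schedule with a `schedule { step: 0 ... }` block (Arri reports finding `step: 0` further down), and `manual_stepping` in learning_schedules.py rejects a first step of 0. The schedule above simply has no step-0 entry. A minimal pure-Python sketch of that check, assuming it behaves as the error message suggests; this is an illustration, not the library's code, and `check_manual_schedule` is a hypothetical name:

```python
def check_manual_schedule(schedule):
    """Sketch of the check behind "First step cannot be zero."

    schedule: list of (step, learning_rate) pairs, one per `schedule { }`
    block inside manual_step_learning_rate. Returns the step boundaries.
    """
    boundaries = [step for step, _rate in schedule]
    if boundaries and boundaries[0] == 0:
        # initial_learning_rate already covers step 0, so a schedule
        # entry at step 0 is rejected.
        raise ValueError("First step cannot be zero.")
    return boundaries


# Hypothetical schedule with a leading `step: 0` block (fails the check).
old = [(0, 0.0002), (900000, 0.00002), (1200000, 0.000002)]
# The schedule from the config above (passes).
new = [(900000, 0.00002), (1200000, 0.000002)]
print(check_manual_schedule(new))  # [900000, 1200000]
```

Changing `step: 0` to `step: 1`, as Arri did, also passes this check, since only a first step of exactly 0 is rejected.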
Surasi-Jui commented 6 years ago

Thank you. It works here :)


leccyril commented 6 years ago

If you download the models from the GitHub repository, the files are up to date.

jim-meyer commented 6 years ago

I ran into this same error while using the AWS DL AMI (Deep Learning AMI (Ubuntu) Version 10.0 (ami-23c4fb46)) and following, as far as I can tell, the same steps I used on Windows, with obvious substitutions since this AMI is Ubuntu. Both Ubuntu and Windows are using TF 1.8. But when I use the train_config that blockhunts mentioned, I get:

Traceback (most recent call last):
  File "/ml/models/research/object_detection/train.py", line 184, in <module>
    tf.app.run()
  File "/home/ubuntu/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 126, in run
    _sys.exit(main(argv))
  File "/ml/models/research/object_detection/train.py", line 180, in main
    graph_hook_fn=graph_rewriter_fn)
  File "/ml/models/research/object_detection/trainer.py", line 298, in train
    train_config.optimizer)
  File "/ml/models/research/object_detection/builders/optimizer_builder.py", line 50, in build
    learning_rate = _create_learning_rate(config.learning_rate)
  File "/ml/models/research/object_detection/builders/optimizer_builder.py", line 109, in _create_learning_rate
    learning_rate_sequence, config.warmup)
  File "/ml/models/research/object_detection/utils/learning_schedules.py", line 169, in manual_stepping
    [0] * num_boundaries))
  File "/home/ubuntu/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/tensorflow/python/ops/array_ops.py", line 2681, in where
    return gen_math_ops.select(condition=condition, x=x, y=y, name=name)
  File "/home/ubuntu/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/tensorflow/python/ops/gen_math_ops.py", line 6699, in select
    "Select", condition=condition, t=x, e=y, name=name)
  File "/home/ubuntu/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 528, in _apply_op_helper
    (input_name, err))
ValueError: Tried to convert 't' to a tensor and failed. Error: Argument must be a dense tensor: range(0, 3) - got shape [3], but wanted [].

Any ideas?

jim-meyer commented 6 years ago

I see that epratheeban has the solution to my problem mentioned here https://github.com/EdjeElectronics/TensorFlow-Object-Detection-API-Tutorial-Train-Multiple-Objects-Windows-10/issues/11:

It's easy. Go to the utils folder and open the learning_schedules.py file. Go to line 167 and replace it with the line below:

rate_index = tf.reduce_max(tf.where(tf.greater_equal(global_step, boundaries), list(range(num_boundaries)), [0] * num_boundaries))
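
Why the patch helps: in Python 3, `range(...)` returns a lazy `range` object rather than a list, and the TF 1.8 `tf.where` call in jim-meyer's traceback refused to convert it ("Argument must be a dense tensor: range(0, 3)"). Wrapping it in `list()` passes a plain list instead. A pure-Python sketch of what the patched expression computes, assuming `boundaries` holds hypothetical example values (a leading 0 followed by the two schedule steps); `rate_index` here is an illustrative function, not the library's:

```python
def rate_index(global_step, boundaries):
    """Pure-Python mirror of the patched expression

        tf.reduce_max(tf.where(tf.greater_equal(global_step, boundaries),
                               list(range(num_boundaries)),
                               [0] * num_boundaries))

    Returns the index of the last boundary that global_step has reached.
    """
    num_boundaries = len(boundaries)
    indices = list(range(num_boundaries))  # a concrete list, not a range object
    zeros = [0] * num_boundaries
    # Element-wise "where": keep the index if the boundary is reached, else 0.
    picked = [i if global_step >= b else z
              for b, i, z in zip(boundaries, indices, zeros)]
    return max(picked)


# Hypothetical boundaries: a leading 0 plus the two schedule steps.
boundaries = [0, 900000, 1200000]
print(range(3), list(range(3)))  # range(0, 3) [0, 1, 2]
print(rate_index(950000, boundaries))  # 1
```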

aghapesar1374 commented 6 years ago

Hi @jim-meyer, I made this change and the problem was solved, but now it returns this error:

WARNING:tensorflow:From C:\Users\sadegh\Anaconda3\envs\tensorflow1\lib\site-packages\object_detection-0.1-py3.5.egg\object_detection\core\losses.py:317: softmax_cross_entropy_with_logits (from tensorflow.python.ops.nn_ops) is deprecated and will be removed in a future version.
Instructions for updating:

Future major versions of TensorFlow will allow gradients to flow into the labels input on backprop by default.

See @{tf.nn.softmax_cross_entropy_with_logits_v2}.

Traceback (most recent call last):
  File "train.py", line 184, in <module>
    tf.app.run()
  File "C:\Users\sadegh\Anaconda3\envs\tensorflow1\lib\site-packages\tensorflow\python\platform\app.py", line 126, in run
    _sys.exit(main(argv))
  File "train.py", line 180, in main
    graph_hook_fn=graph_rewriter_fn)
  File "C:\Users\sadegh\Anaconda3\envs\tensorflow1\lib\site-packages\object_detection-0.1-py3.5.egg\object_detection\trainer.py", line 288, in train
    train_config.optimizer)
  File "C:\Users\sadegh\Anaconda3\envs\tensorflow1\lib\site-packages\object_detection-0.1-py3.5.egg\object_detection\builders\optimizer_builder.py", line 50, in build
    learning_rate = _create_learning_rate(config.learning_rate)
  File "C:\Users\sadegh\Anaconda3\envs\tensorflow1\lib\site-packages\object_detection-0.1-py3.5.egg\object_detection\builders\optimizer_builder.py", line 109, in _create_learning_rate
    learning_rate_sequence, config.warmup)
  File "C:\Users\sadegh\Anaconda3\envs\tensorflow1\lib\site-packages\object_detection-0.1-py3.5.egg\object_detection\utils\learning_schedules.py", line 168, in manual_stepping
    list(num_boundaries),
TypeError: 'int' object is not iterable
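
The last frame shows line 168 reading `list(num_boundaries),`, which suggests the patch was typed without `range(...)`: `num_boundaries` is an integer, and calling `list()` on an integer raises exactly this TypeError. The replacement line quoted by jim-meyer uses `list(range(num_boundaries))`. A tiny plain-Python illustration (no TensorFlow needed; the value 3 is just an example):

```python
num_boundaries = 3  # example value; an int, as in manual_stepping

# What line 168 in the traceback does: list() needs an iterable, not an int.
try:
    list(num_boundaries)
except TypeError as err:
    print("TypeError:", err)  # TypeError: 'int' object is not iterable

# What the patch intends: the indices 0 .. num_boundaries - 1 as a list.
indices = list(range(num_boundaries))
print(indices)  # [0, 1, 2]
```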

tamizharasank commented 6 years ago

TypeError: Cannot convert a list containing a tensor of dtype <dtype: 'int32'> to <dtype: 'float32'> (Tensor is: <tf.Tensor 'Preprocessor/stack_1:0' shape=(1, 3) dtype=int32>)

leccyril commented 6 years ago

@tamizharasank, which file? For this kind of error, paste it into Google and you will find the fix easily.

Adibhatt95 commented 6 years ago

@tamizharasank, did you solve this error? I got the same error. Any suggestions?

Kkaranmore commented 5 years ago

After making changes to the config file in the training folder, I got this error:

(tensorflow1) C:\tensorflow1\models\research\object_detection>python train.py --logtostderr --train_dir=training/ --pipeline_config_path=training/faster_rcnn_inception_v2_pets.config
WARNING:tensorflow:From C:\Users\kayka\Anaconda3\envs\tensorflow1\lib\site-packages\tensorflow\python\platform\app.py:125: main (from __main__) is deprecated and will be removed in a future version.
Instructions for updating:
Use object_detection/model_main.py.
WARNING:tensorflow:From C:\Users\kayka\Anaconda3\envs\tensorflow1\lib\site-packages\object_detection-0.1-py3.5.egg\object_detection\legacy\trainer.py:266: create_global_step (from tensorflow.contrib.framework.python.ops.variables) is deprecated and will be removed in a future version.
Instructions for updating:
Please switch to tf.train.create_global_step
WARNING:tensorflow:num_readers has been reduced to 1 to match input file shards.
INFO:tensorflow:Scale of 0 disables regularizer.
INFO:tensorflow:Scale of 0 disables regularizer.
INFO:tensorflow:depth of additional conv before box predictor: 0
WARNING:tensorflow:From C:\Users\kayka\Anaconda3\envs\tensorflow1\lib\site-packages\object_detection-0.1-py3.5.egg\object_detection\predictors\heads\box_head.py:93: calling reduce_mean (from tensorflow.python.ops.math_ops) with keep_dims is deprecated and will be removed in a future version.
Instructions for updating:
keep_dims is deprecated, use keepdims instead
INFO:tensorflow:Scale of 0 disables regularizer.
INFO:tensorflow:Scale of 0 disables regularizer.
WARNING:tensorflow:From C:\Users\kayka\Anaconda3\envs\tensorflow1\lib\site-packages\object_detection-0.1-py3.5.egg\object_detection\core\losses.py:345: softmax_cross_entropy_with_logits (from tensorflow.python.ops.nn_ops) is deprecated and will be removed in a future version.
Instructions for updating:

Future major versions of TensorFlow will allow gradients to flow into the labels input on backprop by default.

See @{tf.nn.softmax_cross_entropy_with_logits_v2}.

C:\Users\kayka\Anaconda3\envs\tensorflow1\lib\site-packages\tensorflow\python\ops\gradients_impl.py:108: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.
  "Converting sparse IndexedSlices to a dense Tensor of unknown shape. "
WARNING:tensorflow:From C:\Users\kayka\Anaconda3\envs\tensorflow1\lib\site-packages\object_detection-0.1-py3.5.egg\object_detection\meta_architectures\faster_rcnn_meta_arch.py:2236: get_or_create_global_step (from tensorflow.contrib.framework.python.ops.variables) is deprecated and will be removed in a future version.
Instructions for updating:
Please switch to tf.train.get_or_create_global_step
Traceback (most recent call last):
  File "train.py", line 184, in <module>
    tf.app.run()
  File "C:\Users\kayka\Anaconda3\envs\tensorflow1\lib\site-packages\tensorflow\python\platform\app.py", line 125, in run
    _sys.exit(main(argv))
  File "C:\Users\kayka\Anaconda3\envs\tensorflow1\lib\site-packages\tensorflow\python\util\deprecation.py", line 272, in new_func
    return func(*args, **kwargs)
  File "train.py", line 180, in main
    graph_hook_fn=graph_rewriter_fn)
  File "C:\Users\kayka\Anaconda3\envs\tensorflow1\lib\site-packages\object_detection-0.1-py3.5.egg\object_detection\legacy\trainer.py", line 397, in train
    include_global_step=False))
  File "C:\Users\kayka\Anaconda3\envs\tensorflow1\lib\site-packages\object_detection-0.1-py3.5.egg\object_detection\utils\variables_helper.py", line 126, in get_variables_available_in_checkpoint
    ckpt_reader = tf.train.NewCheckpointReader(checkpoint_path)
  File "C:\Users\kayka\Anaconda3\envs\tensorflow1\lib\site-packages\tensorflow\python\pywrap_tensorflow_internal.py", line 306, in NewCheckpointReader
    return CheckpointReader(compat.as_bytes(filepattern), status)
  File "C:\Users\kayka\Anaconda3\envs\tensorflow1\lib\site-packages\tensorflow\python\framework\errors_impl.py", line 519, in __exit__
    c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.InvalidArgumentError: Unsuccessful TensorSliceReader constructor: Failed to get matching files on C:/tensorflow1/models/research/object_detection/faster_rcnn_inception_v2_coco_2018_01_28/model.ckpt: Not found: FindFirstFile failed for: C:/tensorflow1/models/research/object_detection/faster_rcnn_inception_v2_coco_2018_01_28 : The system cannot find the path specified. ; No such process

jim-meyer commented 5 years ago

Looks like you probably did not follow all of the steps in 2a, "Download TensorFlow Object Detection API repository from GitHub" and/or 2b, "Download the Faster-RCNN-Inception-V2-COCO model from TensorFlow's model zoo". Try following those steps again exactly and that should fix your problem.
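
The error message says the folder C:/tensorflow1/models/research/object_detection/faster_rcnn_inception_v2_coco_2018_01_28 does not exist, i.e. the `fine_tune_checkpoint` path in the config points at a model that was never extracted there. A rough pre-flight check, sketched in plain Python under the assumption that each path field in the pipeline config sits on one line as `key: "value"`; `missing_paths` is a hypothetical helper, not part of the Object Detection API:

```python
import glob
import os
import re

# Path-valued fields in a TF1 Object Detection pipeline config.
# Assumption: each appears on its own line as  key: "value".
_PATH_FIELD = re.compile(
    r'^\s*(fine_tune_checkpoint|label_map_path|input_path)\s*:\s*"([^"]*)"',
    re.MULTILINE,
)


def missing_paths(config_text):
    """Return (field, path) pairs whose files cannot be found on disk."""
    missing = []
    for field, path in _PATH_FIELD.findall(config_text):
        if field == "fine_tune_checkpoint":
            # A checkpoint is a prefix such as .../model.ckpt; the actual
            # files are model.ckpt.index, model.ckpt.data-..., model.ckpt.meta.
            found = bool(glob.glob(glob.escape(path) + ".*"))
        else:
            found = os.path.exists(path)
        if not found:
            missing.append((field, path))
    return missing
```

To use it, read the config file (for example training/faster_rcnn_inception_v2_pets.config) and pass its text to `missing_paths`; anything it returns should be fixed before running train.py. The regex reads the raw text, so it does not apply protobuf escape sequences or understand nesting; it is only a quick sanity check.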

mohamedelsiesyibra commented 5 years ago

  File "C:\tensorflow1\models\research\object_detection\utils\learning_schedules.py", line 160, in manual_stepping
    raise ValueError('First step cannot be zero.')
ValueError: First step cannot be zero.

I edited the file and saved it, but when I train again, it returns to its original value.
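
If an edit to learning_schedules.py seems to be ignored, one possibility worth ruling out is that Python imports a different copy of the file than the one being edited. For example, aghapesar1374's traceback above runs from an installed object_detection-0.1-py3.5.egg under site-packages rather than from models\research. A small sketch using the standard library's `importlib.util.find_spec` to show which file would actually be imported; `module_origin` is a hypothetical helper name:

```python
import importlib.util


def module_origin(dotted_name):
    """Return the file Python would import for dotted_name, or None."""
    try:
        spec = importlib.util.find_spec(dotted_name)
    except ModuleNotFoundError:
        # Raised when a parent package (e.g. object_detection) is missing.
        return None
    return spec.origin if spec is not None else None


# Run this inside the same environment that runs train.py.
print(module_origin("object_detection.utils.learning_schedules"))
```

If the printed path is not the file you edited, either edit that file or reinstall the package from your edited models\research folder.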

bebop-boop commented 5 years ago

I'm getting the error below while trying to run: python train.py --logtostderr --train_dir=training/ --pipeline_config_path=training/ssd_inception_v2_coco.config

WARNING: The TensorFlow contrib module will not be included in TensorFlow 2.0. For more information, please see:

WARNING:tensorflow:From C:\Users\Asus\Miniconda3\lib\site-packages\tensorflow\python\platform\app.py:125: main (from __main__) is deprecated and will be removed in a future version.
Instructions for updating:
Use object_detection/model_main.py.
WARNING:tensorflow:From C:\Tensorflow\models\research\object_detection\legacy\trainer.py:266: create_global_step (from tensorflow.contrib.framework.python.ops.variables) is deprecated and will be removed in a future version.
Instructions for updating:
Please switch to tf.train.create_global_step
WARNING:tensorflow:From C:\Users\Asus\Miniconda3\lib\site-packages\tensorflow\python\framework\op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.

Traceback (most recent call last):
  File "train.py", line 184, in <module>
    tf.app.run()
  File "C:\Users\Asus\Miniconda3\lib\site-packages\tensorflow\python\platform\app.py", line 125, in run
    _sys.exit(main(argv))
  File "C:\Users\Asus\Miniconda3\lib\site-packages\tensorflow\python\util\deprecation.py", line 324, in new_func
    return func(*args, **kwargs)
  File "train.py", line 180, in main
    graph_hook_fn=graph_rewriter_fn)
  File "C:\Tensorflow\models\research\object_detection\legacy\trainer.py", line 280, in train
    train_config.prefetch_queue_capacity, data_augmentation_options)
  File "C:\Tensorflow\models\research\object_detection\legacy\trainer.py", line 59, in create_input_queue
    tensor_dict = create_tensor_dict_fn()
  File "train.py", line 121, in get_next
    dataset_builder.build(config)).get_next()
  File "C:\Tensorflow\models\research\object_detection\builders\dataset_builder.py", line 124, in build
    num_additional_channels=input_reader_config.num_additional_channels)
  File "C:\Tensorflow\models\research\object_detection\data_decoders\tf_example_decoder.py", line 307, in __init__
    default_value=''),
  File "C:\Tensorflow\models\research\object_detection\data_decoders\tf_example_decoder.py", line 59, in __init__
    label_map_proto_file, use_display_name=False)
  File "C:\Tensorflow\models\research\object_detection\utils\label_map_util.py", line 164, in get_label_map_dict
    label_map = load_labelmap(label_map_path)
  File "C:\Tensorflow\models\research\object_detection\utils\label_map_util.py", line 133, in load_labelmap
    label_map_string = fid.read()
  File "C:\Users\Asus\Miniconda3\lib\site-packages\tensorflow\python\lib\io\file_io.py", line 125, in read
    self._preread_check()
  File "C:\Users\Asus\Miniconda3\lib\site-packages\tensorflow\python\lib\io\file_io.py", line 85, in _preread_check
    compat.as_bytes(self.__name), 1024 * 512, status)
  File "C:\Users\Asus\Miniconda3\lib\site-packages\tensorflow\python\framework\errors_impl.py", line 528, in __exit__
    c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.InvalidArgumentError: NewRandomAccessFile failed to Create/Open: C:\Tensorflow\workspace raining_demonnotations/label_map.pbtxt : The filename, directory name, or volume label syntax is incorrect. ; Unknown error
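
The garbled path in this error looks like a backslash-escape problem: read as C-style escapes, `\t` in `\training_demo` becomes a TAB (shown as a space) and `\a` in `\annotations` becomes an invisible BEL character. That would turn a path typed as `C:\Tensorflow\workspace\training_demo\annotations/label_map.pbtxt` into exactly what the error shows; note that this original path is inferred from the message, not stated by the poster. Python string literals use the same escapes, so the effect can be reproduced, and writing config paths with forward slashes avoids it. A sketch (`to_config_path` is a hypothetical helper):

```python
from pathlib import PureWindowsPath

# Path as presumably typed into the config (inferred from the error text).
typed = r"C:\Tensorflow\workspace\training_demo\annotations/label_map.pbtxt"

# A normal (non-raw) literal applies C-style escapes: "\t" becomes a TAB and
# "\a" a BEL. The "\\" parts are deliberately kept as literal backslashes.
garbled = "C:\\Tensorflow\\workspace\training_demo\annotations/label_map.pbtxt"
print(repr(garbled))  # the TAB and BEL characters show up in repr()


def to_config_path(path):
    """Rewrite a Windows path with forward slashes, which need no escaping."""
    return PureWindowsPath(path).as_posix()


print(to_config_path(typed))
# C:/Tensorflow/workspace/training_demo/annotations/label_map.pbtxt
```

Doubling each backslash should also work in principle, but forward slashes are the simpler habit; the tutorial's own paths, such as the one in the checkpoint error above, already use them.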

jim-meyer commented 5 years ago

@ShubhranshuMaurya, that error seems to indicate that something is wrong with C:\Tensorflow\workspace raining_demonnotations/label_map.pbtxt. Have you opened that file in a text editor to see if it looks right? That file should look something like this:

item {
  name: 'Class1'
  id: 1
  display_name: 'Class1 Label Name'
}

item {
  name: 'Class2'
  id: 2
  display_name: 'Class2 Label Name'
}

IIRC, this file could also be a binary protobuf file, in which case viewing it in a text editor won't tell you much. But if it appears to be binary, perhaps you could try creating a text version with your training labels and see if that works.
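
Creating a text label map can be scripted. A minimal sketch, assuming the text format shown above (`item { id: ... name: '...' }`) and ids numbered from 1, since id 0 is reserved for the background class in the Object Detection API; `label_map_text` and the class names are hypothetical:

```python
def label_map_text(class_names):
    """Return a text-format label map with ids 1..N in the given order."""
    items = []
    for class_id, name in enumerate(class_names, start=1):
        # Doubled braces are literal { } in an f-string.
        items.append(f"item {{\n  id: {class_id}\n  name: '{name}'\n}}\n")
    return "\n".join(items)


text = label_map_text(["Class1", "Class2"])
print(text)
```

Write the result to the file named by `label_map_path` in the config, and make sure the names and ids match the ones used when the TFRecord files were generated.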

bharath5673 commented 5 years ago

TensorFlow custom training

ERROR:raise ValueError('First step cannot be zero.') ValueError: First step cannot be zero.

SOLUTION: object_detection\training\ .config

train_config: {
  batch_size: 1
  optimizer {
    momentum_optimizer: {
      learning_rate: {
        manual_step_learning_rate {
          initial_learning_rate: 0.0002
          schedule {
            step: 900000
            learning_rate: .00002
          }
          schedule {
            step: 1200000
            learning_rate: .000002
          }
        }
      }
      momentum_optimizer_value: 0.9
    }
    use_moving_average: false
  }

Arri commented 5 years ago

For me it worked with 'step: 1'; for some reason there was 'step: 0'...

siddas27 commented 4 years ago

TypeError: Cannot convert a list containing a tensor of dtype <dtype: 'int32'> to <dtype: 'float32'> (Tensor is: <tf.Tensor 'Preprocessor/stack_1:0' shape=(1, 3) dtype=int32>)

Did you find a solution?

dpbnasika commented 4 years ago

yes, edit this in your config file in ...\models\research\object_detection\training

train_config: {
  batch_size: 1
  optimizer {
    momentum_optimizer: {
      learning_rate: {
        manual_step_learning_rate {
          initial_learning_rate: 0.0002
          schedule {
            step: 900000
            learning_rate: .00002
          }
          schedule {
            step: 1200000
            learning_rate: .000002
          }
        }
      }
      momentum_optimizer_value: 0.9
    }
    use_moving_average: false
  }

Can you explain what is happening with the learning rate? What do the two step values signify in the manual learning rate, and what is the initial learning rate?
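
Judging from the traceback frames earlier in this thread (optimizer_builder.py building `learning_rate_sequence` and passing it to `manual_stepping`), the schedule is piecewise constant: `initial_learning_rate` is used from step 0, and each `schedule { step: N learning_rate: r }` block switches the rate to r once the global training step reaches N. The `step` values are therefore training-step counts, not step sizes. With the config above, training runs at 0.0002 until step 900000, then 0.00002 until step 1200000, then 0.000002; if training is stopped before step 900000, only the initial rate is ever used. A pure-Python sketch of that lookup (`learning_rate_at` is a hypothetical name; the optional warmup is ignored):

```python
def learning_rate_at(step, initial_learning_rate, schedule):
    """Learning rate in effect at a given global step (warmup ignored).

    schedule: (step, learning_rate) pairs, one per `schedule { }` block,
    in increasing step order.
    """
    rate = initial_learning_rate
    for boundary, scheduled_rate in schedule:
        if step >= boundary:
            rate = scheduled_rate  # this boundary has been reached
    return rate


# Values from the config above.
schedule = [(900000, 0.00002), (1200000, 0.000002)]
for step in (0, 899999, 900000, 1200000):
    print(step, learning_rate_at(step, 0.0002, schedule))
```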

EMRYLMZ1 commented 2 years ago

python train.py --logtostderr -train_dir=training/ --pipeline_config_path=training/ssd_mobilenet_v2_quantized_300x300_coco.config

Current thread 0x00005734 (most recent call first):
  File "C:\Users\EMRE\anaconda3\envs\gpuemre\lib\site-packages\tensorflow_core\python\lib\io\file_io.py", line 84 in _preread_check
  File "C:\Users\EMRE\anaconda3\envs\gpuemre\lib\site-packages\tensorflow_core\python\lib\io\file_io.py", line 122 in read
  File "C:\Users\EMRE\anaconda3\envs\gpuemre\lib\site-packages\object_detection-0.1-py3.7.egg\object_detection\utils\label_map_util.py", line 168 in load_labelmap
  File "C:\Users\EMRE\anaconda3\envs\gpuemre\lib\site-packages\object_detection-0.1-py3.7.egg\object_detection\utils\label_map_util.py", line 201 in get_label_map_dict
  File "C:\Users\EMRE\anaconda3\envs\gpuemre\lib\site-packages\object_detection-0.1-py3.7.egg\object_detection\data_decoders\tf_example_decoder.py", line 93 in __init__
  File "C:\Users\EMRE\anaconda3\envs\gpuemre\lib\site-packages\object_detection-0.1-py3.7.egg\object_detection\data_decoders\tf_example_decoder.py", line 460 in __init__
  File "C:\Users\EMRE\anaconda3\envs\gpuemre\lib\site-packages\object_detection-0.1-py3.7.egg\object_detection\builders\decoder_builder.py", line 63 in build
  File "C:\Users\EMRE\anaconda3\envs\gpuemre\lib\site-packages\object_detection-0.1-py3.7.egg\object_detection\builders\dataset_builder.py", line 209 in build
  File "train.py", line 123 in get_next
  File "C:\Users\EMRE\anaconda3\envs\gpuemre\lib\site-packages\object_detection-0.1-py3.7.egg\object_detection\legacy\trainer.py", line 58 in create_input_queue
  File "C:\Users\EMRE\anaconda3\envs\gpuemre\lib\site-packages\object_detection-0.1-py3.7.egg\object_detection\legacy\trainer.py", line 279 in train
  File "train.py", line 182 in main
  File "C:\Users\EMRE\anaconda3\envs\gpuemre\lib\site-packages\tensorflow_core\python\util\deprecation.py", line 324 in new_func
  File "C:\Users\EMRE\anaconda3\envs\gpuemre\lib\site-packages\absl\app.py", line 258 in _run_main
  File "C:\Users\EMRE\anaconda3\envs\gpuemre\lib\site-packages\absl\app.py", line 312 in run
  File "C:\Users\EMRE\anaconda3\envs\gpuemre\lib\site-packages\tensorflow_core\python\platform\app.py", line 40 in run
  File "train.py", line 186 in <module>

help