EdjeElectronics / TensorFlow-Object-Detection-API-Tutorial-Train-Multiple-Objects-Windows-10

How to train a TensorFlow Object Detection Classifier for multiple object detection on Windows
Apache License 2.0

faster_rcnn_nas_coco can't set --num_clones #61

Open austinmw opened 6 years ago

austinmw commented 6 years ago

I've previously tried two models, ssd_mobilenet_v1_coco_2017_11_17 and faster_rcnn_inception_resnet_v2_atrous_coco_2018_01_28, both of which run like this:

run_train() {
    export CUDA_VISIBLE_DEVICES=0,1,2
    python3 /home//training/train.py \
        --logtostderr \
        --pipeline_config_path=/home//training/faster_rcnn_nas_coco.config \
        --train_dir=/home//training/models/train \
        --num_clones=3 \
        --ps_tasks=1
    unset CUDA_VISIBLE_DEVICES
}

I have 4 GPUs, so I've been using the first three for training and the last one for eval. However, for some reason I'm unable to do the same with the faster_rcnn_nas_coco_2018_01_28 model. I get this error:

WARNING:tensorflow:num_readers has been reduced to 1 to match input file shards.
Traceback (most recent call last):
  File "/home/awelch/training/train.py", line 184, in <module>
    tf.app.run()
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/platform/app.py", line 126, in run
    _sys.exit(main(argv))
  File "/home/awelch/training/train.py", line 180, in main
    graph_hook_fn=graph_rewriter_fn)
  File "/tensorflow/models/research/object_detection/trainer.py", line 285, in train
    clones = model_deploy.create_clones(deploy_config, model_fn, [input_queue])
  File "/tensorflow/models/research/slim/deployment/model_deploy.py", line 193, in create_clones
    outputs = model_fn(*args, **kwargs)
  File "/tensorflow/models/research/object_detection/trainer.py", line 177, in _create_losses
    train_config.use_multiclass_scores)
ValueError: not enough values to unpack (expected 7, got 0)

Could anyone please explain why this happens or how I can fix it?

apacha commented 6 years ago

Changing the sync_replicas or replicas_to_aggregate parameters in your configuration might help. See https://stackoverflow.com/a/51724431/448357
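Another thing that may be worth checking is the batch_size in the pipeline config. The sketch below is a guess at one possible cause, not something confirmed in this thread. It assumes the legacy trainer gives each clone batch_size // num_clones examples, so a batch_size smaller than num_clones would leave every clone with an empty batch. Unpacking an empty batch into seven names raises the same kind of ValueError as in the traceback above. All function names here are hypothetical and no TensorFlow is needed to run it.

```python
# Hypothetical reproduction of the unpacking failure; not the actual trainer code.
# Assumption: each clone receives batch_size // num_clones examples per step.


def per_clone_batch_size(batch_size, num_clones):
    """Return how many examples each clone would receive per step."""
    return batch_size // num_clones


def unpack_inputs(read_data_list):
    """Mimic unpacking seven per-example fields from a dequeued batch."""
    # zip(*[]) yields nothing, so an empty batch cannot fill seven names.
    (images, keys, boxes, classes, masks, keypoints, weights) = zip(*read_data_list)
    return images, keys, boxes, classes, masks, keypoints, weights


def simulate(batch_size, num_clones):
    """Build a fake dequeued batch and report whether it unpacks."""
    n = per_clone_batch_size(batch_size, num_clones)
    # Each fake example is a tuple of seven placeholder fields.
    fake_batch = [tuple(range(7)) for _ in range(n)]
    try:
        unpack_inputs(fake_batch)
        return "ok"
    except ValueError as err:
        return str(err)


if __name__ == "__main__":
    print(simulate(batch_size=1, num_clones=3))  # empty per-clone batch
    print(simulate(batch_size=3, num_clones=3))  # one example per clone
```

If this is what is happening, setting batch_size in the config to at least num_clones (ideally a multiple of it), or lowering --num_clones, would be the thing to try. Memory limits on a model as large as NAS may make a bigger batch impractical, though.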

kolligowtham commented 4 years ago

Hi, I'm stuck on the same issue. Training stops at this line: update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS, first_clone_scope). Did you find a solution? It works fine for other models.