davidsandberg / facenet

Face recognition using Tensorflow
MIT License

OutOfRangeError (see above for traceback): FIFOQueue '_1_batch_join/fifo_queue' is closed and has insufficient elements (requested 90, current size 0) #69

Closed Feynman27 closed 7 years ago

Feynman27 commented 7 years ago

Hi,

I'm attempting to train a classifier on my own set of images. I'm doing this by running:

python facenet_train_classifier.py --logs_base_dir ./logs/facenet/ --models_base_dir ./models/facenet/ --data_dir ./align/aligned_images/ --image_size 160 --model_def models.inception_resnet_v1 --weight_decay 2e-4 --optimizer RMSPROP --learning_rate -1 --max_nrof_epochs 80 --keep_probability 0.8 --random_crop --random_flip --learning_rate_schedule_file ../data/learning_rate_schedule_classifier_long.txt --center_loss_factor 2e-5 --gpu_memory_fraction 0.9

During the experiment, I get a bunch of warnings:

W tensorflow/core/framework/op_kernel.cc:975] Out of range: FIFOQueue '_1_batch_join/fifo_queue' is closed and has insufficient elements (requested 90, current size 0)
  [[Node: batch_join = QueueDequeueUpTo[_class=["loc:@batch_join/fifo_queue"], component_types=[DT_FLOAT, DT_INT32], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/cpu:0"](batch_join/fifo_queue, batch_join/n)]]

Followed by a crash:

Traceback (most recent call last):
  File "facenet_train_classifier.py", line 324, in <module>
    main(parse_arguments(sys.argv[1:]))
  File "facenet_train_classifier.py", line 171, in main
    update_centers)
  File "facenet_train_classifier.py", line 212, in train
    err, _, _, step, reg_loss = sess.run([loss, train_op, update_centers, global_step, regularization_losses], feed_dict=feed_dict)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 718, in run
    run_metadata_ptr)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 916, in _run
    feed_dict_string, options, run_metadata)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 966, in _do_run
    target_list, options, run_metadata)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 986, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.OutOfRangeError: FIFOQueue '_1_batch_join/fifo_queue' is closed and has insufficient elements (requested 90, current size 0)
  [[Node: batch_join = QueueDequeueUpTo[_class=["loc:@batch_join/fifo_queue"], component_types=[DT_FLOAT, DT_INT32], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/cpu:0"](batch_join/fifo_queue, batch_join/n)]]

Caused by op u'batch_join', defined at:
  File "facenet_train_classifier.py", line 324, in <module>
    main(parse_arguments(sys.argv[1:]))
  File "facenet_train_classifier.py", line 85, in main
    args.batch_size, args.max_nrof_epochs, args.random_crop, args.random_flip, args.nrof_preprocess_threads)
  File "/home/ubuntu/facenet/src/facenet.py", line 138, in read_and_augument_data
    allow_smaller_final_batch=True)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/input.py", line 770, in batch_join
    dequeued = queue.dequeue_up_to(batch_size, name=name)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/data_flow_ops.py", line 500, in dequeue_up_to
    self._queue_ref, n=n, component_types=self._dtypes, name=name)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gen_data_flow_ops.py", line 1143, in _queue_dequeue_up_to
    timeout_ms=timeout_ms, name=name)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/op_def_library.py", line 750, in apply_op
    op_def=op_def)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 2238, in create_op
    original_op=self._default_original_op, op_def=op_def)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1130, in __init__
    self._traceback = _extract_stack()

OutOfRangeError (see above for traceback): FIFOQueue '_1_batch_join/fifo_queue' is closed and has insufficient elements (requested 90, current size 0)
  [[Node: batch_join = QueueDequeueUpTo[_class=["loc:@batch_join/fifo_queue"], component_types=[DT_FLOAT, DT_INT32], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/cpu:0"](batch_join/fifo_queue, batch_join/n)]]

I'm currently reading through the code to try to understand what may be going wrong. Any ideas?

Feynman27 commented 7 years ago

Reducing the batch size and the number of batches per epoch seems to help.
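
A rough way to see why this could help is sketched below. It assumes (this is not confirmed anywhere in the thread) that the input queue serves each image once per epoch, so one epoch of training cannot dequeue more examples than the dataset holds. The name `epoch_size` for "batches per epoch" and both helper functions are hypothetical and only illustrate the arithmetic. The 90 matches the "requested 90" in the error message, which is presumably the batch size.

```python
def examples_needed_per_epoch(batch_size, epoch_size):
    """Examples one training epoch dequeues: batch_size * batches per epoch."""
    return batch_size * epoch_size


def max_safe_epoch_size(nrof_images, batch_size):
    """Largest number of full batches per epoch that the dataset can supply.

    Assumption: every image is enqueued exactly once per epoch.
    """
    return nrof_images // batch_size


if __name__ == "__main__":
    nrof_images = 50000   # hypothetical dataset size
    batch_size = 90       # matches "requested 90" in the error message
    epoch_size = 1000     # hypothetical batches per epoch

    needed = examples_needed_per_epoch(batch_size, epoch_size)
    if needed > nrof_images:
        print("Queue would run dry: need %d examples, have %d images"
              % (needed, nrof_images))
        print("Try at most %d batches per epoch"
              % max_safe_epoch_size(nrof_images, batch_size))
```

If the dataset is smaller than `batch_size * epoch_size` under that assumption, lowering either value brings the demand back under what the queue can deliver.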

davidsandberg commented 7 years ago

Closing this since it seems to be solved.

busrakb commented 7 years ago

I did that, but I am still getting the same error. Is there another option?

xpzouying commented 7 years ago

Make sure the dataset is right. The directory passed as "--data_dir ./align/aligned_images/" should contain the PNG files generated by running src/align/align_dataset_mtcnn.py.

Following the intro docs, run this command first:

python src/align/align_dataset_mtcnn.py ~/datasets/casia/CASIA-maxpy-clean/ ~/datasets/casia/casia_maxpy_mtcnnpy_182 --image_size 182 --margin 44 --random_order
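
To confirm the aligned directory actually contains what training expects, a quick count of PNG files per class can help. This is a minimal sketch, assuming the usual layout of one subdirectory per identity under the data directory. `count_images_per_class` is a hypothetical helper, not part of facenet.

```python
import os


def count_images_per_class(data_dir, extensions=(".png",)):
    """Map each class subdirectory of data_dir to its number of image files."""
    counts = {}
    for name in sorted(os.listdir(data_dir)):
        class_dir = os.path.join(data_dir, name)
        if not os.path.isdir(class_dir):
            continue  # skip stray files at the top level
        counts[name] = sum(
            1 for f in os.listdir(class_dir)
            if f.lower().endswith(extensions)
        )
    return counts


if __name__ == "__main__":
    # Path taken from the original command in this thread.
    counts = count_images_per_class("./align/aligned_images/")
    empty = [c for c, n in counts.items() if n == 0]
    print("classes: %d, images: %d, empty classes: %d"
          % (len(counts), sum(counts.values()), len(empty)))
```

A total of zero images, or many empty class directories, would suggest the alignment step did not write its output where `--data_dir` points.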

Ayiruss commented 7 years ago

Does anyone know where I can download the dataset with all the images?

yatharthahuja commented 5 years ago

https://github.com/radykov/facial-recognition-video-facenet

Try cloning this; there is a dataset folder in it.

mhirna commented 5 years ago

In my case, it was because a few of the images had a width or height of less than 160 pixels.
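
One way to find such images is to read the dimensions straight from each PNG header. The sketch below uses only the standard library and assumes the aligned images are PNG files, as mentioned earlier in the thread. The 160 threshold comes from `--image_size 160` in the original command. `png_size` and `find_undersized_pngs` are hypothetical helper names.

```python
import os
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_size(path):
    """Return (width, height) from the PNG IHDR chunk, or None if unreadable."""
    with open(path, "rb") as f:
        header = f.read(24)
    if (len(header) < 24 or header[:8] != PNG_SIGNATURE
            or header[12:16] != b"IHDR"):
        return None
    # Width and height are big-endian unsigned 32-bit integers.
    width, height = struct.unpack(">II", header[16:24])
    return width, height


def find_undersized_pngs(data_dir, min_size=160):
    """List (path, size) for PNGs that are invalid or smaller than min_size."""
    bad = []
    for root, _dirs, files in os.walk(data_dir):
        for name in sorted(files):
            if not name.lower().endswith(".png"):
                continue
            path = os.path.join(root, name)
            size = png_size(path)
            if size is None or min(size) < min_size:
                bad.append((path, size))
    return bad


if __name__ == "__main__":
    for path, size in find_undersized_pngs("./align/aligned_images/"):
        print(path, size)
```

Files reported with a size of `None` could not be parsed as PNG at all and are worth inspecting as well.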