DrSleep / tensorflow-deeplab-resnet

DeepLab-ResNet rebuilt in TensorFlow
MIT License
1.25k stars · 431 forks

Result of running inference.py #163

Open sunbin1205 opened 6 years ago

sunbin1205 commented 6 years ago

When I run inference.py on a test picture, the segmentation result shows no complete outline of the object, only a lot of colour spots. How can I solve this? Looking forward to your reply! @DrSleep

sunbin1205 commented 6 years ago

The link provided in the README gives only the deeplab_resnet_init.ckpt file, not the pre-trained model file. Is this the reason for the inaccurate inference? Could you please provide the pre-trained model? Thank you very much! @DrSleep

DrSleep commented 6 years ago

there is deeplab_resnet.ckpt provided; you need to download it and run inference with it
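A minimal sketch (assuming TensorFlow 1.x and that the downloaded checkpoint sits in the current directory) to check which checkpoint you actually have: the fully trained deeplab_resnet.ckpt carries the complete set of network weights, whereas deeplab_resnet_init.ckpt is only a starting point for training.

# Hypothetical sanity check: list what is stored in the downloaded checkpoint.
# Adjust the path to wherever you saved the file.
import tensorflow as tf

reader = tf.train.NewCheckpointReader('./deeplab_resnet.ckpt')
shapes = reader.get_variable_to_shape_map()
print('variables in checkpoint:', len(shapes))
for name in sorted(shapes)[:10]:  # show a few variable names and their shapes
    print(name, shapes[name])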

sunbin1205 commented 6 years ago

I am very glad to receive your reply. The last problem has been solved! But there is another problem: I have downloaded the SegmentationClassAug dataset into the dataset directory, and I get this error when I run train.py:

step 62 loss = 1.486, (62.292 sec/step)
step 63 loss = 1.632, (61.552 sec/step)
2018-01-25 02:10:06.684534: W tensorflow/core/framework/op_kernel.cc:1152] Not found: ./dataset/VOCdevkit/JPEGImages/2007_000032.jpg
step 64 loss = 1.602, (62.308 sec/step)
step 65 loss = 1.555, (61.499 sec/step)
step 66 loss = 1.723, (62.328 sec/step)
Traceback (most recent call last):
  File "train.py", line 258, in <module>
    main()
  File "train.py", line 251, in main
    loss_value, _ = sess.run([reduced_loss, train_op], feed_dict=feed_dict)
  File "/home/zbw/anaconda2/envs/sun/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 778, in run
    run_metadata_ptr)
  File "/home/zbw/anaconda2/envs/sun/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 982, in _run
    feed_dict_string, options, run_metadata)
  File "/home/zbw/anaconda2/envs/sun/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1032, in _do_run
    target_list, options, run_metadata)
  File "/home/zbw/anaconda2/envs/sun/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1052, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.OutOfRangeError: FIFOQueue '_1_create_inputs/batch/fifo_queue' is closed and has insufficient elements (requested 10, current size 5)
  [[Node: create_inputs/batch = QueueDequeueManyV2[component_types=[DT_FLOAT, DT_UINT8], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/cpu:0"](create_inputs/batch/fifo_queue, create_inputs/batch/n)]]

Caused by op u'create_inputs/batch', defined at:
  File "train.py", line 258, in <module>
    main()
  File "train.py", line 146, in main
    image_batch, label_batch = reader.dequeue(args.batch_size)
  File "/media/Linux/sun/Segmentation/tensorflow-deeplab-resnet-master/deeplab_resnet/image_reader.py", line 179, in dequeue
    num_elements)
  File "/home/zbw/anaconda2/envs/sun/lib/python2.7/site-packages/tensorflow/python/training/input.py", line 917, in batch
    name=name)
  File "/home/zbw/anaconda2/envs/sun/lib/python2.7/site-packages/tensorflow/python/training/input.py", line 712, in _batch
    dequeued = queue.dequeue_many(batch_size, name=name)
  File "/home/zbw/anaconda2/envs/sun/lib/python2.7/site-packages/tensorflow/python/ops/data_flow_ops.py", line 458, in dequeue_many
    self._queue_ref, n=n, component_types=self._dtypes, name=name)
  File "/home/zbw/anaconda2/envs/sun/lib/python2.7/site-packages/tensorflow/python/ops/gen_data_flow_ops.py", line 1328, in _queue_dequeue_many_v2
    timeout_ms=timeout_ms, name=name)
  File "/home/zbw/anaconda2/envs/sun/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 768, in apply_op
    op_def=op_def)
  File "/home/zbw/anaconda2/envs/sun/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 2336, in create_op
    original_op=self._default_original_op, op_def=op_def)
  File "/home/zbw/anaconda2/envs/sun/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1228, in __init__
    self._traceback = _extract_stack()

OutOfRangeError (see above for traceback): FIFOQueue '_1_create_inputs/batch/fifo_queue' is closed and has insufficient elements (requested 10, current size 5)
  [[Node: create_inputs/batch = QueueDequeueManyV2[component_types=[DT_FLOAT, DT_UINT8], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/cpu:0"](create_inputs/batch/fifo_queue, create_inputs/batch/n)]]

This seems to be the result of an incorrect dataset. I tried deleting the offending paths from train.txt, but there are too many of them, and I don't think deleting paths is the right fix. What is the reason for this?

DrSleep commented 6 years ago

Make sure that all the images from the training list are present.

Otherwise, you will be getting this error:

Not found: ./dataset/VOCdevkit/JPEGImages/2007_000032.jpg
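A small script can catch this before training starts. The sketch below is only illustrative; it assumes the repo's list format of one "image_path label_path" pair per line, with paths given relative to the data directory passed to train.py via --data-dir.

# Sketch: verify that every image/label pair listed in the data list exists on disk.
import os

data_dir = './dataset/VOCdevkit'   # adjust to your --data-dir
data_list = './dataset/train.txt'  # adjust to your --data-list

missing = []
with open(data_list) as f:
    for line in f:
        parts = line.strip().split()
        if len(parts) != 2:
            continue
        for rel_path in parts:
            # List entries usually start with '/', so simple concatenation matches
            # how the reader joins data_dir and the listed path.
            full_path = data_dir + rel_path if rel_path.startswith('/') else os.path.join(data_dir, rel_path)
            if not os.path.exists(full_path):
                missing.append(full_path)

print('%d missing files' % len(missing))
for p in missing[:20]:
    print(p)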

sunbin1205 commented 6 years ago

@DrSleep I'm so sorry to disturb you again. Running python fine_tune.py --not-restore-last gives an error. It seems that the jpg and png files can be loaded, so what is the problem? I guess the reason may be CUDA?

2018-01-26 03:50:20.103520: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
2018-01-26 03:50:20.127612: E tensorflow/stream_executor/cuda/cuda_driver.cc:405] failed call to cuInit: CUDA_ERROR_NO_DEVICE
2018-01-26 03:50:20.127687: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:158] retrieving CUDA diagnostic information for host: zbw-System-Product-Name
2018-01-26 03:50:20.127700: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:165] hostname: zbw-System-Product-Name
2018-01-26 03:50:20.127740: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:189] libcuda reported version is: 387.26.0
2018-01-26 03:50:20.127773: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:369] driver version file contents: """NVRM version: NVIDIA UNIX x86_64 Kernel Module 384.90 Tue Sep 19 19:17:35 PDT 2017
GCC version: gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.5)
"""
2018-01-26 03:50:20.127796: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:193] kernel reported version is: 384.90.0
2018-01-26 03:50:20.127807: E tensorflow/stream_executor/cuda/cuda_diagnostics.cc:303] kernel version 384.90.0 does not match DSO version 387.26.0 -- cannot find working devices in this configuration
Restored model parameters from ./deeplab_resnet.ckpt
Traceback (most recent call last):
  File "fine_tune.py", line 207, in <module>
    main()
  File "fine_tune.py", line 196, in main
    loss_value, images, labels, preds, summary, _ = sess.run([reduced_loss, image_batch, label_batch, pred, total_summary, optim])
  File "/home/zbw/anaconda2/envs/sun/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 778, in run
    run_metadata_ptr)
  File "/home/zbw/anaconda2/envs/sun/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 982, in _run
    feed_dict_string, options, run_metadata)
  File "/home/zbw/anaconda2/envs/sun/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1032, in _do_run
    target_list, options, run_metadata)
  File "/home/zbw/anaconda2/envs/sun/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1052, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.OutOfRangeError: FIFOQueue '_1_create_inputs/batch/fifo_queue' is closed and has insufficient elements (requested 2, current size 0)
  [[Node: create_inputs/batch = QueueDequeueManyV2[component_types=[DT_FLOAT, DT_UINT8], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/cpu:0"](create_inputs/batch/fifo_queue, create_inputs/batch/n)]]

Caused by op u'create_inputs/batch', defined at:
  File "fine_tune.py", line 207, in <module>
    main()
  File "fine_tune.py", line 125, in main
    image_batch, label_batch = reader.dequeue(args.batch_size)
  File "/media/Linux/sun/Segmentation/building_segmentation/deeplab_resnet/image_reader.py", line 179, in dequeue
    num_elements)
  File "/home/zbw/anaconda2/envs/sun/lib/python2.7/site-packages/tensorflow/python/training/input.py", line 917, in batch
    name=name)
  File "/home/zbw/anaconda2/envs/sun/lib/python2.7/site-packages/tensorflow/python/training/input.py", line 712, in _batch
    dequeued = queue.dequeue_many(batch_size, name=name)
  File "/home/zbw/anaconda2/envs/sun/lib/python2.7/site-packages/tensorflow/python/ops/data_flow_ops.py", line 458, in dequeue_many
    self._queue_ref, n=n, component_types=self._dtypes, name=name)
  File "/home/zbw/anaconda2/envs/sun/lib/python2.7/site-packages/tensorflow/python/ops/gen_data_flow_ops.py", line 1328, in _queue_dequeue_many_v2
    timeout_ms=timeout_ms, name=name)
  File "/home/zbw/anaconda2/envs/sun/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 768, in apply_op
    op_def=op_def)
  File "/home/zbw/anaconda2/envs/sun/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 2336, in create_op
    original_op=self._default_original_op, op_def=op_def)
  File "/home/zbw/anaconda2/envs/sun/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1228, in __init__
    self._traceback = _extract_stack()

OutOfRangeError (see above for traceback): FIFOQueue '_1_create_inputs/batch/fifo_queue' is closed and has insufficient elements (requested 2, current size 0)
  [[Node: create_inputs/batch = QueueDequeueManyV2[component_types=[DT_FLOAT, DT_UINT8], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/cpu:0"](create_inputs/batch/fifo_queue, create_inputs/batch/n)]]

Jingyao12 commented 6 years ago

Hi @DrSleep,

I also meet this problem when I run inference.py. I retrained the model using the following setup; the original batch size ran out of memory, so I changed it to 4:

BATCH_SIZE = 4
DATA_DIRECTORY = '/PASCAL/SemanticImg'
DATA_LIST_PATH = './dataset/train.txt'
IGNORE_LABEL = 255
INPUT_SIZE = '321,321'
LEARNING_RATE = 2.5e-4
MOMENTUM = 0.9
NUM_CLASSES = 21
NUM_STEPS = 20001
POWER = 0.9
RANDOM_SEED = 1234
RESTORE_FROM = './deeplab_resnet.ckpt'
SAVE_NUM_IMAGES = 2
SAVE_PRED_EVERY = 1000
SNAPSHOT_DIR = './snapshots/'
WEIGHT_DECAY = 0.0005

The train.txt is as follows:

/JPEGImages/2007_000032.jpg /SegmentationClass/2007_000032.png
/JPEGImages/2007_000039.jpg /SegmentationClass/2007_000039.png
/JPEGImages/2007_000063.jpg /SegmentationClass/2007_000063.png
/JPEGImages/2007_000068.jpg /SegmentationClass/2007_000068.png
/JPEGImages/2007_000121.jpg /SegmentationClass/2007_000121.png
/JPEGImages/2007_000170.jpg /SegmentationClass/2007_000170.png
/JPEGImages/2007_000241.jpg /SegmentationClass/2007_000241.png
/JPEGImages/2007_000243.jpg /SegmentationClass/2007_000243.png
/JPEGImages/2007_000250.jpg /SegmentationClass/2007_000250.png
/JPEGImages/2007_000256.jpg /SegmentationClass/2007_000256.png
/JPEGImages/2007_000333.jpg /SegmentationClass/2007_000333.png
/JPEGImages/2007_000363.jpg /SegmentationClass/2007_000363.png
/JPEGImages/2007_000364.jpg /SegmentationClass/2007_000364.png
/JPEGImages/2007_000392.jpg /SegmentationClass/2007_000392.png
/JPEGImages/2007_000480.jpg /SegmentationClass/2007_000480.png
/JPEGImages/2007_000504.jpg /SegmentationClass/2007_000504.png
/JPEGImages/2007_000515.jpg /SegmentationClass/2007_000515.png
/JPEGImages/2007_000528.jpg /SegmentationClass/2007_000528.png
/JPEGImages/2007_000549.jpg /SegmentationClass/2007_000549.png
/JPEGImages/2007_000584.jpg /SegmentationClass/2007_000584.png
/JPEGImages/2007_000645.jpg /SegmentationClass/2007_000645.png
/JPEGImages/2007_000648.jpg /SegmentationClass/2007_000648.png
/JPEGImages/2007_000713.jpg /SegmentationClass/2007_000713.png
/JPEGImages/2007_000720.jpg /SegmentationClass/2007_000720.png
/JPEGImages/2007_000733.jpg /SegmentationClass/2007_000733.png
/JPEGImages/2007_000738.jpg /SegmentationClass/2007_000738.png
/JPEGImages/2007_000768.jpg /SegmentationClass/2007_000768.png
...

The total number of training images is 1464; the image names come from the VOC2012 train list.

The log is as follows:

Restored model parameters from ./deeplab_resnet.ckpt
The checkpoint has been created.
step 0 loss = 1.268, (16.858 sec/step)
step 1 loss = 3.971, (1.734 sec/step)
step 2 loss = 1.308, (1.339 sec/step)
step 3 loss = 2.991, (1.329 sec/step)
step 4 loss = 1.252, (1.346 sec/step)
step 5 loss = 1.344, (1.335 sec/step)
step 6 loss = 8.126, (1.331 sec/step)
step 7 loss = 4.652, (1.339 sec/step)
step 8 loss = 5.097, (1.339 sec/step)
step 9 loss = 1.318, (1.334 sec/step)
step 10 loss = 1.769, (1.353 sec/step)
...
step 19990 loss = 1.191, (1.419 sec/step)
step 19991 loss = 1.183, (1.425 sec/step)
step 19992 loss = 1.197, (1.424 sec/step)
step 19993 loss = 1.183, (1.422 sec/step)
step 19994 loss = 1.184, (1.408 sec/step)
step 19995 loss = 1.192, (1.416 sec/step)
step 19996 loss = 1.183, (1.419 sec/step)
step 19997 loss = 1.183, (1.414 sec/step)
step 19998 loss = 1.183, (1.420 sec/step)
step 19999 loss = 1.183, (1.437 sec/step)
The checkpoint has been created.
step 20000 loss = 1.183, (12.276 sec/step)

It looks normal, right?

But when I run inference.py, the results are all background, even when I use a training image. If I instead use the deeplab_resnet.ckpt for inference, the output has both the background and the segmented objects.

Do you have any suggestions about this error?

Thank you!

DrSleep commented 6 years ago

@sunbin1205

E tensorflow/stream_executor/cuda/cuda_diagnostics.cc:303] kernel version 384.90.0 does not match DSO version 387.26.0 -- cannot find working devices in this configuration

Something with the drivers I assume. Can't help with that one.
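A minimal sketch, assuming TF 1.x, to confirm whether TensorFlow can see a GPU at all:

# If only CPU devices are listed, the kernel/DSO driver mismatch above is the
# likely culprit and TensorFlow silently falls back to (much slower) CPU runs.
from tensorflow.python.client import device_lib

for device in device_lib.list_local_devices():
    print(device.name, device.device_type)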

@minnieyao You are restoring from the already pre-trained model, hence the learning rate might be too high. Try restoring from the init model instead and check the progress in TensorBoard; it should be alright.
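For context, the POWER = 0.9 setting in the config above points to the usual DeepLab "poly" learning-rate schedule. The sketch below is illustrative only (not taken from the thread); lowering BASE_LR, e.g. to 2.5e-5, is one way to fine-tune more gently when restoring from the fully trained checkpoint.

# "Poly" decay: lr(step) = BASE_LR * (1 - step / NUM_STEPS) ** POWER,
# using the values posted in the config above.
BASE_LR = 2.5e-4
NUM_STEPS = 20001
POWER = 0.9

def poly_lr(step):
    return BASE_LR * (1.0 - float(step) / NUM_STEPS) ** POWER

for step in (0, 5000, 10000, 15000, 20000):
    print(step, poly_lr(step))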

RaphaelDuan commented 5 years ago

Hi, @minnieyao. I got the same problem. Did you solve it? I restored from deeplab_resnet_init.ckpt.