Closed. phongnhhn92 closed this issue 7 years ago.
I think you are running this with multiple workers and a GPU. It was not intended to be run on a GPU. Have a look here if you still want to use the GPU: Valikund gave a solution for running this with multiple workers on a GPU. The reason you could run object_detection_multiplayer.py is that it has only 1 worker.
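If you still want to try it on a GPU, the quickest mitigation is to start the pool with exactly one worker, so only a single TensorFlow session claims the card. Here is a minimal stand-alone sketch of that `Pool` call (the echo worker below is just a stand-in for the app's real detection worker, which runs its loop inside the pool initializer, as the traceback further down shows; the queue sizes are assumptions):

```python
from multiprocessing import Pool, Queue


def worker(input_q, output_q):
    # Stand-in for the app's real worker: the app builds its TensorFlow
    # session here and runs detection on every frame pulled off input_q.
    while True:
        output_q.put(input_q.get())


if __name__ == '__main__':
    input_q = Queue(maxsize=5)   # queue sizes here are assumptions
    output_q = Queue(maxsize=5)
    # One pool worker = one TensorFlow session on the GPU. This mirrors
    # why the single-worker script behaves while the multi-worker app
    # runs out of GPU memory.
    pool = Pool(1, worker, (input_q, output_q))
    input_q.put('frame')
    print(output_q.get())        # -> 'frame', the lone worker is alive
```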
@datitran Dear datitran, thanks for your code, I learned a lot from it. But I have a question about GPU use in this app. My computer has a GTX 1060 GPU, Win7, and 8 GB of memory. When I run this app, it automatically runs on the GPU. Sometimes it works well, but sometimes it reports that there is not enough memory; then the app crashes and the computer stops responding.
Today I saw you mention that you don't recommend using the GPU for this app, so my question is: why is that?
@l-chenyao Well, I said I don't recommend using it with a GPU because my code is not optimized for it. If you want to use it with a GPU, you need to do a couple of things. First of all, you need to limit the GPU memory consumption (see https://github.com/datitran/object_detector_app/issues/4). TensorFlow usually reserves the entire GPU memory even though it is not using all of it. This is problematic if you have a couple of workers running at the same time, which is why you see this random behavior: sometimes the workers don't all get the memory they need. Secondly, I'm using feed_dict to read in the data. This is okay for CPU but not good for GPU; for GPU it would be better to use queues, but that is more of an optimization aspect.
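For the memory cap, something along these lines in the worker's session setup should do it (a sketch against the TensorFlow 1.x API; the 0.3 fraction is just an example value, tune it to your number of workers):

```python
import tensorflow as tf


def make_session(path_to_ckpt):
    # Load the frozen detection graph (path_to_ckpt points to the app's
    # frozen_inference_graph.pb).
    detection_graph = tf.Graph()
    with detection_graph.as_default():
        od_graph_def = tf.GraphDef()
        with tf.gfile.GFile(path_to_ckpt, 'rb') as fid:
            od_graph_def.ParseFromString(fid.read())
            tf.import_graph_def(od_graph_def, name='')

    # Cap this process at ~30% of GPU memory so several workers can share
    # one card. Alternatively, config.gpu_options.allow_growth = True makes
    # TensorFlow grab memory on demand instead of all upfront.
    gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.3)
    config = tf.ConfigProto(gpu_options=gpu_options)
    return tf.Session(graph=detection_graph, config=config), detection_graph
```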
Hi guys, I tried to run object_detection_app.py but I get this error:

```
phong@Storm:~/PycharmProjects/Object-Detector-App-master$ python object_detection_app.py
[DEBUG/MainProcess] created semlock with handle 139891181490176
[DEBUG/MainProcess] created semlock with handle 139891181486080
[DEBUG/MainProcess] created semlock with handle 139891180916736
[DEBUG/MainProcess] Queue._after_fork()
[DEBUG/MainProcess] created semlock with handle 139891180912640
[DEBUG/MainProcess] created semlock with handle 139891180908544
[DEBUG/MainProcess] created semlock with handle 139891180904448
[DEBUG/MainProcess] Queue._after_fork()
[DEBUG/MainProcess] created semlock with handle 139891180900352
[DEBUG/MainProcess] created semlock with handle 139891180896256
[DEBUG/MainProcess] created semlock with handle 139891180892160
[DEBUG/MainProcess] created semlock with handle 139891180888064
[DEBUG/MainProcess] added worker
[DEBUG/PoolWorker-2] Queue._after_fork()
[DEBUG/PoolWorker-2] Queue._after_fork()
[INFO/PoolWorker-2] child process calling self.run()
[DEBUG/MainProcess] added worker
[DEBUG/PoolWorker-3] Queue._after_fork()
[DEBUG/PoolWorker-3] Queue._after_fork()
[INFO/PoolWorker-3] child process calling self.run()
[DEBUG/MainProcess] Queue._start_thread()
[DEBUG/MainProcess] doing self._thread.start()
[DEBUG/MainProcess] starting thread to feed data to pipe
[DEBUG/MainProcess] ... done self._thread.start()
2017-06-28 11:10:45.146711: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
2017-06-28 11:10:45.146755: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2017-06-28 11:10:45.146776: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2017-06-28 11:10:45.254015: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
2017-06-28 11:10:45.254073: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2017-06-28 11:10:45.254082: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2017-06-28 11:10:45.300271: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:893] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2017-06-28 11:10:45.301197: I tensorflow/core/common_runtime/gpu/gpu_device.cc:940] Found device 0 with properties:
name: GeForce GTX 1070
major: 6 minor: 1 memoryClockRate (GHz) 1.7085
pciBusID 0000:02:00.0
Total memory: 7.92GiB
Free memory: 7.43GiB
2017-06-28 11:10:45.301219: I tensorflow/core/common_runtime/gpu/gpu_device.cc:961] DMA: 0
2017-06-28 11:10:45.301227: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] 0:   Y
2017-06-28 11:10:45.301238: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1070, pci bus id: 0000:02:00.0)
2017-06-28 11:10:45.390544: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:893] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2017-06-28 11:10:45.390759: I tensorflow/core/common_runtime/gpu/gpu_device.cc:940] Found device 0 with properties:
name: GeForce GTX 1070
major: 6 minor: 1 memoryClockRate (GHz) 1.7085
pciBusID 0000:02:00.0
Total memory: 7.92GiB
Free memory: 302.94MiB
2017-06-28 11:10:45.390785: I tensorflow/core/common_runtime/gpu/gpu_device.cc:961] DMA: 0
2017-06-28 11:10:45.390793: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] 0:   Y
2017-06-28 11:10:45.390824: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1070, pci bus id: 0000:02:00.0)
2017-06-28 11:10:46.907853: E tensorflow/stream_executor/cuda/cuda_blas.cc:365] failed to create cublas handle: CUBLAS_STATUS_NOT_INITIALIZED
2017-06-28 11:10:46.907894: W tensorflow/stream_executor/stream.cc:1601] attempting to perform BLAS operation using StreamExecutor without BLAS support
[INFO/PoolWorker-2] process shutting down
[DEBUG/PoolWorker-2] running all "atexit" finalizers with priority >= 0
[DEBUG/PoolWorker-2] running the remaining "atexit" finalizers
Process PoolWorker-2:
Traceback (most recent call last):
  File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/lib/python2.7/multiprocessing/pool.py", line 97, in worker
    initializer(*initargs)
  File "object_detection_app.py", line 79, in worker
    output_q.put(detect_objects(frame, sess, detection_graph))
  File "object_detection_app.py", line 49, in detect_objects
    feed_dict={image_tensor: image_np_expanded})
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 789, in run
    run_metadata_ptr)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 997, in _run
    feed_dict_string, options, run_metadata)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1132, in _do_run
    target_list, options, run_metadata)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1152, in _do_call
    raise type(e)(node_def, op, message)
InternalError: Blas SGEMM launch failed : m=22500, n=64, k=32
     [[Node: FeatureExtractor/MobilenetV1/MobilenetV1/Conv2d_1_pointwise/convolution = Conv2D[T=DT_FLOAT, data_format="NHWC", padding="SAME", strides=[1, 1, 1, 1], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/gpu:0"](FeatureExtractor/MobilenetV1/MobilenetV1/Conv2d_1_depthwise/Relu6, FeatureExtractor/MobilenetV1/Conv2d_1_pointwise/weights/read)]]
     [[Node: Postprocessor/BatchMultiClassNonMaxSuppression/MultiClassNonMaxSuppression/Minimum_31/_1185 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/gpu:0", send_device_incarnation=1, tensor_name="edge_7609_Postprocessor/BatchMultiClassNonMaxSuppression/MultiClassNonMaxSuppression/Minimum_31", tensor_type=DT_INT32, _device="/job:localhost/replica:0/task:0/cpu:0"]()]]
Caused by op u'FeatureExtractor/MobilenetV1/MobilenetV1/Conv2d_1_pointwise/convolution', defined at:
  File "object_detection_app.py", line 107, in <module>
    pool = Pool(args.num_workers, worker, (input_q, output_q))
  File "/usr/lib/python2.7/multiprocessing/__init__.py", line 232, in Pool
    return Pool(processes, initializer, initargs, maxtasksperchild)
  File "/usr/lib/python2.7/multiprocessing/pool.py", line 159, in __init__
    self._repopulate_pool()
  File "/usr/lib/python2.7/multiprocessing/pool.py", line 223, in _repopulate_pool
    w.start()
  File "/usr/lib/python2.7/multiprocessing/process.py", line 130, in start
    self._popen = Popen(self)
  File "/usr/lib/python2.7/multiprocessing/forking.py", line 126, in __init__
    code = process_obj._bootstrap()
  File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/lib/python2.7/multiprocessing/pool.py", line 97, in worker
    initializer(*initargs)
  File "object_detection_app.py", line 71, in worker
    tf.import_graph_def(od_graph_def, name='')
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/importer.py", line 311, in import_graph_def
    op_def=op_def)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 2506, in create_op
    original_op=self._default_original_op, op_def=op_def)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1269, in __init__
    self._traceback = _extract_stack()

InternalError (see above for traceback): Blas SGEMM launch failed : m=22500, n=64, k=32
     [[Node: FeatureExtractor/MobilenetV1/MobilenetV1/Conv2d_1_pointwise/convolution = Conv2D[T=DT_FLOAT, data_format="NHWC", padding="SAME", strides=[1, 1, 1, 1], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/gpu:0"](FeatureExtractor/MobilenetV1/MobilenetV1/Conv2d_1_depthwise/Relu6, FeatureExtractor/MobilenetV1/Conv2d_1_pointwise/weights/read)]]
     [[Node: Postprocessor/BatchMultiClassNonMaxSuppression/MultiClassNonMaxSuppression/Minimum_31/_1185 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/gpu:0", send_device_incarnation=1, tensor_name="edge_7609_Postprocessor/BatchMultiClassNonMaxSuppression/MultiClassNonMaxSuppression/Minimum_31", tensor_type=DT_INT32, _device="/job:localhost/replica:0/task:0/cpu:0"]()]]
[INFO/PoolWorker-2] process exiting with exitcode 1
[DEBUG/MainProcess] cleaning up worker 0
[DEBUG/MainProcess] added worker
[DEBUG/PoolWorker-4] Queue._after_fork()
[DEBUG/PoolWorker-4] Queue._after_fork()
[INFO/PoolWorker-4] child process calling self.run()
2017-06-28 11:10:49.761792: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
2017-06-28 11:10:49.761841: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2017-06-28 11:10:49.761849: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2017-06-28 11:10:49.871478: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:893] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2017-06-28 11:10:49.871800: I tensorflow/core/common_runtime/gpu/gpu_device.cc:940] Found device 0 with properties:
name: GeForce GTX 1070
major: 6 minor: 1 memoryClockRate (GHz) 1.7085
pciBusID 0000:02:00.0
Total memory: 7.92GiB
Free memory: 7.26GiB
2017-06-28 11:10:49.871817: I tensorflow/core/common_runtime/gpu/gpu_device.cc:961] DMA: 0
2017-06-28 11:10:49.871824: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] 0:   Y
2017-06-28 11:10:49.871836: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1070, pci bus id: 0000:02:00.0)
```
I can see my webcam working, but no window opens after that. One more weird thing: I can run object_detection_multiplayer.py with no problem. Please help me fix this!