YYuanAnyVision / mxnet_mtcnn_face_detection

MTCNN face detection

Why does using the GPU become less efficient? #12

Open xiaoxinyi opened 7 years ago

xiaoxinyi commented 7 years ago

Test time for detect_face:

import datetime
import time

start_time = time.time()
# run detector
results = detector.detect_face(img)
end_time = time.time()
diff = end_time - start_time
print(datetime.timedelta(seconds=diff))

ps : GTX1070

Li1991 commented 7 years ago

When I use GPU, I encounter this problem:

[11:16:43] /home/njfh/alex/mxnet/dmlc-core/include/dmlc/logging.h:300: [11:16:43] src/c_api/c_api_ndarray.cc:390: Operator _zeros cannot be run; requires at least one of FCompute, NDArrayFunction, FCreateOperator be registered

Stack trace returned 10 entries:
[bt] (0) /usr/local/lib/python2.7/dist-packages/mxnet-0.9.5-py2.7.egg/mxnet/libmxnet.so(_ZN4dmlc15LogMessageFatalD1Ev+0x3c) [0x7f34bf503a5c]
[bt] (1) /usr/local/lib/python2.7/dist-packages/mxnet-0.9.5-py2.7.egg/mxnet/libmxnet.so(MXImperativeInvoke+0x66d) [0x7f34bfd9811d]
[bt] (2) /usr/lib/x86_64-linux-gnu/libffi.so.6(ffi_call_unix64+0x4c) [0x7f34c2976adc]
[bt] (3) /usr/lib/x86_64-linux-gnu/libffi.so.6(ffi_call+0x1fc) [0x7f34c297640c]
[bt] (4) /usr/lib/python2.7/lib-dynload/_ctypes.x86_64-linux-gnu.so(_ctypes_callproc+0x48e) [0x7f34c2b8d5fe]
[bt] (5) /usr/lib/python2.7/lib-dynload/_ctypes.x86_64-linux-gnu.so(+0x15f9e) [0x7f34c2b8ef9e]
[bt] (6) python(PyEval_EvalFrameEx+0x98d) [0x5244dd]
[bt] (7) python(PyEval_EvalCodeEx+0x2b1) [0x555551]
[bt] (8) python(PyEval_EvalFrameEx+0x1a10) [0x525560]
[bt] (9) python(PyEval_EvalCodeEx+0x2b1) [0x555551]

Can you tell me how to solve this problem? Thank you very much! @xiaoxinyi

xiaoxinyi commented 7 years ago

@Li1991 Try mxnet==0.9.2.

Li1991 commented 7 years ago

Thank you for answering me! I have been looking for version 0.9.2 for a day but cannot find it. Would you please send it to me? (891935370@qq.com) Thank you very much! @xiaoxinyi

xiaoxinyi commented 7 years ago

@Li1991 Alternatively, build it from source at the v0.9.2 tag:

git clone --recursive https://github.com/dmlc/mxnet.git
cd mxnet
git checkout v0.9.2

Li1991 commented 7 years ago

Thank you very much @xiaoxinyi

YYuanAnyVision commented 7 years ago

@xiaoxinyi MXNet spends extra time on GPU memory allocation the first time, so timing should be fine once initialization is done. You can detect the same image twice; the second run should be much faster.
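To keep that one-time cost out of the measurement, one could time the call only after a throwaway warm-up run. A minimal sketch (the `timed_call` helper is illustrative, not part of this repo; in practice `fn` would be `lambda: detector.detect_face(img)`):

```python
import time

def timed_call(fn, warmup=1, runs=10):
    """Return mean seconds per call, excluding warm-up runs."""
    for _ in range(warmup):
        fn()                      # first call pays GPU init / allocation cost
    start = time.perf_counter()
    for _ in range(runs):
        fn()
    return (time.perf_counter() - start) / runs
```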

Li1991 commented 7 years ago

The authors report 99 fps in the paper, but I can only get 17 fps on a GTX 1080. Do you know why, and how to solve this problem? Thank you! @xiaoxinyi @pangyupo

xiaoxinyi commented 7 years ago

I can't even get 17 fps with either MXNet or TensorFlow.

Li1991 commented 7 years ago

Have you tried this method in Caffe and tested the speed? @xiaoxinyi

xiaoxinyi commented 7 years ago

@Li1991 No.

flankechen commented 6 years ago

Same issue. I suspect the timing here is misleading, and the GPU needs extra time for the initial data transfer.

tcye commented 6 years ago

The same problem, and I can't even get 16 fps on a 3.1 GHz CPU.

hujuan940506 commented 6 years ago

@xiaoxinyi @Li1991 @tcye How big are your pictures?

lynnw123 commented 5 years ago

Since minL = org_L × (12 / minsize) × factor^n, the size of your input image and the minsize and factor you set determine the number of pyramid scales, and therefore the total amount of computation and the speed.
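That relationship can be made concrete by counting the pyramid scales P-Net has to process. The `pyramid_scales` helper below is an illustrative sketch (not from this repo), using the conventional MTCNN defaults of minsize=20 and factor=0.709; a larger minsize means fewer scales and less work:

```python
def pyramid_scales(height, width, minsize=20, factor=0.709):
    """Scales at which P-Net runs: keep scaling by `factor` until the
    shorter image side, mapped to the 12x12 P-Net input, drops below 12."""
    m = 12.0 / minsize
    min_l = min(height, width) * m
    scales = []
    while min_l >= 12:
        scales.append(m * factor ** len(scales))
        min_l *= factor
    return scales
```

For a 1080p frame, raising minsize from 20 to 80 cuts the pyramid from 12 levels to 8, which is often the easiest speed-up.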

chen849157649 commented 5 years ago

Test time for detect_face:

import datetime
import time

start_time = time.time()
# run detector
results = detector.detect_face(img)
end_time = time.time()
diff = end_time - start_time
print(datetime.timedelta(seconds=diff))
  • ctx=mx.gpu(0), num_work=1 takes 1.807s
  • ctx=mx.gpu(0), num_work=4 takes 2.307s
  • ctx=mx.cpu(0), num_work=4 takes 0.601s

ps : GTX1070

Hi, are you sure the GPU is being used? The detection pipeline works on NumPy arrays and does not use mx.nd.NDArray. If I want to detect multiple images at the same time on two GPUs, how do I make use of them? Thank you very much!
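One possible way to spread work over two GPUs is to shard the image list round-robin and build one detector per device. This is only a sketch under assumptions: `make_detector` stands in for constructing `MtcnnDetector(ctx=mx.gpu(device_id))`, a dummy is used so the sketch runs without MXNet, and real code would run each shard in its own process so the devices work in parallel:

```python
def shard_round_robin(images, n_devices):
    """Assign images to devices in round-robin order."""
    return {d: images[d::n_devices] for d in range(n_devices)}

def detect_all(images, n_devices=2, make_detector=None):
    """Run one detector per device over its shard; returns (device, result) pairs."""
    if make_detector is None:
        # dummy detector: real code would return MtcnnDetector(ctx=mx.gpu(d))
        make_detector = lambda d: (lambda img: (d, img))
    results = []
    for device_id, shard in shard_round_robin(images, n_devices).items():
        detector = make_detector(device_id)
        results.extend(detector(img) for img in shard)
    return results
```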