apache / mxnet

Lightweight, Portable, Flexible Distributed/Mobile Deep Learning with Dynamic, Mutation-aware Dataflow Dep Scheduler; for Python, R, Julia, Scala, Go, Javascript and more
https://mxnet.apache.org
Apache License 2.0

example/rcnn demo.py error while detecting object using gpu #5799

Closed: sunilmallya-work closed this issue 6 years ago.

sunilmallya-work commented 7 years ago

The code works when running on CPU.

For bugs or installation issues, please provide the following information. The more information you provide, the more likely people will be able to help you.

Environment info

Operating System: Ubuntu 14.04, on an AWS p2.8xlarge instance

Compiler: gcc

Package used (Python/R/Scala/Julia): Python

MXNet version: 0.9.5

Or if installed from source:

MXNet commit hash (git rev-parse HEAD):

If you are using python package, please provide

Python version and distribution:

If you are using R package, please provide

R sessionInfo():

Error Message:

Please paste the full error message, including stack trace.

ubuntu@ip-172-31-36-165:~/mxnet/example/rcnn$ python demo.py --prefix final --epoch 0 --image bike.jpg --gpu 1
[12:01:58] src/engine/engine.cc:36: MXNet start using engine: NaiveEngine
[12:01:59] /home/ubuntu/mxnet/dmlc-core/include/dmlc/logging.h:300: [12:01:59] src/storage/./pooled_storage_manager.h:84: cudaMalloc failed: out of memory

Stack trace returned 10 entries:
[bt] (0) /usr/local/lib/python2.7/dist-packages/mxnet-0.9.5-py2.7.egg/mxnet/libmxnet.so(_ZN4dmlc15LogMessageFatalD1Ev+0x3c) [0x7ff446af368c]
[bt] (1) /usr/local/lib/python2.7/dist-packages/mxnet-0.9.5-py2.7.egg/mxnet/libmxnet.so(_ZN5mxnet7storage23GPUPooledStorageManager5AllocEm+0x1d8) [0x7ff447715948]
[bt] (2) /usr/local/lib/python2.7/dist-packages/mxnet-0.9.5-py2.7.egg/mxnet/libmxnet.so(_ZN5mxnet11StorageImpl5AllocEmNS_7ContextE+0x57) [0x7ff4477177d7]
[bt] (3) /usr/local/lib/python2.7/dist-packages/mxnet-0.9.5-py2.7.egg/mxnet/libmxnet.so(+0xee6609) [0x7ff447430609]
[bt] (4) /usr/local/lib/python2.7/dist-packages/mxnet-0.9.5-py2.7.egg/mxnet/libmxnet.so(_ZNSt17_Function_handlerIFvN5mxnet10RunContextENS0_6engine18CallbackOnCompleteEEZNS0_6Engine8PushSyncESt8functionIFvS1_EENS0_7ContextERKSt6vectorIPNS2_3VarESaISC_EESG_NS0_10FnPropertyEiPKcEUlS1_S3_E_E9_M_invokeERKSt9_Any_dataS1S3+0x23) [0x7ff446b608c3]
[bt] (5) /usr/local/lib/python2.7/dist-packages/mxnet-0.9.5-py2.7.egg/mxnet/libmxnet.so(_ZN5mxnet6engine11NaiveEngine9PushAsyncESt8functionIFvNS_10RunContextENS0_18CallbackOnCompleteEEENS_7ContextERKSt6vectorIPNS0_3VarESaISA_EESE_NS_10FnPropertyEiPKc+0x8c) [0x7ff44735ca5c]
[bt] (6) /usr/local/lib/python2.7/dist-packages/mxnet-0.9.5-py2.7.egg/mxnet/libmxnet.so(_ZN5mxnet6Engine8PushSyncESt8functionIFvNS_10RunContextEEENS_7ContextERKSt6vectorIPNS_6engine3VarESaIS9_EESD_NS_10FnPropertyEiPKc+0x124) [0x7ff446b62314]
[bt] (7) /usr/local/lib/python2.7/dist-packages/mxnet-0.9.5-py2.7.egg/mxnet/libmxnet.so(_ZN5mxnet10CopyFromToERKNS_7NDArrayEPS0_i+0x62c) [0x7ff44743943c]
[bt] (8) /usr/local/lib/python2.7/dist-packages/mxnet-0.9.5-py2.7.egg/mxnet/libmxnet.so(+0xf3f8e4) [0x7ff4474898e4]
[bt] (9) /usr/local/lib/python2.7/dist-packages/mxnet-0.9.5-py2.7.egg/mxnet/libmxnet.so(MXImperativeInvoke+0x2cd) [0x7ff447330d0d]

Traceback (most recent call last):
  File "demo.py", line 142, in <module>
    main()
  File "demo.py", line 137, in main
    predictor = get_net(symbol, args.prefix, args.epoch, ctx)
  File "demo.py", line 36, in get_net
    arg_params, aux_params = load_param(prefix, epoch, convert=True, ctx=ctx, process=True)
  File "/home/ubuntu/mxnet/example/rcnn/rcnn/utils/load_model.py", line 53, in load_param
    arg_params = convert_context(arg_params, ctx)
  File "/home/ubuntu/mxnet/example/rcnn/rcnn/utils/load_model.py", line 35, in convert_context
    new_params[k] = v.as_in_context(ctx)
  File "/usr/local/lib/python2.7/dist-packages/mxnet-0.9.5-py2.7.egg/mxnet/ndarray.py", line 871, in as_in_context
    return self.copyto(context)
  File "/usr/local/lib/python2.7/dist-packages/mxnet-0.9.5-py2.7.egg/mxnet/ndarray.py", line 820, in copyto
    return _internal._copyto(self, out=hret)
  File "/usr/local/lib/python2.7/dist-packages/mxnet-0.9.5-py2.7.egg/mxnet/_ctypes/ndarray.py", line 164, in generic_ndarray_function
    c_array(ctypes.c_char_p, [c_str(val) for val in vals])))
  File "/usr/local/lib/python2.7/dist-packages/mxnet-0.9.5-py2.7.egg/mxnet/base.py", line 78, in check_call
    raise MXNetError(py_str(_LIB.MXGetLastError()))
mxnet.base.MXNetError: [12:01:59] src/storage/./pooled_storage_manager.h:84: cudaMalloc failed: out of memory


[12:01:59] src/engine/naive_engine.cc:35: Engine shutdown

Minimum reproducible example

If you are using your own code, please provide a short script that reproduces the error.

python demo.py --prefix final --epoch 0 --image bike.jpg --gpu 1

Steps to reproduce

Or, if you are running standard examples, please provide the commands you ran that led to the error.

  1. Install the latest MXNet and download the models.
  2. Run the example: python demo.py --prefix final --epoch 0 --image bike.jpg --gpu 1

What have you tried to solve it?

matt32106 commented 7 years ago

Do you get the same result with --gpu 0? The first GPU is numbered 0.
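GPU indices start at 0, so on a multi-GPU instance `--gpu 1` selects the second device (assuming demo.py passes the flag straight to `mx.gpu()`). To see which indices exist and how much memory each one already has in use, for example by another process, `nvidia-smi` can print per-device memory as CSV. The sketch below runs that query and parses the result; the helper names `parse_gpu_memory`, `free_memory` and `query_gpu_memory` are hypothetical, and it assumes `nvidia-smi` is on the `PATH`.

```python
import subprocess

# nvidia-smi query: one CSV line per GPU with index, used MiB, total MiB.
QUERY = ["nvidia-smi",
         "--query-gpu=index,memory.used,memory.total",
         "--format=csv,noheader,nounits"]


def parse_gpu_memory(csv_text):
    """Map GPU index -> (used MiB, total MiB) from nvidia-smi CSV output."""
    gpus = {}
    for line in csv_text.strip().splitlines():
        index, used, total = (field.strip() for field in line.split(","))
        gpus[int(index)] = (int(used), int(total))
    return gpus


def free_memory(gpus):
    """Map GPU index -> free MiB, given the output of parse_gpu_memory."""
    return {idx: total - used for idx, (used, total) in gpus.items()}


def query_gpu_memory():
    """Run nvidia-smi (hypothetical helper; requires an NVIDIA driver)."""
    out = subprocess.check_output(QUERY, universal_newlines=True)
    return parse_gpu_memory(out)
```

On the instance, `free_memory(query_gpu_memory())` would show the free MiB per device index, which indicates whether device 1 is already mostly occupied before the demo starts.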

Godricly commented 7 years ago

Your GPU does not have enough memory for the demo.
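The traceback shows the allocation fails inside `load_param`, while the saved weights are being copied to the GPU and before any image is processed. A rough way to test the "not enough memory" explanation is to compare the size of those weights with the free memory on the chosen device. This sketch assumes float32 weights (4 bytes per element) and takes a plain dict of parameter name to shape; with MXNet such a dict could be built from the loaded parameters, e.g. `{k: v.shape for k, v in arg_params.items()}`. The function names and example shapes are illustrative only, and the result is a lower bound because memory for activations at inference time is not counted.

```python
def param_bytes(shapes, bytes_per_elem=4):
    """Lower bound on device memory needed just to hold the parameters.

    shapes: dict mapping parameter name -> shape tuple.
    bytes_per_elem: 4 assumes float32 weights.
    """
    total_elems = 0
    for shape in shapes.values():
        n = 1
        for dim in shape:
            n *= dim
        total_elems += n
    return total_elems * bytes_per_elem


def fits(shapes, free_mib, bytes_per_elem=4):
    """True if the parameters alone fit into `free_mib` MiB of free memory."""
    return param_bytes(shapes, bytes_per_elem) <= free_mib * 1024 * 1024


# Hypothetical shapes for a single convolution layer, not the real model.
example = {"conv_weight": (64, 3, 3, 3), "conv_bias": (64,)}
needed = param_bytes(example)  # 1792 elements * 4 bytes
```

If the weights alone come to far less than the free memory reported for the device, the out-of-memory error points more toward memory already held by something else than toward the model size.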

bhokaal2k commented 7 years ago

From your error log:

mxnet.base.MXNetError: [12:01:59] src/storage/./pooled_storage_manager.h:84: cudaMalloc failed: out of memory

The GPU does not have enough memory.
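Per the traceback, the failing call is `v.as_in_context(ctx)` inside `convert_context` in `rcnn/utils/load_model.py`, which copies every loaded parameter to the GPU. A hypothetical debugging variant of that loop (not the code in the repository) can copy the parameters one at a time and report which one failed and how much had already been copied. It only assumes each value exposes `as_in_context(ctx)` and `shape`, as MXNet NDArrays do; `convert_context_verbose` is a made-up name.

```python
def convert_context_verbose(params, ctx):
    """Copy each value in `params` to `ctx`, reporting the first failure.

    Hypothetical helper: works with any objects exposing
    `as_in_context(ctx)` and `shape`, such as MXNet NDArrays.
    """
    new_params = {}
    copied_elems = 0
    for name in sorted(params):
        value = params[name]
        try:
            new_params[name] = value.as_in_context(ctx)
        except Exception as err:  # an MXNetError when run against MXNet
            raise RuntimeError(
                "copy of %r %s to %s failed after %d parameters "
                "(%d elements) were copied: %s"
                % (name, tuple(value.shape), ctx,
                   len(new_params), copied_elems, err))
        # Count the elements copied so far to gauge how much memory was used.
        n = 1
        for dim in value.shape:
            n *= dim
        copied_elems += n
    return new_params
```

If the very first parameter already fails, the device may have been nearly full before the demo started; a failure late in the loop suggests the weights themselves exceed the free memory.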

santoshmo commented 7 years ago

I wasn't able to reproduce this error on a p2.xlarge. I doubt the image is the issue, but would you mind uploading it? I got the correct output:

class ---- [[x1, x2, y1, y2, confidence]]
--------- bicycle ---------
[[ 14.01706886 96.79859924 449.25 334.8694458 0.99867541]]
results saved to bike_result.jpeg

szha commented 6 years ago

This issue is closed due to lack of activity in the last 90 days. Feel free to ping me to reopen if this is still an active issue. Thanks!