apache / mxnet

Lightweight, Portable, Flexible Distributed/Mobile Deep Learning with Dynamic, Mutation-aware Dataflow Dep Scheduler; for Python, R, Julia, Scala, Go, Javascript and more
https://mxnet.apache.org
Apache License 2.0

Unable to use pretrained model in windows server 2016 with GPU context #12079

Closed Ferbach closed 6 years ago

Ferbach commented 6 years ago

Hello,

I would like to use my vgg16_atrous model that I have trained using gluon-cv.

I'm on Windows Server 2016 with CUDA 8.0 and the GPU is a Tesla P40 (driver 385.08).

I tried to run this code:

import mxnet as mx
from gluoncv import data, utils, model_zoo

net_name = 'ssd_300_vgg16_atrous_voc'
resume = './ssd_300_vgg16_atrous_voc_0100_0.8975.params'

net = model_zoo.get_model(net_name, pretrained_base=True, ctx=mx.gpu(0))
net.load_params(resume.strip(),ctx=mx.gpu(0))

Python crashes every time, but when I use the CPU context it works fine.

To debug the GPU part, I used this script:

import mxnet as mx

print(mx.gpu(0))
print(mx.nd.array([1,2],ctx=mx.gpu(0)))

The output was:

gpu(0)
Traceback (most recent call last):
  File "test_gpu.py", line 5, in <module>
    print(mx.nd.array([1,2],ctx=mx.gpu(0)))
  File "C:\Users\Administrateur.WIN-JNTSDGOVCTG\Miniconda3\envs\alan\lib\site-packages\mxnet\ndarray\utils.py", line 146, in array
    return _array(source_array, ctx=ctx, dtype=dtype)
  File "C:\Users\Administrateur.WIN-JNTSDGOVCTG\Miniconda3\envs\alan\lib\site-packages\mxnet\ndarray\ndarray.py", line 2338, in array
    arr = empty(source_array.shape, ctx, dtype)
  File "C:\Users\Administrateur.WIN-JNTSDGOVCTG\Miniconda3\envs\alan\lib\site-packages\mxnet\ndarray\ndarray.py", line 3548, in empty
    return NDArray(handle=_new_alloc_handle(shape, ctx, False, dtype))
  File "C:\Users\Administrateur.WIN-JNTSDGOVCTG\Miniconda3\envs\alan\lib\site-packages\mxnet\ndarray\ndarray.py", line 139, in _new_alloc_handle
    ctypes.byref(hdl)))
  File "C:\Users\Administrateur.WIN-JNTSDGOVCTG\Miniconda3\envs\alan\lib\site-packages\mxnet\base.py", line 149, in check_call
    raise MXNetError(py_str(_LIB.MXGetLastError()))
mxnet.base.MXNetError: [17:20:03] c:\jenkins\workspace\mxnet-tag\mxnet\src\storage./pooled_storage_manager.h:108: cudaMalloc failed: device kernel image is invalid
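The probe above can be wrapped so that a failing GPU context is reported instead of aborting the script. This is only a minimal sketch: the helper names `first_working_context` and `mxnet_probe` are hypothetical and not part of mxnet. It can only catch failures that surface as Python exceptions, such as the `MXNetError` shown here; a hard crash of the interpreter, as seen with the full model, would not be caught.

```python
def first_working_context(candidates, probe):
    """Return the first context for which probe(ctx) succeeds, else None.

    `candidates` is any iterable of contexts and `probe` is a callable that
    raises an exception when the context is not usable. Both names are
    hypothetical helpers for this sketch.
    """
    for ctx in candidates:
        try:
            probe(ctx)
        except Exception as err:  # the failed allocation raises mxnet.base.MXNetError
            print("context %s is not usable: %s" % (ctx, err))
        else:
            return ctx
    return None


def mxnet_probe(ctx):
    """Allocate a tiny array on ctx and copy it back to force execution."""
    import mxnet as mx  # imported lazily so the helper above has no dependency
    mx.nd.array([1, 2], ctx=ctx).asnumpy()


if __name__ == "__main__":
    try:
        import mxnet as mx
    except ImportError:
        print("mxnet is not installed")
    else:
        # Try the GPU first, then fall back to the CPU.
        ctx = first_working_context([mx.gpu(0), mx.cpu()], mxnet_probe)
        print("using context:", ctx)
```

Falling back to the CPU only works around the problem; the error still needs to be fixed in the mxnet installation.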

Looking through forums, I found this solution:

Downgrade mxnet to 1.1.0

After downgrading mxnet, I reran the GPU debug script and it worked fine. The output was:

gpu(0)
[1. 2.]
<NDArray 2 @gpu(0)>

But when I went back to my original code, I got this error:

Traceback (most recent call last):
  File "test_video.py", line 97, in <module>
    main()
  File "test_video.py", line 65, in main
    net = model_zoo.get_model(net_name, pretrained_base=True, ctx=mx.gpu(0))
  File "C:\Users\Administrateur.WIN-JNTSDGOVCTG\Miniconda3\envs\alan\lib\site-packages\gluoncv\model_zoo\model_zoo.py", line 105, in get_model
    net = _models[name](**kwargs)
  File "C:\Users\Administrateur.WIN-JNTSDGOVCTG\Miniconda3\envs\alan\lib\site-packages\gluoncv\model_zoo\ssd\ssd.py", line 287, in ssd_300_vgg16_atrous_voc
    pretrained_base=pretrained_base, **kwargs)
  File "C:\Users\Administrateur.WIN-JNTSDGOVCTG\Miniconda3\envs\alan\lib\site-packages\gluoncv\model_zoo\ssd\ssd.py", line 258, in get_ssd
    pretrained=pretrained_base, classes=classes, ctx=ctx, **kwargs)
  File "C:\Users\Administrateur.WIN-JNTSDGOVCTG\Miniconda3\envs\alan\lib\site-packages\gluoncv\model_zoo\ssd\ssd.py", line 121, in __init__
    self.features = features(pretrained=pretrained, ctx=ctx)
  File "C:\Users\Administrateur.WIN-JNTSDGOVCTG\Miniconda3\envs\alan\lib\site-packages\gluoncv\model_zoo\ssd\vgg_atrous.py", line 204, in vgg16_atrous_300
    return get_vgg_atrous_extractor(16, 300, **kwargs)
  File "C:\Users\Administrateur.WIN-JNTSDGOVCTG\Miniconda3\envs\alan\lib\site-packages\gluoncv\model_zoo\ssd\vgg_atrous.py", line 193, in get_vgg_atrous_extractor
    net = VGGAtrousExtractor(layers, filters, extras, **kwargs)
  File "C:\Users\Administrateur.WIN-JNTSDGOVCTG\Miniconda3\envs\alan\lib\site-packages\gluoncv\model_zoo\ssd\vgg_atrous.py", line 112, in __init__
    super(VGGAtrousExtractor, self).__init__(layers, filters, batch_norm, **kwargs)
  File "C:\Users\Administrateur.WIN-JNTSDGOVCTG\Miniconda3\envs\alan\lib\site-packages\gluoncv\model_zoo\ssd\vgg_atrous.py", line 64, in __init__
    self.init_scale = self.params.get_constant('init_scale', init_scale)
AttributeError: 'ParameterDict' object has no attribute 'get_constant'
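The `AttributeError` suggests that the installed mxnet is older than what this gluon-cv release expects. A quick way to confirm that before building the model is to check whether `ParameterDict` has the method at all. This is a rough sketch: `missing_attributes` is a hypothetical helper, and the only attribute checked is the one named in the traceback.

```python
def missing_attributes(obj, names):
    """Return the names from `names` that `obj` does not provide."""
    return [name for name in names if not hasattr(obj, name)]


if __name__ == "__main__":
    try:
        import mxnet as mx
        from mxnet.gluon import ParameterDict
    except ImportError:
        print("mxnet with gluon.ParameterDict could not be imported")
    else:
        # 'get_constant' is the attribute reported missing in the traceback.
        missing = missing_attributes(ParameterDict, ["get_constant"])
        if missing:
            print("mxnet %s lacks: %s (a newer mxnet is probably needed)"
                  % (mx.__version__, ", ".join(missing)))
        else:
            print("mxnet %s provides ParameterDict.get_constant" % mx.__version__)
```

Checking for the attribute rather than comparing version strings avoids having to know exactly which mxnet release introduced it.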

I looked it up on forums, and the proposed solution is to upgrade mxnet...

What can I do to resolve this issue?

Thanks in advance

Ferbach commented 6 years ago

Sorry for the inconvenience. I resolved this by upgrading mxnet with: pip install --pre mxnet-cu80