apache / mxnet

Lightweight, Portable, Flexible Distributed/Mobile Deep Learning with Dynamic, Mutation-aware Dataflow Dep Scheduler; for Python, R, Julia, Scala, Go, Javascript and more
https://mxnet.apache.org
Apache License 2.0

MXNetError: Projection is only supported for LSTM with CuDNN version later than 7.1.1 #15190

Open Marcovaldong opened 5 years ago

Marcovaldong commented 5 years ago

I built MXNet from source with CUDA 8.0 and cuDNN 7.1.3. When I try to use an LSTM layer with a projection layer, as follows, I get an error.

import mxnet as mx
from mxnet.gluon import nn, rnn
layer = rnn.LSTM(hidden_size=1024, input_size=320, layout='NTC', projection_size=320)
layer.initialize(ctx=mx.gpu(0))
input = mx.random.uniform(shape=(4, 150, 320), ctx=mx.gpu(0))
out = layer(input)

Then, I got this error:

MXNetError: [18:40:38] src/operator/./cudnn_rnn-inl.h:82: Check failed: !param_.projection_size.has_value() Projection is only supported for LSTM with CuDNN version later than 7.1.1.

Stack trace returned 10 entries:
[bt] (0) /disk2/dongsq/incubator-mxnet/lib/libmxnet.so(dmlc::StackTrace[abi:cxx11]()+0x48) [0x7fe789cabef8]
[bt] (1) /disk2/dongsq/incubator-mxnet/lib/libmxnet.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x18) [0x7fe789cac9d8]
[bt] (2) /disk2/dongsq/incubator-mxnet/lib/libmxnet.so(mxnet::op::CuDNNRNNOp<float>::CuDNNRNNOp(mxnet::op::RNNParam)+0x2cb) [0x7fe78ecb5ddb]
[bt] (3) /disk2/dongsq/incubator-mxnet/lib/libmxnet.so(mxnet::Operator* mxnet::op::CreateOp<mshadow::gpu>(mxnet::op::RNNParam, int)+0x158) [0x7fe78ecb2ab8]
[bt] (4) /disk2/dongsq/incubator-mxnet/lib/libmxnet.so(mxnet::op::RNNProp::CreateOperatorEx(mxnet::Context, std::vector<nnvm::TShape, std::allocator<nnvm::TShape> >*, std::vector<int, std::allocator<int> >*) const+0x30) [0x7fe78c987dd0]
[bt] (5) /disk2/dongsq/incubator-mxnet/lib/libmxnet.so(mxnet::op::OpPropCreateLayerOp(nnvm::NodeAttrs const&, mxnet::Context, std::vector<nnvm::TShape, std::allocator<nnvm::TShape> > const&, std::vector<int, std::allocator<int> > const&)+0x27f) [0x7fe78c75ec5f]
[bt] (6) /disk2/dongsq/incubator-mxnet/lib/libmxnet.so(std::_Function_handler<mxnet::OpStatePtr (nnvm::NodeAttrs const&, mxnet::Context, std::vector<nnvm::TShape, std::allocator<nnvm::TShape> > const&, std::vector<int, std::allocator<int> > const&), mxnet::OpStatePtr (*)(nnvm::NodeAttrs const&, mxnet::Context, std::vector<nnvm::TShape, std::allocator<nnvm::TShape> > const&, std::vector<int, std::allocator<int> > const&)>::_M_invoke(std::_Any_data const&, nnvm::NodeAttrs const&, mxnet::Context&&, std::vector<nnvm::TShape, std::allocator<nnvm::TShape> > const&, std::vector<int, std::allocator<int> > const&)+0x18) [0x7fe789cfa348]
[bt] (7) /disk2/dongsq/incubator-mxnet/lib/libmxnet.so(mxnet::Imperative::InvokeOp(mxnet::Context const&, nnvm::NodeAttrs const&, std::vector<mxnet::NDArray*, std::allocator<mxnet::NDArray*> > const&, std::vector<mxnet::NDArray*, std::allocator<mxnet::NDArray*> > const&, std::vector<mxnet::OpReqType, std::allocator<mxnet::OpReqType> > const&, mxnet::DispatchMode, mxnet::OpStatePtr)+0xc2a) [0x7fe78c7e9eca]
[bt] (8) /disk2/dongsq/incubator-mxnet/lib/libmxnet.so(mxnet::Imperative::Invoke(mxnet::Context const&, nnvm::NodeAttrs const&, std::vector<mxnet::NDArray*, std::allocator<mxnet::NDArray*> > const&, std::vector<mxnet::NDArray*, std::allocator<mxnet::NDArray*> > const&)+0x625) [0x7fe78c7ea805]
[bt] (9) /disk2/dongsq/incubator-mxnet/lib/libmxnet.so(MXImperativeInvokeImpl(void*, int, void**, int*, void***, int, char const**, char const**)+0xe57) [0x7fe78ce21f97]

CUDA and cuDNN are installed under my own path because I have no root permission. I added their paths to the library path in my environment.

@szha

mxnet-label-bot commented 5 years ago

Hey, this is the MXNet Label Bot. Thank you for submitting the issue! I will try and suggest some labels so that the appropriate MXNet community members can help resolve it. Here are my recommended labels: Cuda, Feature

szha commented 5 years ago

@Marcovaldong this looks like a problem with the error message: it says 7.1.1, but the macros are actually set to check for cuDNN 7.2. cc @stephenrawls
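To illustrate the mismatch described above, here is a minimal sketch in Python, not MXNet's actual code. cuDNN 7.x packs its version into one integer as major * 1000 + minor * 100 + patch, so 7.1.3 becomes 7103. The helper names `encode_cudnn_version` and `projection_supported` are hypothetical, and the 7.2 threshold is taken from the comment above rather than from the source tree.

```python
def encode_cudnn_version(major, minor, patch):
    """Pack a cuDNN 7.x version the way the CUDNN_VERSION macro does."""
    return major * 1000 + minor * 100 + patch


# Assumption: threshold of 7.2, as stated in the comment above
# (the error message itself says 7.1.1).
MIN_PROJECTION_VERSION = encode_cudnn_version(7, 2, 0)


def projection_supported(cudnn_version):
    """Hypothetical gate: True if the packed version is at least 7.2.0."""
    return cudnn_version >= MIN_PROJECTION_VERSION


reported = encode_cudnn_version(7, 1, 3)  # the version from the report
print(reported, projection_supported(reported))
print(projection_supported(encode_cudnn_version(7, 2, 1)))
```

Under these assumptions cuDNN 7.1.3 is later than 7.1.1 yet still fails a 7.2 gate, which is why the message reads as contradictory.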

Marcovaldong commented 5 years ago

@szha So should I use a cuDNN version of 7.2 or later?

szha commented 5 years ago

Yes, using cudnn 7.2+ should solve the problem.
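When cuDNN lives in a user directory, it can help to confirm which version the dynamic loader actually picks up. The sketch below calls cuDNN's `cudnnGetVersion()` through `ctypes`. The helper name `loaded_cudnn_version` and the fallback library name `libcudnn.so` are assumptions, and the decoding assumes the cuDNN 7.x version layout. Since the check is described above as macro-based, the headers MXNet was built against are probably what count, so this only reports the library found at runtime.

```python
import ctypes
import ctypes.util


def loaded_cudnn_version(lib_name=None):
    """Return cuDNN's packed version integer, or None if it cannot be loaded."""
    # Assumption: fall back to "libcudnn.so" when find_library finds nothing.
    name = lib_name or ctypes.util.find_library("cudnn") or "libcudnn.so"
    try:
        lib = ctypes.CDLL(name)
        lib.cudnnGetVersion.restype = ctypes.c_size_t
        return int(lib.cudnnGetVersion())
    except (OSError, AttributeError):
        # Library missing from the search path, or symbol not exported.
        return None


version = loaded_cudnn_version()
if version is None:
    print("cuDNN not found on the library path")
else:
    # Decode the cuDNN 7.x layout: major * 1000 + minor * 100 + patch.
    print("cuDNN %d.%d.%d" % (version // 1000, version % 1000 // 100, version % 100))
```

If this reports an older version than expected after an upgrade, the library path is likely still pointing at the previous install.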

Marcovaldong commented 5 years ago

@szha Thanks for your reply. I upgraded the drivers, CUDA, and cuDNN, and now I can use LSTMP successfully.

szha commented 5 years ago

Actually, since the error message is inaccurate, let's keep this issue open until the inaccuracy is addressed.