Hi @mohammedkhalilia, please provide an end-to-end example. My guess is that one of your input elements is >= config.char_size. Can you double-check that all inputs are < config.char_size?
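One way to run that check before the batch reaches the embedding layer is sketched below. It is only a sketch under assumptions: `find_out_of_bound` is a hypothetical helper, not part of MXNet, and `input_dim` stands in for whatever value was passed to `Embedding` (here, `config.char_size`). It uses plain NumPy, so an MXNet NDArray would first need converting with `.asnumpy()`. Negative indices (for example a -1 padding value) are flagged as well, on the assumption that they also fall outside the valid range.

```python
import numpy as np

def find_out_of_bound(indices, input_dim):
    """Return flat positions and values of indices outside [0, input_dim)."""
    flat = np.asarray(indices).ravel()
    bad = (flat < 0) | (flat >= input_dim)
    positions = np.flatnonzero(bad)
    return positions, flat[positions]

# Hypothetical batch of character ids; with input_dim=8 the valid ids are 0..7.
batch = np.array([[0, 3, 7], [2, 8, -1]])
positions, values = find_out_of_bound(batch, input_dim=8)
print(positions, values)
```

Running this over every batch from the data loader (after `.asnumpy()`) should show whether any index falls outside the embedding table, and which values are responsible.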
Description
When setting sparse_grad=True in mxnet.gluon.nn.Embedding() I get an error.
Error Message
The error is: Check failed: is_valid: Embedding input contains data out of bound
Full traceback is below:
Traceback (most recent call last):
File "/home/ubuntu/workspace/src/models/train.py", line 73, in <module>
main()
File "/home/ubuntu/workspace/src/models/train.py", line 63, in main
model.train(train_dataloader, val_dataloader, test_dataloader, ctx)
File "BaseModel.py", line 53, in train
epoch_loss = self.epoch()
File "/home/ubuntu/workspace/src/models/BaseModel.py", line 120, in epoch
return epoch_loss.asscalar()
File "/env/mx_1.5_gnlp_0.8/local/lib/python3.5/site-packages/mxnet/ndarray/ndarray.py", line 2014, in asscalar
return self.asnumpy()[0]
File "/env/mx_1.5_gnlp_0.8/local/lib/python3.5/site-packages/mxnet/ndarray/ndarray.py", line 1996, in asnumpy
ctypes.c_size_t(data.size)))
File "/env/mx_1.5_gnlp_0.8/local/lib/python3.5/site-packages/mxnet/base.py", line 253, in check_call
raise MXNetError(py_str(_LIB.MXGetLastError()))
mxnet.base.MXNetError: [19:41:00] src/operator/tensor/indexing_op.cu:284: Check failed: is_valid: Embedding input contains data out of bound
[bt] (0) /env/mx_1.5_gnlp_0.8/local/lib/python3.5/site-packages/mxnet/libmxnet.so(+0x4b04cb) [0x7fbdc45f84cb]
[bt] (1) /env/mx_1.5_gnlp_0.8/local/lib/python3.5/site-packages/mxnet/libmxnet.so(void mxnet::op::SparseEmbeddingDeterministicKernelLaunch<int, float, long>(mxnet::OpContext const&, mxnet::TBlob const&, mxnet::TBlob const&, mxnet::OpReqType, mxnet::NDArray const&)+0x246) [0x7fbdc8a613d6]
[bt] (2) /env/mx_1.5_gnlp_0.8/local/lib/python3.5/site-packages/mxnet/libmxnet.so(mxnet::op::SparseEmbeddingOpBackwardDeterministicRspImpl(mxnet::OpContext const&, mxnet::TBlob const&, mxnet::TBlob const&, mxnet::OpReqType, mxnet::NDArray const&)+0x1b4b) [0x7fbdc8ab434b]
[bt] (3) /env/mx_1.5_gnlp_0.8/local/lib/python3.5/site-packages/mxnet/libmxnet.so(void mxnet::op::SparseEmbeddingOpBackwardRspImpl(bool, mxnet::OpContext const&, mxnet::TBlob const&, mxnet::TBlob const&, mxnet::OpReqType, mxnet::NDArray const&)+0x2f4) [0x7fbdc8ab5084]
[bt] (4) /env/mx_1.5_gnlp_0.8/local/lib/python3.5/site-packages/mxnet/libmxnet.so(void mxnet::op::EmbeddingOpBackwardEx(nnvm::NodeAttrs const&, mxnet::OpContext const&, std::vector<mxnet::NDArray, std::allocator > const&, std::vector<mxnet::OpReqType, std::allocator > const&, std::vector<mxnet::NDArray, std::allocator > const&)+0x6dc) [0x7fbdc8abaa1c]
[bt] (5) /env/mx_1.5_gnlp_0.8/local/lib/python3.5/site-packages/mxnet/libmxnet.so(std::_Function_handler<void (mxnet::RunContext), mxnet::imperative::PushFComputeEx(std::function<void (nnvm::NodeAttrs const&, mxnet::OpContext const&, std::vector<mxnet::NDArray, std::allocator > const&, std::vector<mxnet::OpReqType, std::allocator > const&, std::vector<mxnet::NDArray, std::allocator > const&)> const&, nnvm::Op const, nnvm::NodeAttrs const&, mxnet::Context const&, std::vector<mxnet::engine::Var, std::allocator<mxnet::engine::Var> > const&, std::vector<mxnet::engine::Var, std::allocator<mxnet::engine::Var> > const&, std::vector<mxnet::Resource, std::allocator > const&, std::vector<mxnet::NDArray , std::allocator<mxnet::NDArray> > const&, std::vector<mxnet::NDArray, std::allocator<mxnet::NDArray*> > const&, std::vector<mxnet::OpReqType, std::allocator > const&)::{lambda(mxnet::RunContext)#1}>::_M_invoke(std::_Any_data const&, mxnet::RunContext)+0x9f) [0x7fbdc67a2f2f]
[bt] (6) /env/mx_1.5_gnlp_0.8/local/lib/python3.5/site-packages/mxnet/libmxnet.so(+0x25b5459) [0x7fbdc66fd459]
[bt] (7) /env/mx_1.5_gnlp_0.8/local/lib/python3.5/site-packages/mxnet/libmxnet.so(+0x25c1ce1) [0x7fbdc6709ce1]
[bt] (8) /env/mx_1.5_gnlp_0.8/local/lib/python3.5/site-packages/mxnet/libmxnet.so(+0x25c51f0) [0x7fbdc670d1f0]
To Reproduce
Steps to reproduce
I can provide a simple end-to-end script if needed.
Environment