Closed Soonhwan-Kwon closed 5 years ago
Hey, this is the MXNet Label Bot. Thank you for submitting the issue! I will try and suggest some labels so that the appropriate MXNet community members can help resolve it. Here are my recommended labels: Build
@Soonhwan-Kwon Could you provide a simple reproducer?
Also, have you tried calling sym = sym.get_backend_symbol('MKLDNN_FC') and passing quantized_dtype='uint8' when calling quantize_model?
https://github.com/apache/incubator-mxnet/blob/master/example/quantization/imagenet_gen_qsym_mkldnn.py#L183
@TaoLv We tried your suggestion before,
$ echo $MXNET_SUBGRAPH_BACKEND
MKLDNN
sym = sym.get_backend_symbol('MKLDNN')
sym = sym.get_backend_symbol('MKLDNN_FC')
and it produces an error like the one below:
MXNetError Traceback (most recent call last)
@TaoLv And also quantized_dtype='uint8' produces the same original error message
MXNetError: Error in operator quantized_fusedrnn_t134_i2h: [11:40:16] src/operator/quantization/quantized_fully_connected.cc:41: Check failed: !shape_is_none(in_shape->at(0)) QuantizedFullyConnectedOp input data shape must be given
May I know which version of MXNet you are using? MKL-DNN QFC was merged into master only recently. PR here: https://github.com/apache/incubator-mxnet/pull/14128
@TaoLv We tried version 1.4.0.post0, which predates that commit. We'll try the latest version as you suggested right now, thank you.
@Soonhwan-Kwon Thanks for reporting the issue.
Hi. I have run into the same problem, using "mxnet-cu90mkl 1.5.0b20190314".
First, I converted and saved a trained fused-rnn model.
import argparse
import os
import logging
import mxnet as mx
import gluoncv
from mxnet import gluon, nd, image
from gluoncv import utils
from gluoncv.model_zoo import get_model
from mxnet.contrib.quantization import *
from mxnet.base import SymbolHandle, check_call, _LIB, mx_uint, c_str_array
import ctypes
def save_symbol(fname, sym, logger=None):
    if logger is not None:
        logger.info('Saving symbol into file at %s' % fname)
    sym.save(fname)

def save_params(fname, arg_params, aux_params, logger=None):
    if logger is not None:
        logger.info('Saving params into file at %s' % fname)
    save_dict = {('arg:%s' % k): v.as_in_context(cpu()) for k, v in arg_params.items()}
    save_dict.update({('aux:%s' % k): v.as_in_context(cpu()) for k, v in aux_params.items()})
    mx.nd.save(fname, save_dict)
logging.basicConfig()
logger = logging.getLogger('logger')
logger.setLevel(logging.INFO)
prefix = 'fused_rnn'
dir_path = './checkpoints/'
prefix = os.path.join(dir_path, prefix)
epoch = 173
batch_size = 900
ctx = mx.cpu(0)
# load and convert
sym, arg_params, aux_params = mx.model.load_checkpoint(prefix, epoch)
sym = sym.get_backend_symbol('MKLDNN')
sym = sym.get_backend_symbol('MKLDNN_FC')
excluded_sym_names = []
excluded_sym_names += ['conv0']
logger.info('Quantizing FP32 model %s' % prefix)
qsym, qarg_params, aux_params = quantize_model(sym=sym, arg_params=arg_params, aux_params=aux_params,
                                               excluded_sym_names=excluded_sym_names, ctx=ctx,
                                               calib_mode='none', quantized_dtype='uint8', logger=logger)
qsym = qsym.get_backend_symbol('MKLDNN_POST_QUANTIZE')
qsym = qsym.get_backend_symbol('MKLDNN_POST_FC_QUANTIZE')
sym_name = '%s-symbol.json' % (prefix + '-quantized')
param_name = '%s-%04d.params' % (prefix + '-quantized', epoch)
save_symbol(sym_name, qsym, logger)
save_params(param_name, qarg_params, aux_params, logger)
And, I loaded the converted symbols and the params file.
import numpy as np
import mxnet as mx
import os
q_prefix = 'fused_rnn-quantized'
dir_path = './checkpoints/'
q_prefix = os.path.join(dir_path, q_prefix)
epoch = 173
batch_size = 900
contexts = [mx.context.Context('cpu')]
q_symbol_file = q_prefix + '-symbol.json'
q_symbol = mx.sym.load(q_symbol_file)
q_symbol.simple_bind(ctx=mx.cpu(), data=(900, 137, 9), category=(900, 2))
When I tried simple_bind, it failed with the error below:
RuntimeError: simple_bind error. Arguments: category: (900, 2) data: (900, 137, 9) [20:25:39] src/executor/../common/exec_utils.h:392: InferShape pass cannot decide shapes for the following arguments (0s means unknown dimensions). Please consider providing them as inputs:
Stack trace returned 10 entries:
[bt] (0) /home/ubuntu/anaconda2/envs/mxnet_1_4/lib/python2.7/site-packages/mxnet/libmxnet.so(+0x421cd2) [0x7ff6c90dfcd2]
[bt] (1) /home/ubuntu/anaconda2/envs/mxnet_1_4/lib/python2.7/site-packages/mxnet/libmxnet.so(+0x4222b8) [0x7ff6c90e02b8]
[bt] (2) /home/ubuntu/anaconda2/envs/mxnet_1_4/lib/python2.7/site-packages/mxnet/libmxnet.so(+0x31a10f1) [0x7ff6cbe5f0f1]
[bt] (3) /home/ubuntu/anaconda2/envs/mxnet_1_4/lib/python2.7/site-packages/mxnet/libmxnet.so(mxnet::exec::GraphExecutor::Init(nnvm::Symbol, mxnet::Context const&, std::map<std::string, mxnet::Context, std::less
@mxnet-label-bot add [MKLDNN, Quantization]
@Soonhwan-Kwon Is there any 0 dimension in the shape of the input data? Quantized FullyConnected requires all dimensions of the input data to be given.
@Soonhwan-Kwon @Amagong could you provide a mini reproducible case so that we can help to resolve the issue?
Maybe you also need to patch https://github.com/apache/incubator-mxnet/pull/14466 after excluding the 0-dim layers.
Check failed: !shape_is_none(in_shape->at(0))
The PR #14466 is merged. Please sync up the latest MXNet and build again.
@pengzhao-intel Thank you for the update. I'm rebuilding MXNet now; @Amagong and I are working on the same project. @ciyongch we excluded the embedding layer (which seems to have a 0 dimension), but it had no effect.
@Soonhwan-Kwon Are you still facing the error of "Check failed: !shape_is_none(in_shape->at(0)) QuantizedFullyConnectedOp input data shape must be given" or "InferShape pass cannot decide shapes for the following arguments (0s means unknown dimensions). Please consider providing them as inputs:" ? Can you provide a reproducer, then we can take a look :)
@ciyongch Thank you for your quick response. I'll check with the newly built version. And, I'll prepare a simple reproducer.
There is currently an "InferShape pass cannot decide shapes for the following arguments (0s means unknown dimensions). Please consider providing them as inputs:" error.
@Amagong This is still the same problem as the original one. Some layers have an input with a 0-dimension shape, which the quantized FullyConnected operator does not currently support. Please check your current model and exclude all such layers; I guess they all come from the first time step. We're going to enhance the error message to make it clear which operator is reporting this error.
can you try
q_symbol.infer_shape_partial(data=(900, 137, 9), category=(900, 2))
The first list should correspond to q_symbol.list_arguments(), the second to q_symbol.list_outputs(), and the third to q_symbol.list_auxiliary_states(). This should indicate which shape is missing.
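To make the three returned lists easier to read, each shape can be paired with its name. The helper below is a minimal sketch in plain Python; `find_unknown_shapes` is a hypothetical name, not an MXNet API, and it assumes an unknown shape shows up as `None`, an empty tuple, or a tuple containing a 0. The names and shapes in the example are stand-in values.

```python
def find_unknown_shapes(names, shapes):
    """Return {name: shape} for every entry whose shape is not fully known.

    Assumption: an unknown shape is None, an empty tuple, or contains a 0.
    """
    unknown = {}
    for name, shape in zip(names, shapes):
        if shape is None or len(shape) == 0 or 0 in shape:
            unknown[name] = shape
    return unknown


# Stand-in values; with MXNet these would come from
# q_symbol.list_arguments() and the first list returned by
# q_symbol.infer_shape_partial(...).
names = ['data', 'gru_l0l0_begin_state_0', 'gru_l0l0_h2h_weight']
shapes = [(900, 137, 9), (0, 1024), (1024, 1024)]
print(find_unknown_shapes(names, shapes))
```

With a real symbol, the same helper could be applied to the output and auxiliary-state lists by pairing them with q_symbol.list_outputs() and q_symbol.list_auxiliary_states().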
@ciyongch @anirudh2290 Thank you for your reply. @anirudh2290 'infer_shape_partial' works well without error, but still could not bind.
The error above was due to the use of fused-rnn. Below is a simple reproducer.
import math
import mxnet as mx
from mxnet.contrib.quantization import *
channel_num = 10
conv_layer_filter_dims = [2, 3]
conv_layer_strides = [1,1]
dimension = 5
data_len = 10
data = mx.sym.Variable('data')
label = mx.sym.Variable('label')
# layer stacking
net = mx.sym.Reshape(data=data, shape=(-4, -1, 1, 0, 0))
net = mx.sym.Convolution(data=net,
                         num_filter=channel_num,
                         kernel=tuple(conv_layer_filter_dims),
                         stride=tuple(conv_layer_strides),
                         weight=None,
                         bias=None,
                         no_bias=True,
                         cudnn_tune="fastest",
                         name="conv0")
net = mx.sym.BatchNorm(data=net,
                       eps=0.001,
                       momentum=0.9,
                       fix_gamma=False,
                       use_global_stats=False,
                       output_mean_var=False,
                       name="conv0_batchnorm")
data_lengths_references = int(math.floor((data_len - conv_layer_filter_dims[0]) / conv_layer_strides[0])) + 1
net = mx.sym.transpose(data=net, axes=(2, 0, 1, 3))
net = mx.sym.Reshape(data=net, shape=(0, 0, -3))
# Fused rnn :
stack = mx.rnn.FusedRNNCell(1024, num_layers=2, mode='rnn_relu', prefix='%s_l0' % ('gru'), bidirectional=False).unfuse()
# lstm :
'''
stack = mx.rnn.SequentialRNNCell()
cell = mx.rnn.LSTMCell(num_hidden=1760, prefix='%s_l0l0_' % ('gru'))
stack.add(cell)
'''
# gru :
'''
stack = mx.rnn.SequentialRNNCell()
cell = mx.rnn.GRUCell(num_hidden=1760, prefix='%s_l0l0_' % ('gru'))
stack.add(cell)
'''
net, _ = stack.unroll(length=data_lengths_references,
                      inputs=net,
                      merge_outputs=False,
                      layout='TNC')
net = net[data_lengths_references-1]
net = mx.sym.FullyConnected(data=net, num_hidden=10, no_bias=False, name="classification_fc_layer")
net = mx.sym.SoftmaxOutput(data=net, label=label)
mod = net.simple_bind(ctx=mx.cpu(0), data=(75, data_len, dimension))
# convert to quantize model
net = net.get_backend_symbol('MKLDNN')
net = net.get_backend_symbol('MKLDNN_FC')
excluded_sym_names = []
excluded_sym_names += ['conv0']
arg_dict = mod.arg_dict
aux_dict = mod.aux_dict
arg_params = {}
aux_params = {}
for k, v in arg_dict.items():
    arg_params[k] = v
for k, v in aux_dict.items():
    aux_params[k] = v
qnet, qarg_params, qaux_params = quantize_model(sym=net, arg_params=arg_params, aux_params=aux_params,
                                                excluded_sym_names=excluded_sym_names, ctx=mx.cpu(0),
                                                calib_mode='none', quantized_dtype='uint8')
qnet = qnet.get_backend_symbol('MKLDNN_POST_QUANTIZE')
qnet = qnet.get_backend_symbol('MKLDNN_POST_FC_QUANTIZE')
print(qnet.infer_shape(data=(75, data_len, dimension)))
qnet.simple_bind(ctx=mx.cpu(0), data=(75, data_len, dimension))
When the lstm or fused-rnn block is enabled, the following UserWarning occurs:
UserWarning: Cannot decide shape for the following arguments (0s in shape means unknown dimensions). Consider providing them as input:
And at bind time, the following error occurs:
InferShape pass cannot decide shapes for the following arguments (0s means unknown dimensions). Please consider providing them as inputs:
The same error occurs when using RNNCell. There is no error when using GRUCell. Is there a problem with how I am using it?
@anirudh2290 and I tried to debug the issue. The shape of gru_l0l0_begin_state_0 in the graph is (0, 1024), and it is followed by quantized_fully_connected; the zero dimension of gru_l0l0 is not being inferred, so we need to dive deeper.
@Amagong Excluding the layers with a 0-dimension input will resolve this error. In your samples, the input to h2h in the first timestep (0) of every layer contains a 0 in its shape, so just exclude these layers as below:
For Fused-rnn block:
excluded_sym_names += ['conv0']
+excluded_sym_names += [
+ 'gru_l0l0_t0_h2h',
+ 'gru_l0l1_t0_h2h',
+ ]
For lstm block:
excluded_sym_names += ['conv0']
+excluded_sym_names += [
+ 'gru_l0l0_t0_h2h',
+ ]
For gru block, I noticed that there's another '_' in gru naming:
excluded_sym_names += ['conv0']
+excluded_sym_names += [
+ 'gru_l0l0_t0__h2h',
+ ]
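The exclusion lists above can also be collected from the symbol's JSON rather than typed by hand. The following is a minimal sketch under stated assumptions: `first_step_h2h_names` is a hypothetical helper, the `_t0_h2h` / `_t0__h2h` naming pattern is taken from the names listed above, and the JSON here is a hand-written stand-in for what `sym.tojson()` would return.

```python
import json
import re

# Assumed naming pattern: "<prefix>_t0_h2h" or "<prefix>_t0__h2h",
# as seen in the layer names listed above.
FIRST_STEP_H2H = re.compile(r'_t0_+h2h$')


def first_step_h2h_names(symbol_json):
    """Collect FullyConnected node names that are the h2h of time step 0."""
    nodes = json.loads(symbol_json)['nodes']
    return [node['name'] for node in nodes
            if node['op'] == 'FullyConnected'
            and FIRST_STEP_H2H.search(node['name'])]


# Hand-written stand-in for the output of sym.tojson().
example_json = json.dumps({'nodes': [
    {'op': 'null', 'name': 'data'},
    {'op': 'FullyConnected', 'name': 'gru_l0l0_t0_i2h'},
    {'op': 'FullyConnected', 'name': 'gru_l0l0_t0_h2h'},
    {'op': 'FullyConnected', 'name': 'gru_l0l1_t0_h2h'},
    {'op': 'FullyConnected', 'name': 'gru_l0l0_t1_h2h'},
]})
print(first_step_h2h_names(example_json))
```

The returned names could then be appended to excluded_sym_names before calling quantize_model. If a model uses a different prefix or naming scheme, the pattern would need to be adjusted.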
Besides that, please change simple_bind() to bind(), since a quantized symbol requires quantized params (int8), while simple_bind() allocates default params, which are fp32.
-qnet.simple_bind(ctx=mx.cpu(0), data=(75, data_len, dimension))
+mod = mx.mod.Module(symbol=qnet, context=mx.cpu(0), label_names=None)
+mod.bind(data_shapes=[('data', (75, data_len, dimension))], grad_req='null')
+mod.set_params(qarg_params, qaux_params)
Hope this helps you enable the case :)
Thanks @ciyongch . Can you please let me know why quantized_fully_connected doesn't handle inferring the data dimension 0 based on the output shape. For example, the following runs fine on fp32:
import mxnet as mx
qdtype="float32"
num_hidden=100
no_bias=False
flatten=True
x = mx.sym.var("x", dtype=qdtype)
qdata = mx.sym.Variable(name='qdata')#, shape=data_shape, dtype=qdtype)
qbias = mx.sym.Variable(name='qbias')#, shape=(10, 100), dtype=qdtype)
y = mx.sym.exp(x)
fc_fp32 = mx.sym.FullyConnected(data=qdata, num_hidden=num_hidden, no_bias=no_bias, flatten=flatten)
sum_first = mx.sym.elemwise_add(y, fc_fp32)
sum_first_1 = mx.sym.Group([sum_first, x, y])
ex = sum_first_1.simple_bind(mx.cpu(), qdata=(0, 1024), fullyconnected0_weight=(100, 1024), fullyconnected0_bias=(100,), x=(10, 100))
print(ex.arg_dict["qdata"].shape)
The expectation is that it should also run fine after quantization, but it fails at this check. Is there any reason why we can't remove the check here: https://github.com/apache/incubator-mxnet/blob/master/src/operator/quantization/quantized_fully_connected.cc#L50 and add inference from output to input, as in the non-quantized fully connected operator here: https://github.com/apache/incubator-mxnet/blob/master/src/operator/nn/fully_connected.cc#L78
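To illustrate what "inference from output to input" means here, the following plain-Python sketch fills in an unknown batch dimension of the data from a known output shape. It is a simplified illustration, not the actual MXNet implementation; `infer_fc_shapes` is a hypothetical name, and it assumes flatten=True and that 0 marks an unknown dimension.

```python
def infer_fc_shapes(data_shape, num_hidden, out_shape=None):
    """Simplified two-way shape inference for a flattened FullyConnected.

    A 0 in data_shape[0] means "unknown batch size".
    Returns (data, weight, output) shapes.
    """
    batch = data_shape[0]
    # With flatten=True, all trailing dimensions collapse into one.
    num_input = 1
    for dim in data_shape[1:]:
        num_input *= dim
    # Backward step: recover an unknown batch size from the output shape.
    if batch == 0 and out_shape is not None:
        batch = out_shape[0]
    data = (batch,) + tuple(data_shape[1:])
    weight = (num_hidden, num_input)
    output = (batch, num_hidden)
    return data, weight, output


# Output shape known (e.g. from elemwise_add with x=(10, 100)):
print(infer_fc_shapes((0, 1024), 100, out_shape=(10, 100)))
# Output shape unknown: the batch dimension stays 0.
print(infer_fc_shapes((0, 1024), 100))
```

In the first call the batch size is recovered from the output; in the second it stays unknown, which corresponds to the situation the quantized operator's check rejects.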
@anirudh2290 The behavior was not changed since the initial version, looks like it will throw many errors in rnn domain. Will figure out the reason and see how to improve this :)
Thanks @ciyongch, I followed your guide and no more errors occur. But there are still some problems in successfully applying quantization to my code. I'll try various ways to apply it. Thank you.
@Amagong Glad to hear you're able to run quantization on the sample code. Please let us know if you meet other errors or failures in your real case. We're working on an enhancement for this limitation.
@ciyongch In my case, there is a problem that inference time is slow when using quantization. Originally it took 2 minutes 40 seconds, it takes 24 minutes after quantization....
I generate a network like the sample code above and use the 'quantize_model' function.
# generate symbol
net = gen_sym(data_len)
net = net.get_backend_symbol('MKLDNN')
net = net.get_backend_symbol('MKLDNN_FC')
excluded_sym_names = []
excluded_sym_names += ['conv0']
excluded_sym_names += ['gru_l0l0_t0_h2h']
excluded_sym_names += ['gru_l0l1_t0_h2h']
save_dict = mx.nd.load('original_model.params')
arg_params = {}
aux_params = {}
for k, v in save_dict.items():
    tp, name = k.split(':', 1)
    if tp == 'arg':
        arg_params[name] = v
    if tp == 'aux':
        aux_params[name] = v
qnet, qarg_params, qaux_params = quantize_model(sym=net, arg_params=arg_params, aux_params=aux_params,
                                                excluded_sym_names=excluded_sym_names, ctx=mx.cpu(0),
                                                calib_mode='none', quantized_dtype='uint8')
qnet = qnet.get_backend_symbol('MKLDNN_POST_QUANTIZE')
qnet = qnet.get_backend_symbol('MKLDNN_POST_FC_QUANTIZE')
return qnet
And set parameters as below.
_, arg_params, aux_params = mx.model.load_checkpoint('quantized model path', model_epoch_num)
model.set_params(arg_params, aux_params)
I use this structure because input data length is variable.
When I run the inference code as above, it runs without any problem, but it is too slow. I'm looking for a problem in my code. Can I get some advice?
I'm using 'FusedRNNCell'
@Amagong The main reason is that you're using the quantized model without calibration information. This results in online calibration, which slows down performance dramatically. To get the full speed of the quantized model, we suggest using one of the calib_mode options (naive or entropy).
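As a rough picture of what offline min/max ("naive") calibration does, the sketch below records the value range over a few calibration batches once, so a fixed scale can be reused at inference time instead of being recomputed per input. This is a generic illustration in plain Python, not MXNet's kernels; the function names are hypothetical and the symmetric int8 scale is an assumed, commonly used convention.

```python
def collect_min_max(batches):
    """Track the global min and max over an iterable of value lists."""
    lo, hi = float('inf'), float('-inf')
    for batch in batches:
        lo = min(lo, min(batch))
        hi = max(hi, max(batch))
    return lo, hi


def int8_scale(lo, hi):
    """Symmetric int8 scale: map the largest magnitude onto 127 (assumed convention)."""
    largest = max(abs(lo), abs(hi))
    return 127.0 / largest if largest > 0 else 1.0


# Stand-in calibration data: two small batches of layer outputs.
calib_batches = [[0.5, 2.0], [-1.0, 3.0]]
lo, hi = collect_min_max(calib_batches)
scale = int8_scale(lo, hi)
print(lo, hi, scale)
```

In MXNet itself, offline calibration is requested by passing calib_mode='naive' or calib_mode='entropy' to quantize_model together with calib_data (a DataIter) and num_calib_examples, instead of calib_mode='none'.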
@ZhennanQin Thank you for your advice! I'll try the way you told me.
@Amagong @Soonhwan-Kwon Did you get the expected results? We'd like to hear your feedback so we can keep improving the INT8 flow and quality :)
PR #15031 will fix this issue
Closing the issue since the PR is merged. Feel free to reopen if you see the issue again.
Description
When using FusedRNNCell with the MKLDNN backend (graph optimization and quantization, experimental), it leads to a QuantizedFullyConnectedOp error like the one below:
MXNetError: Error in operator quantized_fusedrnn_t134_i2h: [11:40:16] src/operator/quantization/quantized_fully_connected.cc:41: Check failed: !shape_is_none(in_shape->at(0)) QuantizedFullyConnectedOp input data shape must be given
and below is pseudo code for Network Architecture
Quantization Code net = net.get_backend_symbol('MKLDNN')
Stack trace returned 10 entries:
[bt] (0) /home/ubuntu/anaconda2/envs/mxnet_1_4/lib/python2.7/site-packages/mxnet/libmxnet.so(+0x3e95ea) [0x7faf97f2b5ea]
[bt] (1) /home/ubuntu/anaconda2/envs/mxnet_1_4/lib/python2.7/site-packages/mxnet/libmxnet.so(+0x3e9c11) [0x7faf97f2bc11]
[bt] (2) /home/ubuntu/anaconda2/envs/mxnet_1_4/lib/python2.7/site-packages/mxnet/libmxnet.so(+0x9e351c) [0x7faf9852551c]
[bt] (3) /home/ubuntu/anaconda2/envs/mxnet_1_4/lib/python2.7/site-packages/mxnet/libmxnet.so(+0x2deed5a) [0x7faf9a930d5a]
[bt] (4) /home/ubuntu/anaconda2/envs/mxnet_1_4/lib/python2.7/site-packages/mxnet/libmxnet.so(+0x2df1704) [0x7faf9a933704]
[bt] (5) /home/ubuntu/anaconda2/envs/mxnet_1_4/lib/python2.7/site-packages/mxnet/libmxnet.so(MXSymbolInferShape+0x15ba) [0x7faf9a89e40a]
[bt] (6) /home/ubuntu/anaconda2/envs/mxnet_1_4/lib/python2.7/lib-dynload/../../libffi.so.6(ffi_call_unix64+0x4c) [0x7fafff188ec0]
[bt] (7) /home/ubuntu/anaconda2/envs/mxnet_1_4/lib/python2.7/lib-dynload/../../libffi.so.6(ffi_call+0x22d) [0x7fafff18887d]
[bt] (8) /home/ubuntu/anaconda2/envs/mxnet_1_4/lib/python2.7/lib-dynload/_ctypes.so(_ctypes_callproc+0x4de) [0x7fafff39f8de]
[bt] (9) /home/ubuntu/anaconda2/envs/mxnet_1_4/lib/python2.7/lib-dynload/_ctypes.so(+0x9b31) [0x7fafff395b31]
commit head mxnet-cu90mkl 1.4.0.post0