apache / mxnet

Lightweight, Portable, Flexible Distributed/Mobile Deep Learning with Dynamic, Mutation-aware Dataflow Dep Scheduler; for Python, R, Julia, Scala, Go, Javascript and more
https://mxnet.apache.org
Apache License 2.0

Passing parameters to HybridBlocks and not using them #14373

Open whamza15 opened 5 years ago

whamza15 commented 5 years ago

Description

Not using all of the inputs passed to hybrid_forward() causes deferred initialization to fail. There is no requirement that every passed input must be used, and I am not sure why MXNet failed to infer the input shape for the dense layer. It works fine without hybridize(), of course. The reason we pass input data to blocks without using it is that some subclasses use it, and we would like to unify the interface so that calling blocks do not have to be aware of what type of block they are calling. We cannot use __call__() or forward() since these blocks will be hybridized and served from C++.

Environment info (Required)

----------Python Info----------
Version      : 3.6.7
Compiler     : GCC 4.2.1 Compatible Clang 4.0.1 (tags/RELEASE_401/final)
Build        : ('default', 'Oct 23 2018 14:01:38')
Arch         : ('64bit', '')
------------Pip Info-----------
Version      : 18.0
Directory    : /Users/<me>/miniconda2/envs/mxnet_p36/lib/python3.6/site-packages/pip
----------MXNet Info-----------
Version      : 1.3.1
Directory    : /Users/<me>/miniconda2/envs/mxnet_p36/lib/python3.6/site-packages/mxnet
Commit Hash   : 19c501680183237d52a862e6ae1dc4ddc296305b
----------System Info----------
Platform     : Darwin-16.7.0-x86_64-i386-64bit
system       : Darwin
node         : 88e9fe531e66.ant.amazon.com
release      : 16.7.0
version      : Darwin Kernel Version 16.7.0: Thu Dec 20 21:53:35 PST 2018; root:xnu-3789.73.31~1/RELEASE_X86_64
----------Hardware Info----------
machine      : x86_64
processor    : i386
b'machdep.cpu.extfeatures: SYSCALL XD 1GBPAGE EM64T LAHF LZCNT PREFETCHW RDTSCP TSCI'
b'machdep.cpu.leaf7_features: SMEP ERMS RDWRFSGS TSC_THREAD_OFFSET BMI1 HLE AVX2 BMI2 INVPCID RTM SMAP RDSEED ADX IPT SGX FPU_CSDS MPX CLFSOPT'
b'machdep.cpu.features: FPU VME DE PSE TSC MSR PAE MCE CX8 APIC SEP MTRR PGE MCA CMOV PAT PSE36 CLFSH DS ACPI MMX FXSR SSE SSE2 SS HTT TM PBE SSE3 PCLMULQDQ DTES64 MON DSCPL VMX SMX EST TM2 SSSE3 FMA CX16 TPR PDCM SSE4.1 SSE4.2 x2APIC MOVBE POPCNT AES PCID XSAVE OSXSAVE SEGLIM64 TSCTMR AVX1.0 RDRAND F16C'
b'machdep.cpu.brand_string: Intel(R) Core(TM) i7-7660U CPU @ 2.50GHz'
----------Network Test----------
Setting timeout: 10
Timing for MXNet: https://github.com/apache/incubator-mxnet, DNS: 0.0196 sec, LOAD: 0.5490 sec.
Timing for Gluon Tutorial(en): http://gluon.mxnet.io, DNS: 0.0161 sec, LOAD: 0.6451 sec.
Timing for Gluon Tutorial(cn): https://zh.gluon.ai, DNS: 0.0291 sec, LOAD: 0.5838 sec.
Timing for FashionMNIST: https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/gluon/dataset/fashion-mnist/train-labels-idx1-ubyte.gz, DNS: 0.0175 sec, LOAD: 0.7988 sec.
Timing for PYPI: https://pypi.python.org/pypi/pip, DNS: 0.0163 sec, LOAD: 0.3659 sec.
Timing for Conda: https://repo.continuum.io/pkgs/free/, DNS: 0.0172 sec, LOAD: 0.1020 sec.

Error Message:

---------------------------------------------------------------------------
DeferredInitializationError               Traceback (most recent call last)
~/miniconda2/envs/mxnet_p36/lib/python3.6/site-packages/mxnet/gluon/block.py in _call_cached_op(self, *args)
    804             cargs = [args[i] if is_arg else i.data()
--> 805                      for is_arg, i in self._cached_op_args]
    806         except DeferredInitializationError:

~/miniconda2/envs/mxnet_p36/lib/python3.6/site-packages/mxnet/gluon/block.py in <listcomp>(.0)
    804             cargs = [args[i] if is_arg else i.data()
--> 805                      for is_arg, i in self._cached_op_args]
    806         except DeferredInitializationError:

~/miniconda2/envs/mxnet_p36/lib/python3.6/site-packages/mxnet/gluon/parameter.py in data(self, ctx)
    493                                "instead." % (self.name, str(ctx), self._stype))
--> 494         return self._check_and_get(self._data, ctx)
    495 

~/miniconda2/envs/mxnet_p36/lib/python3.6/site-packages/mxnet/gluon/parameter.py in _check_and_get(self, arr_list, ctx)
    207                 "You can also avoid deferred initialization by specifying in_units, " \
--> 208                 "num_features, etc., for network layers."%(self.name))
    209         raise RuntimeError(

DeferredInitializationError: Parameter 'dense4_weight' has not been initialized yet because initialization was deferred. Actual initialization happens during the first forward pass. Please pass one batch of data through the network before accessing Parameters. You can also avoid deferred initialization by specifying in_units, num_features, etc., for network layers.

During handling of the above exception, another exception occurred:

MXNetError                                Traceback (most recent call last)
~/miniconda2/envs/mxnet_p36/lib/python3.6/site-packages/mxnet/gluon/block.py in _deferred_infer_shape(self, *args)
    790         try:
--> 791             self.infer_shape(*args)
    792         except Exception as e:

~/miniconda2/envs/mxnet_p36/lib/python3.6/site-packages/mxnet/gluon/block.py in infer_shape(self, *args)
    863         """Infers shape of Parameters from inputs."""
--> 864         self._infer_attrs('infer_shape', 'shape', *args)
    865 

~/miniconda2/envs/mxnet_p36/lib/python3.6/site-packages/mxnet/gluon/block.py in _infer_attrs(self, infer_fn, attr, *args)
    852             arg_attrs, _, aux_attrs = getattr(out, infer_fn)(
--> 853                 **{i.name: getattr(j, attr) for i, j in zip(inputs, args)})
    854             if arg_attrs is None:

~/miniconda2/envs/mxnet_p36/lib/python3.6/site-packages/mxnet/symbol/symbol.py in infer_shape(self, *args, **kwargs)
    995         try:
--> 996             res = self._infer_shape_impl(False, *args, **kwargs)
    997             if res[1] is None:

~/miniconda2/envs/mxnet_p36/lib/python3.6/site-packages/mxnet/symbol/symbol.py in _infer_shape_impl(self, partial, *args, **kwargs)
   1125             ctypes.byref(aux_shape_data),
-> 1126             ctypes.byref(complete)))
   1127         if complete.value != 0:

~/miniconda2/envs/mxnet_p36/lib/python3.6/site-packages/mxnet/base.py in check_call(ret)
    250     if ret != 0:
--> 251         raise MXNetError(py_str(_LIB.MXGetLastError()))
    252 

MXNetError: [18:29:57] src/c_api/c_api_symbolic.cc:493: InferShapeKeyword argument name data3 not found.
Candidate arguments:
    [0]data0
    [1]embedding8_weight
    [2]data2
    [3]embedding9_weight
    [4]dense4_weight
    [5]dense4_bias

Stack trace returned 5 entries:
[bt] (0) 0   libmxnet.so                         0x000000010cac4740 libmxnet.so + 26432
[bt] (1) 1   libmxnet.so                         0x000000010cac44ef libmxnet.so + 25839
[bt] (2) 2   libmxnet.so                         0x000000010dfcedbe MXSymbolInferShape + 9582
[bt] (3) 3   libmxnet.so                         0x000000010dfcd0e2 MXSymbolInferShape + 2194
[bt] (4) 4   libffi.6.dylib                      0x000000010b7ca884 ffi_call_unix64 + 76

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
<ipython-input-10-680dd178ea34> in <module>()
      4 vl2 = mx.nd.array([3,2])
      5 
----> 6 net(x1, vl1, x2, vl2)

~/miniconda2/envs/mxnet_p36/lib/python3.6/site-packages/mxnet/gluon/block.py in __call__(self, *args)
    540             hook(self, args)
    541 
--> 542         out = self.forward(*args)
    543 
    544         for hook in self._forward_hooks.values():

~/miniconda2/envs/mxnet_p36/lib/python3.6/site-packages/mxnet/gluon/block.py in forward(self, x, *args)
    907             with x.context as ctx:
    908                 if self._active:
--> 909                     return self._call_cached_op(x, *args)
    910 
    911                 try:

~/miniconda2/envs/mxnet_p36/lib/python3.6/site-packages/mxnet/gluon/block.py in _call_cached_op(self, *args)
    805                      for is_arg, i in self._cached_op_args]
    806         except DeferredInitializationError:
--> 807             self._deferred_infer_shape(*args)
    808             cargs = []
    809             for is_arg, i in self._cached_op_args:

~/miniconda2/envs/mxnet_p36/lib/python3.6/site-packages/mxnet/gluon/block.py in _deferred_infer_shape(self, *args)
    793             error_msg = "Deferred initialization failed because shape"\
    794                         " cannot be inferred. {}".format(e)
--> 795             raise ValueError(error_msg)
    796 
    797     def _call_cached_op(self, *args):

ValueError: Deferred initialization failed because shape cannot be inferred. [18:29:57] src/c_api/c_api_symbolic.cc:493: InferShapeKeyword argument name data3 not found.
Candidate arguments:
    [0]data0
    [1]embedding8_weight
    [2]data2
    [3]embedding9_weight
    [4]dense4_weight
    [5]dense4_bias

Stack trace returned 5 entries:
[bt] (0) 0   libmxnet.so                         0x000000010cac4740 libmxnet.so + 26432
[bt] (1) 1   libmxnet.so                         0x000000010cac44ef libmxnet.so + 25839
[bt] (2) 2   libmxnet.so                         0x000000010dfcedbe MXSymbolInferShape + 9582
[bt] (3) 3   libmxnet.so                         0x000000010dfcd0e2 MXSymbolInferShape + 2194
[bt] (4) 4   libffi.6.dylib                      0x000000010b7ca884 ffi_call_unix64 + 76

Minimum reproducible example

import mxnet.gluon as gl
import mxnet as mx

class EmbeddingBlock(gl.HybridBlock):
    def __init__(self, num_toks, dim, **kwargs):
        super(EmbeddingBlock, self).__init__(**kwargs)
        self.emb = gl.nn.Embedding(num_toks, dim)

    def hybrid_forward(self, F, x, valid_length):
        # NOTE valid_length is not used
        return self.emb(x)

class Net(gl.HybridBlock):
    def __init__(self, **kwargs):
        super(Net, self).__init__(**kwargs)
        self.dense = gl.nn.Dense(3, flatten=False)
        self.e1 = EmbeddingBlock(10,100)
        self.e2 = EmbeddingBlock(20,60)

    def hybrid_forward(self, F, x1, vl1, x2, vl2):
        o = F.concat(self.e1(x1,vl1), self.e2(x2,vl2), dim=-1)
        return self.dense(o)

net = Net()
net.initialize()
net.hybridize()
x1 = mx.nd.array(range(8)).reshape(2,-1)
vl1 = mx.nd.array([3,2])
x2 = mx.nd.array(range(8)).reshape(2,-1)
vl2 = mx.nd.array([3,2])

net(x1, vl1, x2, vl2)

Steps to reproduce

(Paste the commands you ran that produced the error.)

  1. Just put the example above in a script and run it.

What have you tried to solve it?

The only solution that works is to use the unused variables in the graph in a redundant way.
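
For illustration, here is a minimal sketch of such a redundant use (the zero-valued term is illustrative, not our actual code; it leaves the output numerically unchanged while forcing the graph to reference valid_length so shape inference can see it):

import mxnet.gluon as gl

class EmbeddingBlockWorkaround(gl.HybridBlock):
    def __init__(self, num_toks, dim, **kwargs):
        super(EmbeddingBlockWorkaround, self).__init__(**kwargs)
        self.emb = gl.nn.Embedding(num_toks, dim)

    def hybrid_forward(self, F, x, valid_length):
        out = self.emb(x)
        # "use" valid_length by adding a scalar zero derived from it;
        # this records the dependency without changing the result
        return F.broadcast_add(out, F.sum(0.0 * valid_length))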

mxnet-label-bot commented 5 years ago

Hey, this is the MXNet Label Bot. Thank you for submitting the issue! I will try and suggest some labels so that the appropriate MXNet community members can help resolve it. Here are my recommended labels: Feature

piyushghai commented 5 years ago

Looks like a possible bug to me. I'm labelling it so that the MXNet community can help resolve it.

@mxnet-label-bot Add [Bug, Gluon]

abhinavs95 commented 5 years ago

@whamza15 This is not an issue of not using all variables in hybrid_forward, as the following test works:

import mxnet.gluon as gl
import mxnet as mx

class EmbeddingBlock(gl.HybridBlock):
    def __init__(self, num_toks, dim, **kwargs):
        super(EmbeddingBlock, self).__init__(**kwargs)
        self.emb = gl.nn.Embedding(num_toks, dim)

    def hybrid_forward(self, F, x, valid_length):
        # NOTE valid_length is not used
        return self.emb(x)

net = EmbeddingBlock(10, 100)
net.initialize()
net.hybridize()
x1 = mx.nd.array(range(8)).reshape(2,-1)
vl1 = mx.nd.array([3,2])
x2 = mx.nd.array(range(8)).reshape(2,-1)
vl2 = mx.nd.array([3,2])

net(x1, vl1)
print(net.collect_params())

EDIT: The above test works because deferred initialization is not used for embedding layers. For layers that use deferred initialization, like nn.Dense, the issue does exist, as can be verified with the following:

import mxnet.gluon as gl
import mxnet as mx

class Net(gl.HybridBlock):
    def __init__(self, **kwargs):
        super(Net, self).__init__(**kwargs)
        self.dense = gl.nn.Dense(3, flatten=False)

    def hybrid_forward(self, F, x, v1):
        return self.dense(x)

net = Net()
net.initialize()
net.hybridize()
x = mx.nd.array(range(8)).reshape(2,-1)
v1 = mx.nd.array([3,2])
net(x, v1)

Error Message:

/anaconda3/lib/python3.7/site-packages/mxnet/gluon/block.py:540: UserWarning: The 1-th input to HybridBlock is not used by any computation. Is this intended?
  out = self.forward(*args)
infer_shape error. Arguments:
  data0: (2, 4)
  data1: (2,)
Traceback (most recent call last):
  File "/anaconda3/lib/python3.7/site-packages/mxnet/gluon/block.py", line 803, in _call_cached_op
    for is_arg, i in self._cached_op_args]
  File "/anaconda3/lib/python3.7/site-packages/mxnet/gluon/block.py", line 803, in <listcomp>
    for is_arg, i in self._cached_op_args]
  File "/anaconda3/lib/python3.7/site-packages/mxnet/gluon/parameter.py", line 494, in data
    return self._check_and_get(self._data, ctx)
  File "/anaconda3/lib/python3.7/site-packages/mxnet/gluon/parameter.py", line 208, in _check_and_get
    "num_features, etc., for network layers."%(self.name))
mxnet.gluon.parameter.DeferredInitializationError: Parameter 'dense0_weight' has not been initialized yet because initialization was deferred. Actual initialization happens during the first forward pass. Please pass one batch of data through the network before accessing Parameters. You can also avoid deferred initialization by specifying in_units, num_features, etc., for network layers.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/anaconda3/lib/python3.7/site-packages/mxnet/gluon/block.py", line 789, in _deferred_infer_shape
    self.infer_shape(*args)
  File "/anaconda3/lib/python3.7/site-packages/mxnet/gluon/block.py", line 862, in infer_shape
    self._infer_attrs('infer_shape', 'shape', *args)
  File "/anaconda3/lib/python3.7/site-packages/mxnet/gluon/block.py", line 851, in _infer_attrs
    **{i.name: getattr(j, attr) for i, j in zip(inputs, args)})
  File "/anaconda3/lib/python3.7/site-packages/mxnet/symbol/symbol.py", line 996, in infer_shape
    res = self._infer_shape_impl(False, *args, **kwargs)
  File "/anaconda3/lib/python3.7/site-packages/mxnet/symbol/symbol.py", line 1126, in _infer_shape_impl
    ctypes.byref(complete)))
  File "/anaconda3/lib/python3.7/site-packages/mxnet/base.py", line 252, in check_call
    raise MXNetError(py_str(_LIB.MXGetLastError()))
mxnet.base.MXNetError: [14:53:40] src/c_api/c_api_symbolic.cc:494: InferShapeKeyword argument name data1 not found.
Candidate arguments:
    [0]data0
    [1]dense0_weight
    [2]dense0_bias

Stack trace returned 5 entries:
[bt] (0) 0   libmxnet.so                         0x000000011164e390 std::__1::__tree<std::__1::__value_type<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, mxnet::NDArrayFunctionReg*>, std::__1::__map_value_compare<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::__value_type<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, mxnet::NDArrayFunctionReg*>, std::__1::less<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > >, true>, std::__1::allocator<std::__1::__value_type<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, mxnet::NDArrayFunctionReg*> > >::destroy(std::__1::__tree_node<std::__1::__value_type<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, mxnet::NDArrayFunctionReg*>, void*>*) + 2736
[bt] (1) 1   libmxnet.so                         0x000000011164e13f std::__1::__tree<std::__1::__value_type<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, mxnet::NDArrayFunctionReg*>, std::__1::__map_value_compare<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::__value_type<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, mxnet::NDArrayFunctionReg*>, std::__1::less<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > >, true>, std::__1::allocator<std::__1::__value_type<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, mxnet::NDArrayFunctionReg*> > >::destroy(std::__1::__tree_node<std::__1::__value_type<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, mxnet::NDArrayFunctionReg*>, void*>*) + 2143
[bt] (2) 2   libmxnet.so                         0x0000000112c4a85e MXSymbolInferShape + 9582
[bt] (3) 3   libmxnet.so                         0x0000000112c48b82 MXSymbolInferShape + 2194
[bt] (4) 4   libffi.6.dylib                      0x000000010a0b1884 ffi_call_unix64 + 76

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "test_gl1.py", line 28, in <module>
    net(x, v1)
  File "/anaconda3/lib/python3.7/site-packages/mxnet/gluon/block.py", line 540, in __call__
    out = self.forward(*args)
  File "/anaconda3/lib/python3.7/site-packages/mxnet/gluon/block.py", line 907, in forward
    return self._call_cached_op(x, *args)
  File "/anaconda3/lib/python3.7/site-packages/mxnet/gluon/block.py", line 805, in _call_cached_op
    self._deferred_infer_shape(*args)
  File "/anaconda3/lib/python3.7/site-packages/mxnet/gluon/block.py", line 793, in _deferred_infer_shape
    raise ValueError(error_msg)
ValueError: Deferred initialization failed because shape cannot be inferred. [14:53:40] src/c_api/c_api_symbolic.cc:494: InferShapeKeyword argument name data1 not found.
Candidate arguments:
    [0]data0
    [1]dense0_weight
    [2]dense0_bias

Stack trace returned 5 entries:
[bt] (0) 0   libmxnet.so                         0x000000011164e390 std::__1::__tree<std::__1::__value_type<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, mxnet::NDArrayFunctionReg*>, std::__1::__map_value_compare<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::__value_type<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, mxnet::NDArrayFunctionReg*>, std::__1::less<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > >, true>, std::__1::allocator<std::__1::__value_type<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, mxnet::NDArrayFunctionReg*> > >::destroy(std::__1::__tree_node<std::__1::__value_type<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, mxnet::NDArrayFunctionReg*>, void*>*) + 2736
[bt] (1) 1   libmxnet.so                         0x000000011164e13f std::__1::__tree<std::__1::__value_type<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, mxnet::NDArrayFunctionReg*>, std::__1::__map_value_compare<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::__value_type<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, mxnet::NDArrayFunctionReg*>, std::__1::less<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > >, true>, std::__1::allocator<std::__1::__value_type<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, mxnet::NDArrayFunctionReg*> > >::destroy(std::__1::__tree_node<std::__1::__value_type<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, mxnet::NDArrayFunctionReg*>, void*>*) + 2143
[bt] (2) 2   libmxnet.so                         0x0000000112c4a85e MXSymbolInferShape + 9582
[bt] (3) 3   libmxnet.so                         0x0000000112c48b82 MXSymbolInferShape + 2194
[bt] (4) 4   libffi.6.dylib                      0x000000010a0b1884 ffi_call_unix64 + 76

abhinavs95 commented 5 years ago

I am trying to figure out if this is actually a bug and if there is a possible workaround for this use case.

@sandeep-krishnamurthy @safrooze Could you please have a look?

abhinavs95 commented 5 years ago

Possibly related to #13967

abhinavs95 commented 5 years ago

It seems like this is expected behavior. @eric-haibin-lin, could you have a look and confirm?

@whamza15 Since the error pops up due to deferred initialization, you can avoid it by specifying the input shape when creating the layers. Here is the full example:

import mxnet.gluon as gl
import mxnet as mx

class EmbeddingBlock(gl.HybridBlock):
    def __init__(self, num_toks, dim, **kwargs):
        super(EmbeddingBlock, self).__init__(**kwargs)
        self.emb = gl.nn.Embedding(num_toks, dim)

    def hybrid_forward(self, F, x, valid_length):
        # NOTE valid_length is not used
        return self.emb(x)

class Net(gl.HybridBlock):
    def __init__(self, **kwargs):
        super(Net, self).__init__(**kwargs)
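        # in_units=160 matches the concatenated embedding width (100 + 60),
        # so the dense weight shape is known up front and deferred
        # initialization is avoided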
        self.dense = gl.nn.Dense(3, in_units=160, flatten=False)
        self.e1 = EmbeddingBlock(10,100)
        self.e2 = EmbeddingBlock(20,60)

    def hybrid_forward(self, F, x1, vl1, x2, vl2):
        o = F.concat(self.e1(x1,vl1), self.e2(x2,vl2), dim=-1)
        return self.dense(o)

net = Net()
net.initialize()
net.hybridize()
x1 = mx.nd.array(range(8)).reshape(2,-1)
vl1 = mx.nd.array([3,2])
x2 = mx.nd.array(range(8)).reshape(2,-1)
vl2 = mx.nd.array([3,2])

net(x1, vl1, x2, vl2)

abhinavs95 commented 5 years ago

@mxnet-label-bot add [pending requester info]

piyushghai commented 5 years ago

@whamza15 Did these suggestions help you?

piyushghai commented 5 years ago

@whamza15 Can you please close the issue if it has been resolved for you?

Please feel free to re-open if closed in error.

whamza15 commented 5 years ago

Sorry, I did not get a chance to follow up on this. I can try what you described, @abhinavs95. However, not using deferred initialization would be a bit of a setback for our toolkit, which relies heavily on it. Is there a possibility this can be solved while still relying on deferred initialization?

whamza15 commented 5 years ago

I just want to add that if I use valid_length in the EmbeddingBlock, it works fine even with deferred initialization.

eric-haibin-lin commented 5 years ago

@whamza15 does it work if you pass [] as the value for valid_length?

whamza15 commented 5 years ago

@eric-haibin-lin I am not sure I understand the question. valid_length always has a value; it is just that this block does not use it. The reason we have this setup is that our toolkit allows people to configure blocks (as complex as they want) without having to change the inputs. Some blocks may choose to consume valid_length (like complex encoders) while others may not (like a simple embedding block).

eric-haibin-lin commented 5 years ago

We have a temporary workaround in https://github.com/dmlc/gluon-nlp/blob/master/src/gluonnlp/model/transformer.py#L420-L501, but this bug should definitely be fixed in MXNet.

RuRo commented 4 years ago

There is a similar problem when there are unused parameters. For example, you can have a model like this:

class Test(mx.gluon.nn.HybridBlock): 
    def __init__(self, mode, *args, **kwargs): 
        super().__init__(*args, **kwargs) 
        self.mode = mode 
        with self.name_scope(): 
            self.d1 = mx.gluon.nn.Dense(2) 
            self.d2 = mx.gluon.nn.Dense(3) 

    def hybrid_forward(self, F, x, *args, **kwargs): 
        o1 = self.d1(x) 
        o2 = self.d2(x) 
        if self.mode: 
            return o1 # output path o2 is not used
        else: 
            return o1, o2 

Currently, this model will not hybridize successfully when mode == True, because the weights in the o2 path are "unused".

```python
/usr/lib/python3.8/site-packages/mxnet/gluon/block.py:694: UserWarning: Parameter test4_dense1_weight, test4_dense1_bias is not used by any computation. Is this intended?
  out = self.forward(*args)
---------------------------------------------------------------------------
DeferredInitializationError               Traceback (most recent call last)
/usr/lib/python3.8/site-packages/mxnet/gluon/block.py in _call_cached_op(self, *args)
   1012         try:
-> 1013             cargs = [args_without_none[i] if is_arg else i.data()
   1014                      for is_arg, i in self._cached_op_args]

/usr/lib/python3.8/site-packages/mxnet/gluon/block.py in <listcomp>(.0)
   1012         try:
-> 1013             cargs = [args_without_none[i] if is_arg else i.data()
   1014                      for is_arg, i in self._cached_op_args]

/usr/lib/python3.8/site-packages/mxnet/gluon/parameter.py in data(self, ctx)
    564                                "instead." % (self.name, str(ctx), self._stype))
--> 565         return self._check_and_get(self._data, ctx)
    566

/usr/lib/python3.8/site-packages/mxnet/gluon/parameter.py in _check_and_get(self, arr_list, ctx)
    230         if self._deferred_init:
--> 231             raise DeferredInitializationError(
    232                 "Parameter '%s' has not been initialized yet because initialization was " \

DeferredInitializationError: Parameter 'test4_dense0_weight' has not been initialized yet because initialization was deferred. Actual initialization happens during the first forward pass. Please pass one batch of data through the network before accessing Parameters. You can also avoid deferred initialization by specifying in_units, num_features, etc., for network layers.

During handling of the above exception, another exception occurred:

KeyError                                  Traceback (most recent call last)
/usr/lib/python3.8/site-packages/mxnet/gluon/block.py in _deferred_infer_shape(self, *args)
    973         try:
--> 974             self.infer_shape(*args)
    975         except Exception as e:

/usr/lib/python3.8/site-packages/mxnet/gluon/block.py in infer_shape(self, *args)
   1074         """Infers shape of Parameters from inputs."""
-> 1075         self._infer_attrs('infer_shape', 'shape', *args)
   1076

/usr/lib/python3.8/site-packages/mxnet/gluon/block.py in _infer_attrs(self, infer_fn, attr, *args)
   1070             for i in self.collect_params().values():
-> 1071                 setattr(i, attr, sdict[i.name])
   1072

KeyError: 'test4_dense1_weight'

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
<ipython-input-...> in <module>
----> 1 t(mx.nd.array([10]))

/usr/lib/python3.8/site-packages/mxnet/gluon/block.py in __call__(self, *args)
    692             hook(self, args)
    693
--> 694         out = self.forward(*args)
    695
    696         for hook in self._forward_hooks.values():

/usr/lib/python3.8/site-packages/mxnet/gluon/block.py in forward(self, x, *args)
   1150                            'Find all contexts = {}'.format(ctx_set))
   1151             with ctx:
-> 1152                 return self._call_cached_op(x, *args)
   1153         with ctx:
   1154             try:

/usr/lib/python3.8/site-packages/mxnet/gluon/block.py in _call_cached_op(self, *args)
   1014                      for is_arg, i in self._cached_op_args]
   1015         except DeferredInitializationError:
-> 1016             self._deferred_infer_shape(*args)
   1017             cargs = []
   1018             for is_arg, i in self._cached_op_args:

/usr/lib/python3.8/site-packages/mxnet/gluon/block.py in _deferred_infer_shape(self, *args)
    976             error_msg = "Deferred initialization failed because shape"\
    977                         " cannot be inferred. {}".format(e)
--> 978             raise ValueError(error_msg)
    979
    980     def _call_cached_op(self, *args):

ValueError: Deferred initialization failed because shape cannot be inferred. 'test4_dense1_weight'
```

Having unused parameters is useful since you might want your pretrain/finetune/evaluation networks to behave differently, yet remain compatible with .save_parameters and .load_parameters without allow_missing and ignore_extra.
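
For example, a hypothetical round-trip with the Test block above (the file name is illustrative):

train_net = Test(mode=False)              # training graph: both heads are used
train_net.initialize()
train_net(mx.nd.array([[1.0, 2.0]]))      # first forward pass materializes all parameters
train_net.save_parameters("test.params")

eval_net = Test(mode=True)                # evaluation graph: only o1 is returned
# loads cleanly without allow_missing/ignore_extra because both networks
# declare exactly the same parameter set
eval_net.load_parameters("test.params")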

I think this issue could be fixed without changing the inner workings too much by adding an F.nodiscard(o2) operator. It would be a no-op in nd mode and would somehow mark the output as a required computation in sym mode. Not sure how feasible something like that is.

My current workaround is something like

        return F.broadcast_add(o1, F.sum(0.0 * o2)) # output path o2 is not used

which is both really ugly and potentially inefficient, since it forces the unneeded computation.

If the F.nodiscard option is too hard to implement, something like

o1 = F.depends_on(o1, o2)

could also work. It would basically be the same as F.broadcast_add(o1, F.sum(0.0 * o2)) but without any computations.
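
Until something like that exists, a depends_on-style helper can be emulated with the broadcast_add trick above (the name and helper are hypothetical, and unlike a true no-op it still pays for the zero-multiply and sum):

def depends_on(F, out, *unused):
    # fold a zero scalar derived from each unused output into `out`,
    # forcing the symbolic graph to keep those paths alive
    for u in unused:
        out = F.broadcast_add(out, F.sum(0.0 * u))
    return out

# in hybrid_forward, the mode == True branch would then become:
#     return depends_on(F, o1, o2)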

szha commented 4 years ago

cc @leezu

whamza15 commented 4 years ago

Any progress on this?

szha commented 4 years ago

@whamza15 this will be taken into account in MXNet 2.0 roadmap item 4.3 (Gluon block enhancement), which @leezu is driving.