apache / mxnet

Lightweight, Portable, Flexible Distributed/Mobile Deep Learning with Dynamic, Mutation-aware Dataflow Dep Scheduler; for Python, R, Julia, Scala, Go, Javascript and more
https://mxnet.apache.org
Apache License 2.0

HybridBlock/SymbolBlock import-export does not persist input/output formats #17488

Open canerturkmen opened 4 years ago

canerturkmen commented 4 years ago

Description

This is a combination of two small issues with the HybridBlock.export and SymbolBlock.imports methods.

  1. When a hybridized block is exported and then re-imported, its _in_format and _out_format attributes are not persisted. With nested inputs in particular, this leads to unexpected behavior.
  2. When a block with no parameters is exported, a .params file is still written out. However, when that same empty .params file is passed back to imports, the method fails with an obscure error.

To Reproduce

import mxnet as mx

class TestBlock(mx.gluon.HybridBlock):
    def hybrid_forward(self, F, x1, x2):
        return F.broadcast_mul(x1, x2[0])

my_block = TestBlock()

my_block.hybridize()
my_block(
    mx.nd.array([1, 2]), [mx.nd.array([3, 4])]
)

my_block.export("block")

# ISSUE 1:
# there is no way to relay the _in_format here: export does not persist it,
# and SymbolBlock.imports has no argument for passing it in
sym_block = mx.gluon.SymbolBlock.imports(
    "block-symbol.json", ["x1", "x2"]
)

assert my_block._in_format == [0, [0]]
assert sym_block._in_format == [0, 0]

# ISSUE 2:
# the hybrid block export writes a "block-0000.params" file regardless
# of whether there are any parameters in the name scope. However, when
# the same file is fed to the `imports` class method, it throws an
# exception
sym_block = mx.gluon.SymbolBlock.imports(
    "block-symbol.json", ["x1", "x2"], "block-0000.params"
)
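
For readers unfamiliar with the format descriptors in the assertions above, here is a minimal sketch in plain Python of how a nested argument structure can be flattened into a list of leaves plus a format descriptor, and then regrouped. It assumes that every leaf is encoded as `0` and every list as a list of its children's formats. The names `flatten` and `regroup` are hypothetical, and this is a simplified illustration rather than MXNet's internal code.

```python
def flatten(args):
    """Return (flat_leaves, fmt) for a possibly nested list/tuple of leaves."""
    if not isinstance(args, (list, tuple)):
        return [args], 0  # a single leaf is encoded as 0
    flat, fmts = [], []
    for item in args:
        leaves, fmt = flatten(item)
        flat.extend(leaves)
        fmts.append(fmt)
    return flat, fmts


def regroup(flat, fmt):
    """Rebuild the nested structure; return (structure, remaining_leaves)."""
    if isinstance(fmt, int):
        return flat[0], flat[1:]
    out = []
    for sub in fmt:
        item, flat = regroup(flat, sub)
        out.append(item)
    return out, flat


# Stand-ins for the two call arguments in the repro: x1 and x2 = [array].
flat, fmt = flatten(["x1_data", ["x2_data"]])
print(fmt)                        # [0, [0]]
print(regroup(flat, fmt)[0])      # ['x1_data', ['x2_data']]
print(regroup(flat, [0, 0])[0])   # ['x1_data', 'x2_data']
```

With the flat descriptor `[0, 0]`, the second argument comes back as a bare leaf instead of a one-element list, which is the kind of mismatch ISSUE 1 describes.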

Error Message

(For ISSUE 2 above)

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-4-bdc63b446461> in <module>
      5 # exception
      6 sym_block = mx.gluon.SymbolBlock.imports(
----> 7     "block-symbol.json", ["x1", "x2"], "block-0000.params"
      8 )

~/VENVS/GluonTS/lib/python3.6/site-packages/mxnet/gluon/block.py in imports(symbol_file, input_names, param_file, ctx)
   1022         ret = SymbolBlock(sym, inputs)
   1023         if param_file is not None:
-> 1024             ret.collect_params().load(param_file, ctx=ctx)
   1025         return ret
   1026 

~/VENVS/GluonTS/lib/python3.6/site-packages/mxnet/gluon/parameter.py in load(self, filename, ctx, allow_missing, ignore_extra, restore_prefix)
    902         lprefix = len(restore_prefix)
    903         loaded = [(k[4:] if k.startswith('arg:') or k.startswith('aux:') else k, v) \
--> 904                   for k, v in ndarray.load(filename).items()]
    905         arg_dict = {restore_prefix+k: v for k, v in loaded}
    906         if not allow_missing:

AttributeError: 'list' object has no attribute 'items'
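
The traceback suggests that `ndarray.load` returned a list rather than a dict for the empty parameter file, so the subsequent `.items()` call fails. Below is a minimal sketch of a defensive normalization step, assuming the loader can return either a dict of named arrays or a list of unnamed ones. `normalize_loaded` is a hypothetical helper and not part of MXNet.

```python
def normalize_loaded(loaded):
    """Coerce the result of a parameter loader into a name -> array dict."""
    if isinstance(loaded, dict):
        return loaded
    if isinstance(loaded, (list, tuple)) and len(loaded) == 0:
        return {}  # empty file: there are no parameters to restore
    # Unnamed arrays cannot be matched to parameter names, so fail loudly.
    raise ValueError("parameter file holds unnamed arrays")


print(normalize_loaded([]))
print(normalize_loaded({"arg:weight": 1.0}))
```

Until something along these lines exists upstream, a caller-side option is to pass the parameter file to `imports` only when the block actually has parameters.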

What have you tried to solve it?

We're handling it in gluon-ts by explicitly writing and reading input/output formats. See the issue.
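
A rough sketch of what such a workaround could look like: write the formats to a JSON file next to the exported symbol and read them back after import. The `-formats.json` suffix and the helper names `save_formats` and `load_formats` are hypothetical and not taken from gluon-ts. Assigning the loaded values to the private `_in_format` and `_out_format` attributes of the imported block is likewise an assumption about how one might apply them.

```python
import json
import os
import tempfile


def save_formats(prefix, in_format, out_format):
    """Write the nested format descriptors to '<prefix>-formats.json'."""
    path = prefix + "-formats.json"
    with open(path, "w") as f:
        json.dump({"in_format": in_format, "out_format": out_format}, f)
    return path


def load_formats(prefix):
    """Read the descriptors back as (in_format, out_format)."""
    with open(prefix + "-formats.json") as f:
        data = json.load(f)
    return data["in_format"], data["out_format"]


with tempfile.TemporaryDirectory() as tmp:
    prefix = os.path.join(tmp, "block")
    save_formats(prefix, [0, [0]], 0)
    in_format, out_format = load_formats(prefix)
    print(in_format, out_format)
    # Assumption: after SymbolBlock.imports, one would then set
    # sym_block._in_format = in_format and sym_block._out_format = out_format.
```

JSON is sufficient here because the descriptors are only nested lists of integers.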

Environment

We recommend using our script for collecting the diagnostic information. Run the following command and paste the outputs below:

----------Python Info----------
Version      : 3.6.5
Compiler     : GCC 4.2.1 Compatible Apple LLVM 10.0.0 (clang-1000.10.44.4)
Build        : ('default', 'Aug 19 2019 21:45:20')
Arch         : ('64bit', '')
------------Pip Info-----------
Version      : 9.0.3
Directory    : /Users/caner/VENVS/GluonTS/lib/python3.6/site-packages/pip
----------MXNet Info-----------
Version      : 1.4.1
Directory    : /Users/caner/VENVS/GluonTS/lib/python3.6/site-packages/mxnet
An error occured trying to import mxnet.
This is very likely due to missing missing or incompatible library files.
Traceback (most recent call last):
  File "<stdin>", line 122, in check_mxnet
AttributeError: module 'mxnet.util' has no attribute 'get_gpu_count'

----------System Info----------
Platform     : Darwin-17.7.0-x86_64-i386-64bit
system       : Darwin
node         : caner.local
release      : 17.7.0
version      : Darwin Kernel Version 17.7.0: Thu Jun 21 22:53:14 PDT 2018; root:xnu-4570.71.2~1/RELEASE_X86_64
----------Hardware Info----------
machine      : x86_64
processor    : i386
b'machdep.cpu.brand_string: Intel(R) Core(TM) i5-7360U CPU @ 2.30GHz'
b'machdep.cpu.features: FPU VME DE PSE TSC MSR PAE MCE CX8 APIC SEP MTRR PGE MCA CMOV PAT PSE36 CLFSH DS ACPI MMX FXSR SSE SSE2 SS HTT TM PBE SSE3 PCLMULQDQ DTES64 MON DSCPL VMX SMX EST TM2 SSSE3 FMA CX16 TPR PDCM SSE4.1 SSE4.2 x2APIC MOVBE POPCNT AES PCID XSAVE OSXSAVE SEGLIM64 TSCTMR AVX1.0 RDRAND F16C'
b'machdep.cpu.leaf7_features: SMEP ERMS RDWRFSGS TSC_THREAD_OFFSET BMI1 HLE AVX2 BMI2 INVPCID RTM SMAP RDSEED ADX IPT SGX FPU_CSDS MPX CLFSOPT'
b'machdep.cpu.extfeatures: SYSCALL XD 1GBPAGE EM64T LAHF LZCNT PREFETCHW RDTSCP TSCI'
----------Network Test----------
Setting timeout: 10
Timing for MXNet: https://github.com/apache/incubator-mxnet, DNS: 0.0269 sec, LOAD: 1.1014 sec.
Timing for GluonNLP GitHub: https://github.com/dmlc/gluon-nlp, DNS: 0.0007 sec, LOAD: 0.7569 sec.
Timing for GluonNLP: http://gluon-nlp.mxnet.io, DNS: 0.0856 sec, LOAD: 0.7301 sec.
Timing for D2L: http://d2l.ai, DNS: 0.0438 sec, LOAD: 0.0769 sec.
Timing for D2L (zh-cn): http://zh.d2l.ai, DNS: 0.0408 sec, LOAD: 0.4508 sec.
Timing for FashionMNIST: https://repo.mxnet.io/gluon/dataset/fashion-mnist/train-labels-idx1-ubyte.gz, DNS: 0.0946 sec, LOAD: 0.9294 sec.
Timing for PYPI: https://pypi.python.org/pypi/pip, DNS: 0.0218 sec, LOAD: 0.7639 sec.
Timing for Conda: https://repo.continuum.io/pkgs/free/, DNS: 0.0423 sec, LOAD: 0.3948 sec.

eric-haibin-lin commented 4 years ago

@sxjscience FYI

eric-haibin-lin commented 4 years ago

Is this fixed already?

sxjscience commented 4 years ago

@eric-haibin-lin It's not fixed. For the first issue, the problem is that the serialized JSON file does not store the in_format.

szha commented 4 years ago

cc @leezu

eric-haibin-lin commented 4 years ago

BTW - @canerturkmen thank you for reporting this.