apache / mxnet

Backward shape inconsistent with custom HybridBlock and gluon.loss #8836

Closed. MoritzMaxeiner closed this issue 6 years ago.

MoritzMaxeiner commented 6 years ago

Description

I've written a custom Gluon HybridBlock, used its output for a Gluon loss, and then tried to call loss.backward(). This works well when the block isn't hybridized, but after calling .hybridize() I get a backward shape inconsistency error.
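To illustrate the general pattern (a minimal sketch with a placeholder Dense block standing in for my actual one; the real reproducer is further down):

import mxnet as mx

class Trivial(mx.gluon.HybridBlock):
    # placeholder block with a single Dense layer; my real block (see the
    # reproducer below) uses two GRU cells and a reshape in hybrid_forward
    def __init__(self, **kwargs):
        super(Trivial, self).__init__(**kwargs)
        with self.name_scope():
            self.dense = mx.gluon.nn.Dense(3)

    def hybrid_forward(self, F, x):
        return self.dense(x)

net = Trivial()
net.initialize()
net.hybridize()  # with my real block, this is the call after which loss.backward() aborts

loss_fn = mx.gluon.loss.SoftmaxCrossEntropyLoss()
data = mx.nd.ones((2, 3))
label = mx.nd.array([0, 1])

with mx.autograd.record():
    out = net(data)
    L = loss_fn(out, label)
L.backward()  # works for this placeholder block; fails for the hybridized block below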

Environment info (Required)

----------Python Info----------
Version      : 3.6.1
Compiler     : GCC 7.2.0
Build        : ('default', 'Oct 19 2017 01:04:15')
Arch         : ('64bit', 'ELF')
------------Pip Info-----------
Version      : 9.0.1
Directory    : /usr/lib64/python3.6/site-packages/pip
----------MXNet Info-----------
Version      : 0.12.0
Directory    : /home/calrama/Tools/mxnet/python/mxnet
Hashtag not found. Not installed from pre-built package.
----------System Info----------
Platform     : Linux-4.14.1-gentoo-x86_64-Intel-R-_Core-TM-_i5-5200U_CPU_@_2.20GHz-with-gentoo-2.2.1
system       : Linux
node         : cylae
release      : 4.14.1-gentoo
version      : #4 SMP PREEMPT Fri Nov 24 01:23:46 CET 2017
----------Hardware Info----------
machine      : x86_64
processor    : Intel(R) Core(TM) i5-5200U CPU @ 2.20GHz
Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
CPU(s):              4
On-line CPU(s) list: 0-3
Thread(s) per core:  2
Core(s) per socket:  2
Socket(s):           1
NUMA node(s):        1
Vendor ID:           GenuineIntel
CPU family:          6
Model:               61
Model name:          Intel(R) Core(TM) i5-5200U CPU @ 2.20GHz
Stepping:            4
CPU MHz:             2195.012
CPU max MHz:         2700.0000
CPU min MHz:         500.0000
BogoMIPS:            4390.02
Virtualization:      VT-x
L1d cache:           32K
L1i cache:           32K
L2 cache:            256K
L3 cache:            3072K
NUMA node0 CPU(s):   0-3
Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb intel_pt tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid rdseed adx smap xsaveopt dtherm ida arat pln pts
----------Network Test----------
Setting timeout: 10
Timing for MXNet: https://github.com/apache/incubator-mxnet, DNS: 0.0007 sec, LOAD: 0.8002 sec.
Timing for Gluon Tutorial(en): http://gluon.mxnet.io, DNS: 0.0003 sec, LOAD: 0.2168 sec.
Timing for Gluon Tutorial(cn): https://zh.gluon.ai, DNS: 0.0003 sec, LOAD: 0.7772 sec.
Timing for FashionMNIST: https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/gluon/dataset/fashion-mnist/train-labels-idx1-ubyte.gz, DNS: 0.0002 sec, LOAD: 0.7524 sec.
Timing for PYPI: https://pypi.python.org/pypi/pip, DNS: 0.0002 sec, LOAD: 0.0506 sec.
Timing for Conda: https://repo.continuum.io/pkgs/free/, DNS: 0.0004 sec, LOAD: 0.0704 sec.

Package used (Python/R/Scala/Julia): I'm using Python.

Build info (Required if built from source)

Compiler (gcc/clang/mingw/visual studio): gcc 7.2.0

MXNet commit hash: c1846ce2ca5003cf613c8bdcf5b0c89d8e0b0d67

Build config:

make USE_BLAS=openblas -j5
#-------------------------------------------------------------------------------
#  Template configuration for compiling mxnet
#
#  If you want to change the configuration, please use the following
#  steps. Assume you are in the root directory of mxnet. First copy this
#  file so that any local changes will be ignored by git
#
#  $ cp make/config.mk .
#
#  Next modify the relevant entries, and then compile by
#
#  $ make
#
#  or build in parallel with 8 threads
#
#  $ make -j8
#-------------------------------------------------------------------------------

#---------------------
# choice of compiler
#--------------------

export CC = gcc
export CXX = g++
export NVCC = nvcc

# whether compile with options for MXNet developer
DEV = 0

# whether compile with debug
DEBUG = 0

# whether compile with profiler
USE_PROFILER =

# whether to turn on signal handler (e.g. segfault logger)
USE_SIGNAL_HANDLER =

# the additional link flags you want to add
ADD_LDFLAGS =

# the additional compile flags you want to add
ADD_CFLAGS = -I/usr/include/openblas

#---------------------------------------------
# matrix computation libraries for CPU/GPU
#---------------------------------------------

# whether use CUDA during compile
USE_CUDA = 0

# add the path to CUDA library to link and compile flag
# if you have already add them to environment variable, leave it as NONE
# USE_CUDA_PATH = /usr/local/cuda
USE_CUDA_PATH = NONE

# whether use CuDNN R3 library
USE_CUDNN = 0

# whether use opencv during compilation
# you can disable it; however, you will not be able to use
# the imbin iterator
USE_OPENCV = 1

#whether use libjpeg-turbo for image decode without OpenCV wrapper
USE_LIBJPEG_TURBO = 0
#add the path to libjpeg-turbo library
USE_LIBJPEG_TURBO_PATH = NONE

# use openmp for parallelization
USE_OPENMP = 1

# MKL ML Library for Intel CPU/Xeon Phi
# Please refer to MKL_README.md for details

# MKL ML Library folder, need to be root for /usr/local
# Change to User Home directory for standard user
# For USE_BLAS!=mkl only
MKLML_ROOT=/usr/local

# whether use MKL2017 library
USE_MKL2017 = 0

# whether use MKL2017 experimental feature for high performance
# Prerequisite USE_MKL2017=1
USE_MKL2017_EXPERIMENTAL = 0

# whether use NNPACK library
USE_NNPACK = 0

# choose the version of blas you want to use
# can be: mkl, blas, atlas, openblas
# in default use atlas for linux while apple for osx
UNAME_S := $(shell uname -s)
ifeq ($(UNAME_S), Darwin)
USE_BLAS = apple
else
USE_BLAS = atlas
endif

# whether use lapack during compilation
# only effective when compiled with blas versions openblas/apple/atlas/mkl
USE_LAPACK = 1

# path to lapack library in case of a non-standard installation
USE_LAPACK_PATH =

# add path to intel library, you may need it for MKL, if you did not add the path
# to environment variable
USE_INTEL_PATH = NONE

# If use MKL only for BLAS, choose static link automatically to allow python wrapper
ifeq ($(USE_MKL2017), 0)
ifeq ($(USE_BLAS), mkl)
USE_STATIC_MKL = 1
endif
else
USE_STATIC_MKL = NONE
endif

#----------------------------
# Settings for power and arm arch
#----------------------------
ARCH := $(shell uname -a)
ifneq (,$(filter $(ARCH), armv6l armv7l powerpc64le ppc64le aarch64))
    USE_SSE=0
else
    USE_SSE=1
endif

#----------------------------
# distributed computing
#----------------------------

# whether or not to enable multi-machine supporting
USE_DIST_KVSTORE = 0

# whether or not allow to read and write HDFS directly. If yes, then hadoop is
# required
USE_HDFS = 0

# path to libjvm.so. required if USE_HDFS=1
LIBJVM=$(JAVA_HOME)/jre/lib/amd64/server

# whether or not allow to read and write AWS S3 directly. If yes, then
# libcurl4-openssl-dev is required, it can be installed on Ubuntu by
# sudo apt-get install -y libcurl4-openssl-dev
USE_S3 = 0

# Use gperftools if found
USE_GPERFTOOLS = 1

# Use JEMalloc if found, and not using gperftools
USE_JEMALLOC = 1

#----------------------------
# additional operators
#----------------------------

# path to folders containing projects specific operators that you don't want to put in src/operators
EXTRA_OPERATORS =

#----------------------------
# other features
#----------------------------

# Create C++ interface package
USE_CPP_PACKAGE = 1

#----------------------------
# plugins
#----------------------------

# whether to use caffe integration. This requires installing caffe.
# You also need to add CAFFE_PATH/build/lib to your LD_LIBRARY_PATH
# CAFFE_PATH = $(HOME)/caffe
# MXNET_PLUGINS += plugin/caffe/caffe.mk

# whether to use torch integration. This requires installing torch.
# You also need to add TORCH_PATH/install/lib to your LD_LIBRARY_PATH
# TORCH_PATH = $(HOME)/torch
# MXNET_PLUGINS += plugin/torch/torch.mk

# WARPCTC_PATH = $(HOME)/warp-ctc
# MXNET_PLUGINS += plugin/warpctc/warpctc.mk

# whether to use sframe integration. This requires build sframe
# git@github.com:dato-code/SFrame.git
# SFRAME_PATH = $(HOME)/SFrame
# MXNET_PLUGINS += plugin/sframe/plugin.mk

Error Message:

[18:46:34] src/executor/infer_graph_attr_pass.cc:212: Check failed: (rshape[eid]) == (rshape[idx.entry_id(fnode.inputs[i])]) Backward shape inconsistent with the forward shape
fish: “python test.py” terminated by signal SIGABRT (Abort)

Minimum reproducible example

# test.py
import mxnet as mx

class Test(mx.gluon.HybridBlock):
    def __init__(self, hidden_unit_size, seq_length, feature_size, output_size, weight_initializer=None, **kwargs):
        super(Test, self).__init__(**kwargs)
        self.seq_length = seq_length
        self.feature_size = feature_size
        self.hidden_unit_size = hidden_unit_size
        self.output_size = output_size

        self.num_cells = 2
        with self.name_scope():
            self.cell_a = mx.gluon.rnn.GRUCell(self.hidden_unit_size, input_size=feature_size)
            self.cell_b = mx.gluon.rnn.GRUCell(self.hidden_unit_size, input_size=hidden_unit_size)

    def hybrid_forward(self, F, inputs, states):
        prev_hidden_states = states[0]
        if F is mx.symbol:
            prev_hidden_states = F.split(prev_hidden_states, axis=0, num_outputs=self.num_cells, squeeze_axis=1)

        cell_a_outputs, _ = self.cell_a.unroll(self.seq_length, inputs, [prev_hidden_states[0]])

        cell_b_inputs = [prev_hidden_states[0]] + cell_a_outputs[:-1]
        cell_b_outputs, _ = self.cell_b.unroll(self.seq_length, cell_b_inputs, [prev_hidden_states[1]])

        a_outputs = F.concat(*[F.reshape(a, shape=(0,1,self.feature_size, self.output_size)) for a in cell_a_outputs], dim=1)
        b_outputs = cell_b_outputs[0]

        return a_outputs, b_outputs

    def state_info(self, batch_size=0):
        return [{'shape': (self.num_cells, batch_size, self.hidden_unit_size), '__layout__': 'LNC'}]

    def begin_state(self, batch_size=0, func=mx.ndarray.zeros, **kwargs):
        states = []
        for i, info in enumerate(self.state_info(batch_size)):
            if info is not None:
                info.update(kwargs)
            else:
                info = kwargs
            states.append(func(name='%sh0_%d'%(self.prefix, i), **info))
        return states

args_nof_examples = 1
args_seq_len = 10
args_feature_size = 1
args_output_size = 1
args_nof_batches = 1
args_batch_size = 1

hidden_unit_size = args_feature_size * args_output_size

data = mx.ndarray.zeros(shape=(args_nof_examples, args_seq_len, args_feature_size))
labels = mx.ndarray.ones((args_nof_examples, args_seq_len, args_feature_size))
generator = mx.io.NDArrayIter(data, labels, args_batch_size, last_batch_handle='discard')

with mx.cpu(0) as context:
    model = Test(hidden_unit_size, args_seq_len, args_feature_size, args_output_size)
    model.initialize(mx.init.Xavier(), ctx = context)
    model.hybridize()

    loss = mx.gluon.loss.SoftmaxCrossEntropyLoss()

    states = model.begin_state(args_batch_size)
    for batch in generator:
        with mx.autograd.record():
            dis, gen = model(batch.data[0], states)
            L = loss(dis, batch.label[0])
        L.backward()

Steps to reproduce

  1. python test.py

What have you tried to solve it?

  1. Minimized it from a real-world problem to the above minimal example in order to understand whether I'm doing something wrong.

MoritzMaxeiner commented 6 years ago

After further experimentation, it seems that the reshape operation

F.reshape(a, shape=(0, 1, self.feature_size, self.output_size))

is the issue here, since everything works fine when I move it out of hybrid_forward and perform it on the NDArray result instead (see the sketch below).
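What works for me looks roughly like this (a sketch of the two changed spots relative to test.py above, not a complete script; the target shape of the final reshape is my reconstruction of what the per-step reshape plus concat produced):

    # inside hybrid_forward: return the concatenated cell_a outputs without the per-step reshape
    a_outputs = F.concat(*cell_a_outputs, dim=1)  # shape (batch, seq_length * hidden_unit_size)
    return a_outputs, b_outputs

# in the training loop: reshape the NDArray result instead, still inside autograd.record()
with mx.autograd.record():
    dis, gen = model(batch.data[0], states)
    dis = dis.reshape((args_batch_size, model.seq_length, model.feature_size, model.output_size))
    L = loss(dis, batch.label[0])
L.backward()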

reminisce commented 6 years ago

Seems like a problem of invalid memory access. With the latest master branch code, the example gives a segfault on Ubuntu without any error message, but on Mac it runs through. Definitely some undefined behavior going on under the hood.

MoritzMaxeiner commented 6 years ago

@reminisce Hm, I've tried out 1.0.0rc0 now and so far I haven't been able to reproduce the issue in that version. If there's an issue in master, I'd assume that to be a (separate) regression?

reminisce commented 6 years ago

I think there is undefined behavior in the backend (C++) code, as the example behaves differently on different platforms with the latest master branch. It would be very helpful for debugging if you could simplify the example as much as possible, based on commit c1846ce.

MoritzMaxeiner commented 6 years ago

The issue is also present in 0.12.1. @reminisce I'll try to reduce it further.

MoritzMaxeiner commented 6 years ago

@reminisce I've removed the time unrolling (the issue is still triggered), but if I remove either of the two cells or the reshape operation, the issue no longer arises, so I don't think I can reduce it any further.

import mxnet as mx

class Test(mx.gluon.HybridBlock):
    def __init__(self, input_size, output_size, **kwargs):
        super(Test, self).__init__(**kwargs)
        self.input_size = input_size
        self.output_size = output_size
        self.hidden_unit_size = output_size*input_size

        self.num_cells = 2
        with self.name_scope():
            self.cell_a = mx.gluon.rnn.GRUCell(self.hidden_unit_size, input_size=input_size)
            self.cell_b = mx.gluon.rnn.GRUCell(self.hidden_unit_size, input_size=self.hidden_unit_size)

    def hybrid_forward(self, F, inputs, states):
        prev_h = states[0]
        if F is mx.symbol:
            prev_h = F.split(prev_h, axis=0, num_outputs=self.num_cells, squeeze_axis=1)

        cell_a_next_h, _ = self.cell_a(inputs, [prev_h[0]])

        cell_b_next_h, _ = self.cell_b(prev_h[1], [prev_h[1]])

        b_output = cell_b_next_h.reshape(shape=(0, self.input_size, self.output_size))

        return cell_a_next_h, b_output, []

    def state_info(self, batch_size=0):
        return [{'shape': (self.num_cells, batch_size, self.hidden_unit_size), '__layout__': 'LNC'}]

    def begin_state(self, batch_size=0, func=mx.ndarray.zeros, **kwargs):
        states = []
        for i, info in enumerate(self.state_info(batch_size)):
            if info is not None:
                info.update(kwargs)
            else:
                info = kwargs
            states.append(func(name='%sh0_%d'%(self.prefix, i), **info))
        return states

args_nof_examples = 1
args_nof_batches = 1
args_batch_size = 1

args_input_size = 1
args_output_size = 1

data = mx.ndarray.zeros(shape=(args_nof_examples, args_input_size))
labels = mx.ndarray.ones((args_nof_examples, args_input_size))
gen = mx.io.NDArrayIter(data, labels, args_batch_size, last_batch_handle='discard')

with mx.cpu(0) as context:
    model = Test(args_input_size, args_output_size)
    model.initialize(mx.init.Xavier(), ctx = context)
    model.hybridize()

    loss = mx.gluon.loss.SoftmaxCrossEntropyLoss()

    states = model.begin_state(args_batch_size)
    for batch in gen:
        with mx.autograd.record():
            a, b, _ = model(batch.data[0], states)
            L = loss(b, batch.label[0])
        L.backward()

szha commented 6 years ago

The above script didn't trigger any error on my side. I'm using the recent commit 3c32f765.

SuperLinguini commented 6 years ago

Proposed Label: "Python", "Bug", "Gluon"

MoritzMaxeiner commented 6 years ago

For what it's worth, I don't have this issue with MXNet 1.0.0 or 1.1.0, but I don't know whether that's because the root cause (which is unknown to me) has been fixed, or whether it just doesn't get triggered anymore, so I'm hesitant to close.

sxjscience commented 6 years ago

@MoritzMaxeiner The issue somehow does not exist anymore, so I'm going to close it. However, the root cause is still unclear.

@reminisce Do you have any idea about it?

reminisce commented 6 years ago

@MoritzMaxeiner Is the result produced by your script expected?

MoritzMaxeiner commented 6 years ago

@reminisce I'm unsure as to what you're asking, specifically. The script should ideally have terminated with exit code 0 and no stdout/stderr output, which is not what happened, so that was unexpected for me.

reminisce commented 6 years ago

@MoritzMaxeiner Sorry, I should have made the question clearer. I wanted to know whether you can reproduce the issue every time using the latest code/release. If not, is the numerical result expected when it does not crash? This information would help us find out whether it's an error in the implementation logic or a careless typo resulting in invalid memory access.
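For example, one way to check could be to run your reduced Test block twice with the same seed, once imperative and once hybridized, and compare the outputs (a rough sketch reusing the names from your last example; the tolerances are placeholders):

import numpy as np

def run(hybridize):
    mx.random.seed(0)                   # same random initialization for both runs
    m = Test(args_input_size, args_output_size)
    m.initialize(mx.init.Xavier())
    if hybridize:
        m.hybridize()
    s = m.begin_state(args_batch_size)
    a, b, _ = m(data[:args_batch_size], s)
    return a.asnumpy(), b.asnumpy()

a_imp, b_imp = run(hybridize=False)     # imperative (NDArray) path
a_hyb, b_hyb = run(hybridize=True)      # hybridized (symbolic) path
np.testing.assert_allclose(a_imp, a_hyb, rtol=1e-5, atol=1e-6)
np.testing.assert_allclose(b_imp, b_hyb, rtol=1e-5, atol=1e-6)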

MoritzMaxeiner commented 6 years ago

@reminisce Ah, OK. W.r.t. reproducing: I haven't encountered the issue in 1.0 or 1.1, so no, I can't reproduce it with the latest code/release. Concerning the numerical result: that's hard to say precisely, as it involves plenty of (inherently inaccurate) floating-point math (so I can't just calculate the operations another way and compare the results). If a guesstimate is of use to you: the RNN trains and predicts as I would expect it to (in 1.0 and 1.1).

reminisce commented 6 years ago

@MoritzMaxeiner Thanks for the answers. Since the latest code is working fine, can we close the ticket for now? Please feel free to reopen it once it appears again.

MoritzMaxeiner commented 6 years ago

@reminisce Sure, I just didn't close it myself for the reasons mentioned here and here.

vandanavk commented 6 years ago

@sandeep-krishnamurthy Please close this issue, as it is not reproducible on the latest code.

@MoritzMaxeiner @sxjscience Please feel free to reopen this issue if you see it again.