Cannot train SmileCNN with Keras-MXNet

aaronmarkham commented 6 years ago

Description

Error during training of the SmileCNN demo using Keras-MXNet. (Error in operator conv2d_1/conv2d1)

Moving this from https://github.com/apache/incubator-mxnet/issues/11645

Environment info (Required)

----------Python Info----------
('Version      :', '2.7.15')
('Compiler     :', 'GCC 7.2.0')
('Build        :', ('default', 'May  1 2018 23:32:55'))
('Arch         :', ('64bit', ''))
------------Pip Info-----------
('Version      :', '10.0.1')
('Directory    :', '/home/ubuntu/anaconda3/envs/python2/lib/python2.7/site-packages/pip')
----------MXNet Info-----------
('Version      :', '1.2.0')
('Directory    :', '/home/ubuntu/anaconda3/envs/python2/lib/python2.7/site-packages/mxnet')
('Commit Hash   :', '297c64fd2ee404612aa3ecc880b940fb2538039c')
----------System Info----------
('Platform     :', 'Linux-4.4.0-1061-aws-x86_64-with-debian-stretch-sid')
('system       :', 'Linux')
('node         :', 'ip-172-31-80-156')
('release      :', '4.4.0-1061-aws')
('version      :', '#70-Ubuntu SMP Fri May 25 21:47:34 UTC 2018')
----------Hardware Info----------
('machine      :', 'x86_64')
('processor    :', 'x86_64')
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                32
On-line CPU(s) list:   0-31
Thread(s) per core:    2
Core(s) per socket:    16
Socket(s):             1
NUMA node(s):          1
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 79
Model name:            Intel(R) Xeon(R) CPU E5-2686 v4 @ 2.30GHz
Stepping:              1
CPU MHz:               1567.144
CPU max MHz:           3000.0000
CPU min MHz:           1200.0000
BogoMIPS:              4600.13
Hypervisor vendor:     Xen
Virtualization type:   full
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              46080K
NUMA node0 CPU(s):     0-31
Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single kaiser fsgsbase bmi1 hle avx2 smep bmi2 erms invpcid rtm rdseed adx xsaveopt
----------Network Test----------
Setting timeout: 10
Timing for MXNet: https://github.com/apache/incubator-mxnet, DNS: 0.0014 sec, LOAD: 0.5112 sec.
Timing for PYPI: https://pypi.python.org/pypi/pip, DNS: 0.0026 sec, LOAD: 0.0928 sec.
Timing for FashionMNIST: https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/gluon/dataset/fashion-mnist/train-labels-idx1-ubyte.gz, DNS: 0.0443 sec, LOAD: 0.1342 sec.
Timing for Conda: https://repo.continuum.io/pkgs/free/, DNS: 0.0029 sec, LOAD: 0.0329 sec.
Timing for Gluon Tutorial(en): http://gluon.mxnet.io, DNS: 0.2468 sec, LOAD: 0.3899 sec.
Timing for Gluon Tutorial(cn): https://zh.gluon.ai, DNS: 0.1114 sec, LOAD: 0.4641 sec.

Package used (Python/R/Scala/Julia):

pip install mxnet-cu90 (this installed 1.2.0)
pip install keras-mxnet

Error Message:

/home/ubuntu/anaconda3/envs/python2/lib/python2.7/site-packages/keras/backend/mxnet_backend.py:89: UserWarning: MXNet Backend performs best with channels_first format. Using channels_last will significantly reduce performance due to the Transpose operations. For performance improvement, please use this APIkeras.utils.to_channels_first(x_input)to transform channels_last data to channels_first format and also please change the image_data_format in keras.json to channels_first.Note: x_input is a Numpy tensor or a list of Numpy tensorRefer to: https://github.com/awslabs/keras-apache-mxnet/tree/master/docs/mxnet_backend/performance_guide.md
  train_symbol = func(*args, **kwargs)
Traceback (most recent call last):
  File "train.py", line 39, in <module>
    model.add(Conv2D(nb_filters, (nb_conv, nb_conv), activation='relu', input_shape=X.shape[1:]))
  File "/home/ubuntu/anaconda3/envs/python2/lib/python2.7/site-packages/keras/engine/sequential.py", line 166, in add
    layer(x)
  File "/home/ubuntu/anaconda3/envs/python2/lib/python2.7/site-packages/keras/engine/base_layer.py", line 460, in __call__
    output = self.call(inputs, **kwargs)
  File "/home/ubuntu/anaconda3/envs/python2/lib/python2.7/site-packages/keras/layers/convolutional.py", line 172, in call
    dilation_rate=self.dilation_rate)
  File "/home/ubuntu/anaconda3/envs/python2/lib/python2.7/site-packages/keras/backend/mxnet_backend.py", line 3136, in conv2d
    padding_mode=padding, data_format=data_format)
  File "/home/ubuntu/anaconda3/envs/python2/lib/python2.7/site-packages/keras/backend/mxnet_backend.py", line 89, in func_wrapper
    train_symbol = func(*args, **kwargs)
  File "/home/ubuntu/anaconda3/envs/python2/lib/python2.7/site-packages/keras/backend/mxnet_backend.py", line 4443, in _convnd
    result = _postprocess_convnd_output(KerasSymbol(conv), data_format)
  File "/home/ubuntu/anaconda3/envs/python2/lib/python2.7/site-packages/keras/backend/mxnet_backend.py", line 81, in func_wrapper
    train_symbol = func(*args, **kwargs)
  File "/home/ubuntu/anaconda3/envs/python2/lib/python2.7/site-packages/keras/backend/mxnet_backend.py", line 4180, in _postprocess_convnd_output
    if data_format == 'channels_last' and ndim(x) > 3:
  File "/home/ubuntu/anaconda3/envs/python2/lib/python2.7/site-packages/keras/backend/mxnet_backend.py", line 493, in ndim
    shape = x.shape
  File "/home/ubuntu/anaconda3/envs/python2/lib/python2.7/site-packages/keras/backend/mxnet_backend.py", line 3820, in shape
    return self._get_shape()
  File "/home/ubuntu/anaconda3/envs/python2/lib/python2.7/site-packages/keras/backend/mxnet_backend.py", line 3829, in _get_shape
    _, out_shape, _ = self.symbol.infer_shape_partial()
  File "/home/ubuntu/anaconda3/envs/python2/lib/python2.7/site-packages/mxnet/symbol/symbol.py", line 1062, in infer_shape_partial
    return self._infer_shape_impl(True, *args, **kwargs)
  File "/home/ubuntu/anaconda3/envs/python2/lib/python2.7/site-packages/mxnet/symbol/symbol.py", line 1120, in _infer_shape_impl
    ctypes.byref(complete)))
  File "/home/ubuntu/anaconda3/envs/python2/lib/python2.7/site-packages/mxnet/base.py", line 149, in check_call
    raise MXNetError(py_str(_LIB.MXGetLastError()))
mxnet.base.MXNetError: Error in operator conv2d_1/conv2d1: [15:49:12] src/operator/nn/convolution.cc:191: Check failed: dilated_ksizey <= AddPad(dshape[2], param.pad[0]) (3 vs. 1) kernel size exceed input

Stack trace returned 10 entries:
[bt] (0) /home/ubuntu/anaconda3/envs/python2/lib/python2.7/site-packages/mxnet/libmxnet.so(+0x30cbe2) [0x7f5d81054be2]
[bt] (1) /home/ubuntu/anaconda3/envs/python2/lib/python2.7/site-packages/mxnet/libmxnet.so(+0x30d1b8) [0x7f5d810551b8]
[bt] (2) /home/ubuntu/anaconda3/envs/python2/lib/python2.7/site-packages/mxnet/libmxnet.so(+0x561afd) [0x7f5d812a9afd]
[bt] (3) /home/ubuntu/anaconda3/envs/python2/lib/python2.7/site-packages/mxnet/libmxnet.so(+0x299b76f) [0x7f5d836e376f]
[bt] (4) /home/ubuntu/anaconda3/envs/python2/lib/python2.7/site-packages/mxnet/libmxnet.so(+0x299e25f) [0x7f5d836e625f]
[bt] (5) /home/ubuntu/anaconda3/envs/python2/lib/python2.7/site-packages/mxnet/libmxnet.so(MXSymbolInferShape+0x1549) [0x7f5d83664169]
[bt] (6) /home/ubuntu/anaconda3/envs/python2/lib/python2.7/site-packages/mxnet/libmxnet.so(MXSymbolInferShapePartial+0x82) [0x7f5d83665922]
[bt] (7) /home/ubuntu/anaconda3/envs/python2/lib/python2.7/lib-dynload/../../libffi.so.6(ffi_call_unix64+0x4c) [0x7f5db356eec0]
[bt] (8) /home/ubuntu/anaconda3/envs/python2/lib/python2.7/lib-dynload/../../libffi.so.6(ffi_call+0x22d) [0x7f5db356e87d]
[bt] (9) /home/ubuntu/anaconda3/envs/python2/lib/python2.7/lib-dynload/_ctypes.so(_ctypes_callproc+0x4d6) [0x7f5db57848d6]

Minimum reproducible example

Following https://github.com/kalyc/SmileCNN

Steps to reproduce

Follow the instructions in that repository using a Python 2 environment.
Training fails at the python train.py step.

kalyc commented 6 years ago

This is related to how keras-mxnet is installed. For Keras-MXNet 2.2.0, pip install is not supported because the right dependencies are not pulled in, so we need to build keras-mxnet from source. We also need to build mxnet from source.

kalyc commented 6 years ago

I could replicate the issue after following the installation steps from the comment above.

Please edit the configuration file (vi $HOME/.keras/keras.json) and set the data format to "channels_first", then rebuild keras-apache-mxnet. That resolves the error above. SmileCNN works only with the channels_first data format.

@aaronmarkham please note and close this issue if following the above steps fixes the issue.
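
To illustrate why the data format matters for the "3 vs. 1" check in the error message, here is a minimal sketch. It assumes a per-sample input shape of (1, 32, 32), which is a hypothetical value and not confirmed by this thread, and the helper names `spatial_dims` and `kernel_fits` are made up for illustration. The comparison only roughly mirrors the condition printed by convolution.cc (dilated kernel size versus padded input size); it is not MXNet's actual code.

```python
def spatial_dims(input_shape, data_format):
    """Return (height, width) of a per-sample image shape under a Keras data format."""
    if data_format == "channels_first":
        _, height, width = input_shape   # (channels, height, width)
    elif data_format == "channels_last":
        height, width, _ = input_shape   # (height, width, channels)
    else:
        raise ValueError("unknown data_format: %r" % (data_format,))
    return height, width


def kernel_fits(input_shape, kernel_size, data_format, pad=0, dilation=1):
    """Approximate the failed check: dilated kernel size <= padded input size."""
    dilated = 1 + (kernel_size - 1) * dilation
    height, width = spatial_dims(input_shape, data_format)
    return dilated <= height + 2 * pad and dilated <= width + 2 * pad


# Hypothetical sample: one grayscale channel, 32x32 pixels, stored channels-first.
sample_shape = (1, 32, 32)
print(kernel_fits(sample_shape, 3, "channels_first"))  # True: 3 <= 32
print(kernel_fits(sample_shape, 3, "channels_last"))   # False: height is read as 1
```

Under that assumption, reading channels-first data as channels_last makes the height dimension 1, so a 3x3 kernel no longer fits, which matches the "(3 vs. 1) kernel size exceed input" message.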

roywei commented 6 years ago

@kalyc pip installation works fine from the user's perspective; the only thing needed here is to use the channels_first data format.

The pip installation quirk is that it installs the official keras first and then installs keras-mxnet, which overrides keras. Everything still works fine. I have updated the issue.

kalyc commented 6 years ago

Thanks for the clarification @roywei

lupesko commented 6 years ago

What is the conclusion here? Should the example in kalyc's repository be updated? Should this issue be closed?

roywei commented 6 years ago

No need to reinstall or rebuild anything. Just change the JSON config at ~/.keras/keras.json to use the channels_first data format. @aaronmarkham I would like to confirm it's working for you before closing this issue. Thanks!
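
For anyone who prefers to script the config change rather than edit the file by hand, a minimal sketch follows. It assumes the setting lives under the standard Keras key `image_data_format`; the helper name `set_image_data_format` is hypothetical, and the function leaves any other keys in the file untouched.

```python
import json
import os


def set_image_data_format(config_path, data_format="channels_first"):
    """Set image_data_format in a Keras JSON config, keeping other keys intact."""
    config = {}
    if os.path.exists(config_path):
        with open(config_path) as f:
            config = json.load(f)
    config["image_data_format"] = data_format
    with open(config_path, "w") as f:
        json.dump(config, f, indent=4)
    return config
```

Calling `set_image_data_format(os.path.expanduser("~/.keras/keras.json"))` would apply the change to the default config location, provided the ~/.keras directory already exists.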

kalyc commented 6 years ago

@aaronmarkham could you confirm here if the issue can be closed?

kalyc commented 6 years ago

I have verified that changing the data format fixes the issue. Closing. @aaronmarkham please feel free to reopen if you observe otherwise.

awslabs / keras-apache-mxnet