apache / mxnet

Lightweight, Portable, Flexible Distributed/Mobile Deep Learning with Dynamic, Mutation-aware Dataflow Dep Scheduler; for Python, R, Julia, Scala, Go, Javascript and more
https://mxnet.apache.org
Apache License 2.0

Conv2D fails to run with NHWC layout on CPU #21176

Open Embed-Debuger opened 1 year ago

Embed-Debuger commented 1 year ago

Description

On CPU devices, Conv2D does not appear to run with the NHWC layout.

Error Message

Traceback (most recent call last):
  File "", line 1, in <module>
  File "/root/.pycharm_helpers/pydev/_pydev_bundle/pydev_umd.py", line 197, in runfile
    pydev_imports.execfile(filename, global_vars, local_vars) # execute the script
  File "/root/.pycharm_helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "/root/gejie/program/Medi-Test/Repreduce issue/mx1.9.1_conv_cb.py", line 20, in <module>
    print(result)
  File "/root/anaconda3/envs/lib_mxnet/lib/python3.7/site-packages/mxnet/ndarray/ndarray.py", line 257, in __repr__
    return '\n%s\n<%s %s @%s>' % (str(self.asnumpy()),
  File "/root/anaconda3/envs/lib_mxnet/lib/python3.7/site-packages/mxnet/ndarray/ndarray.py", line 2571, in asnumpy
    ctypes.c_size_t(data.size)))
  File "/root/anaconda3/envs/lib_mxnet/lib/python3.7/site-packages/mxnet/base.py", line 246, in check_call
    raise get_last_ffi_error()
mxnet.base.MXNetError: MXNetError: could not create a descriptor for a dilated convolution forward propagation primitive

To Reproduce

import mxnet as mx
from mxnet import nd            # tensor (NDArray) module
from mxnet.gluon import nn      # basic neural network building blocks
from mxnet.gluon.nn import Conv2D

import os
os.environ['DMLC_LOG_STACK_TRACE_DEPTH'] = "100"

def Model():
    net = nn.Sequential()
    net.add(Conv2D(channels=32, kernel_size=(5, 5), layout="NHWC"))
    net.initialize(ctx=mx.cpu())
    return net

shape = (10,32,32,3)
model = Model()
data = nd.random.uniform(-1, 1, shape, ctx=mx.cpu())
result = model(data)
print(result)
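For reference, if the NHWC convolution ran, `result` would be expected to have shape `(10, 28, 28, 32)`: `Conv2D` defaults to `strides=(1, 1)`, `padding=(0, 0)` and `dilation=(1, 1)`, so a 5×5 kernel shrinks each 32-pixel spatial dimension to 32 - 5 + 1 = 28. Because MXNet executes operators asynchronously, the failure is only reported when `print(result)` forces `asnumpy()`, which is why the traceback points at that line rather than at `model(data)`. A minimal sketch of the standard output-size arithmetic in plain Python (the helper names `conv_out_dim` and `conv2d_output_shape_nhwc` are hypothetical, not MXNet APIs):

```python
def conv_out_dim(size, kernel, stride=1, padding=0, dilation=1):
    """Output length of one spatial dimension of a convolution (floor division)."""
    effective_kernel = dilation * (kernel - 1) + 1
    return (size + 2 * padding - effective_kernel) // stride + 1


def conv2d_output_shape_nhwc(input_shape, channels, kernel_size,
                             strides=(1, 1), padding=(0, 0), dilation=(1, 1)):
    """Hypothetical helper: expected output shape of a 2-D convolution on NHWC input."""
    n, h, w, _in_channels = input_shape
    out_h = conv_out_dim(h, kernel_size[0], strides[0], padding[0], dilation[0])
    out_w = conv_out_dim(w, kernel_size[1], strides[1], padding[1], dilation[1])
    return (n, out_h, out_w, channels)


# Same input shape and layer settings as the reproduction script above.
print(conv2d_output_shape_nhwc((10, 32, 32, 3), channels=32, kernel_size=(5, 5)))
# (10, 28, 28, 32)
```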

Steps to reproduce


With the code above, the Conv2D operator cannot run in NHWC layout on the CPU. Moreover, the error message does not state that the failure is caused by the NHWC layout, which is misleading and makes the problem hard to locate.

What have you tried to solve it?

  1. We want to run Conv2D in NHWC layout on a CPU device.
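Until NHWC runs on CPU, one possible workaround (an assumption, not tested against MXNet 1.9.1 in this report) is to keep `Conv2D` at its default NCHW layout and transpose the data around it: axes `(0, 3, 1, 2)` turn NHWC into NCHW, and axes `(0, 2, 3, 1)` turn the result back. The NumPy sketch below checks that this round trip gives the same numbers as a convolution computed directly in NHWC; `conv2d_nchw` and `conv2d_nhwc_via_nchw` are hypothetical helpers implementing a naive stride-1, unpadded convolution, not MXNet code.

```python
import numpy as np


def conv2d_nchw(x, weight):
    """Naive stride-1, unpadded cross-correlation (the 'convolution' of deep learning).

    x: (N, C, H, W) input; weight: (O, C, KH, KW). Returns (N, O, H-KH+1, W-KW+1).
    """
    n, c, h, w = x.shape
    o, c_w, kh, kw = weight.shape
    assert c == c_w, "input channels must match weight channels"
    out_h, out_w = h - kh + 1, w - kw + 1
    out = np.zeros((n, o, out_h, out_w), dtype=x.dtype)
    # Accumulate one kernel tap (i, j) at a time over all output positions.
    for i in range(kh):
        for j in range(kw):
            patch = x[:, :, i:i + out_h, j:j + out_w]  # (N, C, OH, OW)
            out += np.einsum("nchw,oc->nohw", patch, weight[:, :, i, j])
    return out


def conv2d_nhwc_via_nchw(x_nhwc, weight):
    """Hypothetical workaround: run an NCHW convolution on NHWC data by transposing."""
    x_nchw = np.transpose(x_nhwc, (0, 3, 1, 2))  # NHWC -> NCHW
    y_nchw = conv2d_nchw(x_nchw, weight)
    return np.transpose(y_nchw, (0, 2, 3, 1))    # NCHW -> NHWC


rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=(10, 32, 32, 3)).astype(np.float32)    # same shape as the report
weight = rng.uniform(-1, 1, size=(32, 3, 5, 5)).astype(np.float32)  # (out, in, kh, kw)
y = conv2d_nhwc_via_nchw(x, weight)
print(y.shape)  # (10, 28, 28, 32)
```

In MXNet the same idea would use `nd.transpose(data, axes=(0, 3, 1, 2))` before a default-layout `Conv2D` and `nd.transpose(out, axes=(0, 2, 3, 1))` after it. The weight layout `(out_channels, in_channels, kh, kw)` used in the sketch matches the weight shape of an NCHW convolution in MXNet.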

Environment

We recommend using our script for collecting the diagnostic information with the following command:

curl --retry 10 -s https://raw.githubusercontent.com/apache/incubator-mxnet/master/tools/diagnose.py | python3

Environment Information

```
# Paste the diagnose.py command output here
----------Python Info----------
Version : 3.7.16
Compiler : GCC 11.2.0
Build : ('default', 'Jan 17 2023 22:20:44')
Arch : ('64bit', 'ELF')
------------Pip Info-----------
Version : 22.3.1
Directory : /root/anaconda3/envs/lib_mxnet/lib/python3.7/site-packages/pip
----------MXNet Info-----------
Version : 1.9.1
Directory : /root/anaconda3/envs/lib_mxnet/lib/python3.7/site-packages/mxnet
Commit hash file "/root/anaconda3/envs/lib_mxnet/lib/python3.7/site-packages/mxnet/COMMIT_HASH" not found. Not installed from pre-built package or built from source.
Library : ['/root/anaconda3/envs/lib_mxnet/lib/python3.7/site-packages/mxnet/libmxnet.so']
Build features:
✔ CUDA
✔ CUDNN
✔ NCCL
✔ CUDA_RTC
✖ TENSORRT
✔ CPU_SSE
✔ CPU_SSE2
✔ CPU_SSE3
✖ CPU_SSE4_1
✖ CPU_SSE4_2
✖ CPU_SSE4A
✖ CPU_AVX
✖ CPU_AVX2
✔ OPENMP
✖ SSE
✖ F16C
✖ JEMALLOC
✔ BLAS_OPEN
✖ BLAS_ATLAS
✖ BLAS_MKL
✖ BLAS_APPLE
✔ LAPACK
✔ MKLDNN
✔ OPENCV
✖ CAFFE
✖ PROFILER
✔ DIST_KVSTORE
✖ CXX14
✖ INT64_TENSOR_SIZE
✔ SIGNAL_HANDLER
✖ DEBUG
✖ TVM_OP
----------System Info----------
Platform : Linux-4.15.0-202-generic-x86_64-with-debian-buster-sid
system : Linux
node : server-d5
release : 4.15.0-202-generic
version : #213-Ubuntu SMP Thu Jan 5 19:19:12 UTC 2023
----------Hardware Info----------
machine : x86_64
processor : x86_64
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 24
On-line CPU(s) list: 0-23
Thread(s) per core: 2
Core(s) per socket: 12
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 63
Model name: Intel(R) Xeon(R) CPU E5-2690 v3 @ 2.60GHz
Stepping: 2
CPU MHz: 1200.184
CPU max MHz: 3500.0000
CPU min MHz: 1200.0000
BogoMIPS: 5196.97
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 30720K
NUMA node0 CPU(s): 0-23
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm cpuid_fault epb invpcid_single pti intel_ppin ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid cqm xsaveopt cqm_llc cqm_occup_llc dtherm ida arat pln pts md_clear flush_l1d
----------Network Test----------
Setting timeout: 10
Timing for MXNet: https://github.com/apache/mxnet, DNS: 0.0013 sec, LOAD: 1.1103 sec.
Error open Gluon Tutorial(en): http://gluon.mxnet.io, HTTP Error 404: Not Found, DNS finished in 0.0010609626770019531 sec.
Error open Gluon Tutorial(cn): https://zh.gluon.ai, , DNS finished in 0.0011479854583740234 sec.
Timing for FashionMNIST: https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/gluon/dataset/fashion-mnist/train-labels-idx1-ubyte.gz, DNS: 0.0010 sec, LOAD: 0.9683 sec.
Timing for PYPI: https://pypi.python.org/pypi/pip, DNS: 0.0011 sec, LOAD: 1.4425 sec.
Error open Conda: https://repo.continuum.io/pkgs/free/, HTTP Error 403: Forbidden, DNS finished in 0.0011096000671386719 sec.
----------Environment----------
```

github-actions[bot] commented 1 year ago

Welcome to Apache MXNet (incubating)! We are on a mission to democratize AI, and we are glad that you are contributing to it by opening this issue. Please make sure to include all the relevant context, and one of the @apache/mxnet-committers will be here shortly. If you are interested in contributing to our project, let us know! Also, be sure to check out our guide on contributing to MXNet and our development guides wiki.