apache / mxnet

Lightweight, Portable, Flexible Distributed/Mobile Deep Learning with Dynamic, Mutation-aware Dataflow Dep Scheduler; for Python, R, Julia, Scala, Go, Javascript and more
https://mxnet.apache.org
Apache License 2.0

[MxNet 1.6 CPP+cu101+MKL] Getting stack overflow when running on Windows IIS #20143

Open jpsalada opened 3 years ago

jpsalada commented 3 years ago

Description

I get a stack overflow in cudnn_cnn_infer64_8.dll when running inference on Windows 10 + IIS with the MXNet 1.6 + MKL + CUDA 10.1 C++ package. The error does NOT occur when I run the same code on the same input in a console application. MXNet was compiled locally (CMakeCache.txt here). I am trying to run the RetinaFace Res50 model.

Error Message

Unhandled exception at 0x00007FFD8F563D88 (cudnn_cnn_infer64_8.dll) in w3wp.exe: 0xC00000FD: Stack overflow (parameters: 0x0000000000000001, 0x0000002C7BFC3000).

To Reproduce

It always crashes on the last statement of this code example:

```cpp
// Load the model and configure ctx for the GPU (omitted).

mxnet::cpp::NDArray data(mxnet::cpp::Shape(batch_size, num_channels, height, width), ctx, false);
data.SyncCopyFromCPU(img_data, batch_size * num_channels * height * width);
data.WaitToRead();

args["data"] = data;

Executor *exec = sym_net.SimpleBind(ctx, args, map<string, NDArray>(), map<string, OpReqType>(), aux);
exec->Forward(false);

// Number of output elements = product of the output dimensions.
vector<mx_uint> cls_shape = exec->outputs[idx].GetShape();
size_t sz = 1;
for (mx_uint dim : cls_shape) {
    sz *= dim;
}
vector<float> cls_data(sz);
// Crashes on the next statement.
exec->outputs[idx].SyncCopyToCPU(cls_data.data(), sz);
```
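The shape-product loop in the snippet can also be written with std::accumulate, accumulating in size_t so large outputs do not overflow a 32-bit counter. This is a minimal sketch: NumElements is a hypothetical helper, and unsigned int stands in for mx_uint on the assumption that the two are the same type.

```cpp
#include <cstddef>
#include <functional>
#include <numeric>
#include <vector>

// Hypothetical helper (not part of MXNet): number of elements in a tensor
// with the given shape, i.e. the product of its dimensions.
// unsigned int is assumed to match mx_uint.
std::size_t NumElements(const std::vector<unsigned int>& shape) {
    // Start from 1 (the product of no dimensions) and multiply in size_t.
    return std::accumulate(shape.begin(), shape.end(), std::size_t{1},
                           std::multiplies<std::size_t>());
}
```

With it, the sizing step becomes `vector<float> cls_data(NumElements(exec->outputs[idx].GetShape()));`.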

Steps to reproduce

  1. I created a C++ DLL containing the code above.
  2. The DLL is called from a C# Web API application hosted on Windows IIS.
  3. The application always crashes on the same statement shown above.

What have you tried to solve it?

  1. Since the same pipeline does not crash in a console application, I used Process Explorer to confirm that both processes load the same CUDA DLLs.
  2. I replaced my code with the code from the image_classification_predict.cc example, and the same crash occurred when reading the model output.
  3. I verified that the input image arrives intact and that the model loads without errors.
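Step 3 can be made mechanical with a check run just before SyncCopyFromCPU. This is a minimal sketch, assuming the preprocessed image is held in a std::vector<float> (the report passes a raw img_data pointer instead); InputLooksValid is a hypothetical helper, not part of MXNet.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical pre-inference check (not an MXNet API): the buffer must hold
// exactly batch_size * num_channels * height * width values, all finite.
bool InputLooksValid(const std::vector<float>& img_data,
                     std::size_t batch_size, std::size_t num_channels,
                     std::size_t height, std::size_t width) {
    const std::size_t expected = batch_size * num_channels * height * width;
    if (img_data.size() != expected) {
        // Too short: SyncCopyFromCPU would read past the end of the buffer.
        // Too long: the dimensions passed to Shape(...) are probably wrong.
        return false;
    }
    for (float v : img_data) {
        if (!std::isfinite(v)) {  // reject NaN and +/-infinity
            return false;
        }
    }
    return true;
}
```

A false result points to a preprocessing problem rather than a CUDA/IIS one, which helps separate the two causes.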

Environment

Environment Information
----------System Info----------
Platform : Windows-10-10.0.19041-SP0
system : Windows
node : RM-UK-DT-0165
release : 10
version : 10.0.19041
----------Hardware Info----------
machine : AMD64
processor : Intel64 Family 6 Model 158 Stepping 10, GenuineIntel
Name : Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz
Microsoft Remote Display Adapter
NVIDIA GeForce RTX 2080
github-actions[bot] commented 3 years ago

Welcome to Apache MXNet (incubating)! We are on a mission to democratize AI, and we are glad that you are contributing to it by opening this issue. Please make sure to include all the relevant context, and one of the @apache/mxnet-committers will be here shortly. If you are interested in contributing to our project, let us know! Also, be sure to check out our guide on contributing to MXNet and our development guides wiki.

alexcheg commented 3 years ago

Has there been any resolution or follow-up on this issue? I have a similar problem that I believe has the same cause: a C++ DLL calling the Conv1D.Forward method fails with a stack overflow. The same code works normally as a console application on GPU, or as a DLL on CPU (no CUDA). I think the problem only occurs with the combination of CUDA and a DLL.

jpsalada commented 3 years ago

Hi, I switched to OpenBLAS with CUDA 10.2 and was able to get it working.