Open rubbberrabbit opened 2 years ago
Description
Hello, we are using Keras as the front end for MXNet and have hit several MXNet crashes. We are not sure whether these models trigger a real bug, so we collected the stack traces at the time of each crash. Most of them point into libmxnet.so, which we have not been able to build in debug mode. The MXNet versions and the stack traces for our models are listed below.
The crash-triggering models and a replay script are available at https://drive.google.com/drive/folders/1he3I-1PKGI01t09E2FAin0_mUnmu2-oz?usp=sharing
Error Message
Some of the stack traces are shown below.
To Reproduce
Steps to reproduce
What have you tried to solve it?
To analyze the stack traces inside libmxnet.so, we tried to compile MXNet with DEBUG=1 in config.mk, but the build failed with a "relocation truncated to fit" linker error in several different environments. We assume this is because the debug build adds too much extra code.
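Even without a debug build of libmxnet.so, a native crash can usually be narrowed down to the Python call that triggered it. The following is a minimal sketch, not part of the provided replay script: it runs a code snippet in a fresh interpreter with Python's standard `faulthandler` module enabled, so a fatal signal produces a Python-level traceback on stderr and the crash does not take down the calling process. The helper name `run_isolated` and its `timeout` default are hypothetical. Note that `faulthandler` reports Python frames only, not the C++ frames inside libmxnet.so.

```python
import subprocess
import sys


def run_isolated(snippet, timeout=600):
    """Run `snippet` in a child interpreter with faulthandler enabled.

    Returns (returncode, stderr_text). On POSIX, a negative return code -N
    means the child was terminated by signal N (for example SIGSEGV).
    """
    proc = subprocess.run(
        # "-X faulthandler" makes the child dump its Python traceback to
        # stderr when it receives a fatal signal such as SIGSEGV or SIGABRT.
        [sys.executable, "-X", "faulthandler", "-c", snippet],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,  # decode output as text (works on Python 3.6)
        timeout=timeout,
    )
    return proc.returncode, proc.stderr
```

To use it, pass the code that loads and runs one crashing model as `snippet`, then inspect the returned stderr text: the last Python frame listed is the call that entered native code before the crash.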
Environment
Environment Information
MXNet 1.5.1, Keras-MXNet 2.2.4.2, CUDA 10.1, Python 3.6.12
MXNet 1.4.1, Keras-MXNet 2.2.4.2, CUDA 10.0, Python 3.6.12
MXNet 1.3.1, Keras-MXNet 2.2.4.2, CUDA 9.0, Python 3.6.12