rayjs opened this issue 5 years ago
@rayjs Thanks for reporting this issue, and glad to see you have a workaround for it. This looks like a bug in the operator. But just in case, could you please confirm that you are using the operator correctly by referring to this documentation? Some of the arguments must be set explicitly; otherwise they fall back to their default values.
@mxnet-label-bot please add [python, operator]
@lanking520 For rrelu, the default parameters work, so I am certain about the correct usage. The leaky act_type works from what I have found, but rrelu causes the training to crash. I have not checked the other act_type options in LeakyReLU beyond these two.
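For reference, the two act_types differ only in how the negative-side slope is chosen: leaky uses a fixed slope, while rrelu samples one uniformly at random during training. A minimal NumPy sketch of the intended semantics (illustrative only, not MXNet's actual implementation; the 0.125/0.334 bounds are assumed defaults):

```python
import numpy as np

def leaky_relu(x, slope=0.25):
    # act_type="leaky": fixed negative-side slope.
    return np.where(x > 0, x, slope * x)

def rrelu_train(x, lower=0.125, upper=0.334, rng=None):
    # act_type="rrelu" during training: the negative-side slope is
    # drawn uniformly from [lower, upper] for each element.
    rng = rng or np.random.default_rng()
    slope = rng.uniform(lower, upper, size=np.shape(x))
    return np.where(x > 0, x, slope * x)

x = np.array([-2.0, -0.5, 0.0, 1.0, 3.0])
print(leaky_relu(x))   # negatives scaled by the fixed slope 0.25
print(rrelu_train(x))  # negatives scaled by random slopes in [0.125, 0.334]
```

The randomness in the training path is the only behavioral difference, which is consistent with leaky training fine while rrelu crashes.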
@mxnet-label-bot please add [bug]
Same as #14447 @mxnet-label-bot add [bug]
Description
Training SSD networks with LeakyReLU (rrelu) activation causes the training to crash. I have tried different networks, including vgg16_reduced.py, and it always crashes.
Environment info (Required)
Package used (Python/R/Scala/Julia): Python
Build info (Required if built from source)
Compiler (gcc/clang/mingw/visual studio): gcc
MXNet commit hash: 74638105f5480349cf57cda40a37475d626dbf41
Build config: make -j4 USE_OPENCV=1 USE_BLAS=openblas USE_CUDA=1 USE_CUDA_PATH=/usr/local/cuda USE_CUDNN=1
Error Message:
Minimum reproducible example
In vgg16_reduced.py in example/ssd/symbol, make the following change:
relu1_1 = mx.symbol.LeakyReLU(data=conv1_1, act_type="rrelu", name="relu1_1")
Replacing activations at other positions with LeakyReLU(rrelu) also causes the training to crash.
Steps to reproduce
python train.py --gpus 0,1 --batch-size 32 --pretrained ''
What have you tried to solve it?
I have had to replace LeakyReLU(rrelu) with other activations to get around this issue.
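A note on that workaround: rrelu collapses to a deterministic activation at inference time, using the midpoint of its slope range, so a fixed-slope leaky unit with that midpoint slope is a close stand-in. A NumPy sketch of this equivalence, under the assumption that the slope bounds default to 0.125 and 0.334:

```python
import numpy as np

def rrelu(x, lower=0.125, upper=0.334, training=False, rng=None):
    # Sketch of rrelu semantics: random slope in training,
    # fixed midpoint slope at inference.
    if training:
        rng = rng or np.random.default_rng()
        slope = rng.uniform(lower, upper, size=np.shape(x))
    else:
        slope = (lower + upper) / 2
    return np.where(x > 0, x, slope * x)

def leaky(x, slope=0.25):
    # Fixed negative-side slope, as with act_type="leaky".
    return np.where(x > 0, x, slope * x)

x = np.linspace(-3.0, 3.0, 13)
mid = (0.125 + 0.334) / 2
# With the slope set to rrelu's midpoint, the fixed leaky unit matches
# rrelu's inference-time output exactly, so the swap is benign at test time.
print(np.allclose(leaky(x, slope=mid), rrelu(x, training=False)))  # → True
```

Training dynamics will differ slightly, since the fixed slope loses rrelu's regularizing randomness.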