NVIDIA / apex

A PyTorch Extension: Tools for easy mixed precision and distributed training in Pytorch
BSD 3-Clause "New" or "Revised" License

Segmentation fault (core dumped) and RuntimeError: cuda runtime error (98) : unrecognized error code at apex/contrib/csrc/optimizers/fused_adam_cuda_kernel.cu:226 #537

Open ewrfcas opened 5 years ago

ewrfcas commented 5 years ago
THCudaCheck FAIL file=apex/contrib/csrc/optimizers/fused_adam_cuda_kernel.cu line=226 error=98 : unrecognized error code

Traceback (most recent call last):
  File "finetune.py", line 261, in <module>
    optimizer.step()
  File "/home/ai/anaconda3/lib/python3.6/site-packages/apex/contrib/optimizers/fp16_optimizer.py", line 154, in step
    grad_norms=norm_groups)
  File "/home/ai/anaconda3/lib/python3.6/site-packages/apex/contrib/optimizers/fused_adam.py", line 180, in step
    group['weight_decay'])
RuntimeError: cuda runtime error (98) : unrecognized error code at apex/contrib/csrc/optimizers/fused_adam_cuda_kernel.cu:226

If you suspect this is an IPython bug, please report it at:
    https://github.com/ipython/ipython/issues
or send an email to the mailing list at ipython-dev@python.org

You can print a more detailed traceback right now with "%tb", or use "%debug"
to interactively debug it.

Extra-detailed tracebacks for bug-reporting purposes can be enabled via:
    %config Application.verbose_crash=True

Segmentation fault (core dumped)

I need fully FP16 training for one model on multiple GPUs, but amp fails with DataParallel in mode 'O2' or 'O3', so I have no choice but to use FP16_Optimizer from apex.contrib. Using FusedLayerNorm triggers the segmentation fault as well.
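For context, FP16_Optimizer keeps FP32 master weights and relies on loss scaling, skipping any optimizer step whose half-precision gradients overflowed. A minimal NumPy sketch of that skip-on-overflow idea (illustrative names only, not apex's actual implementation):

```python
import numpy as np

def step_with_loss_scale(master_w, fp16_grad, lr, scale):
    """Unscale a loss-scaled fp16 gradient in fp32 and apply it to the
    fp32 master weights; skip the step and shrink the scale on overflow."""
    grad = fp16_grad.astype(np.float32) / scale       # unscale in fp32
    if not np.isfinite(grad).all():                   # inf/nan => fp16 overflow
        return master_w, scale / 2.0, True            # skip step, halve scale
    return master_w - lr * grad, scale, False

# fp16 saturates to inf above ~65504, so an over-scaled gradient overflows
w = np.array([1.0], dtype=np.float32)
ok_grad = np.array([2.0], dtype=np.float16) * 1024    # scaled gradient: 2048
w2, scale2, skipped = step_with_loss_scale(w, ok_grad, 0.1, 1024.0)
# w2 is [0.8], the step was taken and the scale kept at 1024
```

This is only the bookkeeping around the step; the crash reported above happens inside the fused CUDA kernel that performs the actual update.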

environment: CUDA 10.0, PyTorch 1.2.0, gcc 5.4.0

apex was installed with:

pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext"  --global-option="--deprecated_fused_adam" ./
FoilHao commented 4 years ago

Hi! I am also encountering Segmentation fault (core dumped) when trying to train nnUNet with fp16. Have you solved it?