Open yangkky opened 5 years ago
Recent changes to PyTorch's built-in extension builder sometimes lead to it compiling for the wrong architecture. Try explicitly setting the list of compute capabilities you want to target, e.g.:
$ export TORCH_CUDA_ARCH_LIST="6.0;6.1;6.2;7.0;7.5"
$ pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./
The list 6.0 through 7.5 may be trimmed down to only the compute capabilities you know you want to target. For example, if you will only run on Voltas, use export TORCH_CUDA_ARCH_LIST="7.0".
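One way to check which values you actually need (not from the original thread, just a minimal sketch using PyTorch's public API):

import torch

# Print the compute capability of each visible GPU so you know which
# values to put in TORCH_CUDA_ARCH_LIST, e.g. (7, 0) -> "7.0".
for i in range(torch.cuda.device_count()):
    major, minor = torch.cuda.get_device_capability(i)
    print(f"GPU {i}: {torch.cuda.get_device_name(i)} -> {major}.{minor}")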
Also, we are in the process of evaluating PyTorch's native layer norm and upstreaming Apex's implementation if necessary, so for future-proofing I recommend just using the native PyTorch layer norm.
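For reference, a minimal sketch of that swap, assuming a model that currently uses apex.normalization.FusedLayerNorm (the sizes here are made up):

import torch

hidden_size = 512  # hypothetical feature dimension
x = torch.randn(8, 16, hidden_size)

# torch.nn.LayerNorm is the native drop-in replacement for
# apex.normalization.FusedLayerNorm(hidden_size).
norm = torch.nn.LayerNorm(hidden_size)
y = norm(x)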
Explicitly setting the architectures seems to fix it.
Out of curiosity, is there a place that lists what each of those architectures is?
@yangkky and for anyone from the future, the CUDA Wikipedia page has a good feature table that can help you figure out how to pin TORCH_CUDA_ARCH_LIST (for example, Pascal cards are 6.x, the Volta V100 is 7.0, and Turing cards are 7.5).
After a GPU tensor goes through FusedLayerNorm, the next time that memory is accessed I get a
RuntimeError: CUDA error: no kernel image is available for execution on the device
To reproduce:
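A hypothetical reconstruction of the script described (the names x and attended come from the post; the shapes and sizes are assumptions):

import torch
from apex.normalization import FusedLayerNorm

hidden_size = 512  # assumed dimension
x = torch.randn(8, 16, hidden_size, device="cuda")
norm = FusedLayerNorm(hidden_size).cuda()
attended = norm(x)

# On an affected build, the next access to this memory raises
# "RuntimeError: CUDA error: no kernel image is available for execution on the device".
print(attended.sum())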
Other operations on attended or x will also raise the error. However, if I move x to the CPU, I can then proceed to use it without any problems.
I'm running this on an AWS p3.2xlarge instance based on the AWS Deep Learning AMI (Ubuntu 18.04) Version 25.0. We've updated PyTorch to 1.3.0, and installed GPUtil, Apex, and gpustat using the following commands:
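(A hypothetical reconstruction of that setup, reusing the Apex build invocation quoted earlier in the thread; the exact original commands are an assumption:)

$ pip install gputil gpustat
$ git clone https://github.com/NVIDIA/apex
$ cd apex
$ pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./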
Doing the same thing on an AWS p2.xlarge instance with the same changes to the environment does not cause the error, presumably because the compiled kernels covered the p2's K80 (compute capability 3.7) but not the p3's V100 (compute capability 7.0).