elixir-nx / xla

Pre-compiled XLA extension
Apache License 2.0

Consider upgrading cuda to 11.2+ #19

Closed · edubart closed this 2 years ago

edubart commented 2 years ago

Please consider upgrading CUDA to 11.2+, because the current build can't use the CUDA async allocator:

2022-04-07 19:06:22.321736: E tensorflow/compiler/xla/pjrt/gpu_device.cc:323] Failed to initialize CUDA async allocator: FAILED_PRECONDITION: CUDA async allocator requires CUDA >= 11.2; falling back to BFC.

This error happens when trying to use the flag xla::GpuAllocatorConfig::Kind::kCudaAsync.
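
For reference, a minimal sketch of how I use the flag, assuming the GetGpuClient / GpuAllocatorConfig API from tensorflow/compiler/xla/pjrt/gpu_device.h around this time (exact signatures may differ between XLA revisions):

```cpp
#include <memory>

#include "tensorflow/compiler/xla/pjrt/gpu_device.h"

// Create a PjRt GPU client that requests the cudaMallocAsync-based
// allocator instead of the default BFC allocator.
xla::StatusOr<std::unique_ptr<xla::PjRtClient>> MakeGpuClient() {
  xla::GpuAllocatorConfig allocator_config;
  // With a binary prebuilt against CUDA 11.1, this is where the error
  // above is logged and XLA falls back to BFC.
  allocator_config.kind = xla::GpuAllocatorConfig::Kind::kCudaAsync;
  return xla::GetGpuClient(/*asynchronous=*/true, allocator_config,
                           /*distributed_client=*/nullptr, /*node_id=*/0);
}
```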

I would suggest upgrading to at least CUDA 11.3 (the same version PyTorch offers for download on its website).

jonatanklosko commented 2 years ago

Hey @edubart, what CUDA version do you have installed? The XLA binary is prebuilt with CUDA 11.1, but it should be compatible with any 11.1+ version, so if you have 11.2 or 11.3 installed, you can still use the cuda111 binary. Is that the case and you still get the error?

jonatanklosko commented 2 years ago

Ah, looking at the source, this is a compile-time check here, so we would need to precompile with CUDA 11.2 to enable it.
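
Roughly the shape of that guard, to illustrate why the CUDA toolkit used at build time is what matters (this mirrors the error message rather than the exact code in gpu_device.cc, so treat the function name as illustrative):

```cpp
#include <cuda.h>  // defines CUDA_VERSION for the toolkit used at *build* time

#include "tensorflow/compiler/xla/statusor.h"
#include "tensorflow/compiler/xla/util.h"

// Illustrative only: CreateAsyncAllocator is a hypothetical name. Because
// CUDA_VERSION is a preprocessor constant, a newer CUDA runtime or driver
// on the target machine cannot enable the branch compiled out here.
xla::Status CreateAsyncAllocator() {
#if CUDA_VERSION >= 11020  // CUDA 11.2
  // ... construct the cudaMallocAsync-based allocator here ...
  return xla::Status::OK();
#else
  return xla::FailedPrecondition(
      "CUDA async allocator requires CUDA >= 11.2");
#endif
}
```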

How do you use the flag: are you using XLA directly, or is there some env var that you set when running with EXLA?

edubart commented 2 years ago

> Hey @edubart, what CUDA version do you have installed?

I have CUDA 11.6 installed in my system.

> How do you use the flag: are you using XLA directly, or is there some env var that you set when running with EXLA?

I am using XLA directly from C++ code. I opted for this project's prebuilt binaries because they expose the XLA PjRtClient API (while the standard prebuilt TensorFlow binaries don't). I have also tried to build this project with a newer CUDA, but failed (I probably missed something in the instructions, or maybe the scripts are not compatible with newer CUDA versions).

jonatanklosko commented 2 years ago

Ah I see! We use this package primarily for Elixir EXLA, which determines what features and setups we want to support. You should definitely be able to compile for a newer version in either of two ways (a sketch of both follows the list):

  1. Having a local setup with CUDA, Bazel and Python, as noted in the README, then running mix deps.get and XLA_TARGET=cuda XLA_BUILD=true mix compile.

  2. Inside Docker, as described here. I believe you just need to change the image slightly: use a base image with a newer CUDA and change XLA_TARGET=cuda111 to XLA_TARGET=cuda.
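
For completeness, a sketch of both options (the Docker base image tag below is only an example; pick whichever 11.2+ devel image matches the CUDA you want):

```sh
# Option 1: local build, assuming CUDA, Bazel and Python are set up as in
# the README.
mix deps.get
XLA_TARGET=cuda XLA_BUILD=true mix compile

# Option 2: run the same commands inside a container whose base image
# ships the newer CUDA toolkit (illustrative tag).
docker run --rm -it -v "$PWD":/workspace -w /workspace \
  nvidia/cuda:11.3.1-cudnn8-devel-ubuntu20.04 bash
```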

jonatanklosko commented 2 years ago

We need to precompile with CUDA 11.1 to support more environments, and we currently don't need a separate precompilation for newer versions (because the 11.1 build is forward compatible). Building for CUDA usually times out on CI, so for the last release I used the Docker setup to build it locally and attached the binary to the release. Consequently, a separate compilation with CUDA 11.2 is a maintenance burden we want to avoid until we actually need it. Please try building locally, especially in Docker, and if you run into any specific issues let me know :)