Dao-AILab / flash-attention

Fast and memory-efficient exact attention
BSD 3-Clause "New" or "Revised" License

installing without CUDA available in advance #344

Open jamesharrisivi opened 1 year ago

jamesharrisivi commented 1 year ago

I need to use GPU nodes that do not have internet access, so I have to prepare the environment on the login node (no GPU available, but it has internet). I cannot find a way to install this package that way.

I can install deepspeed, torch, etc. without a GPU being available, so it is only this package!

If I know the GPUs are NVIDIA A100-SXM4-40GB, is there something like:

    TORCH_CUDA_ARCH_LIST="8.0" pip install .

This doesn't work.

My setup: torch==2.0.1; nvcc reports cuda_11.7; nvidia-smi shows CUDA 12.
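A quick way to sanity-check the toolchain on the login node (a minimal sketch; it assumes the module system puts nvcc on the PATH) is:

    # Confirm nvcc is visible and matches the CUDA version torch was built against.
    module load CUDA/11.7
    nvcc --version    # should report release 11.7
    python -c "import torch; print(torch.__version__, torch.version.cuda)"
    # torch.version.cuda should match the loaded toolkit (11.7 here); the
    # driver-side version shown by nvidia-smi (12) only needs to be >= that.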

tridao commented 1 year ago

It technically doesn't need a GPU to compile; it just needs the CUDA compiler (nvcc). So if you e.g. run Docker on the login node to get an environment with nvcc, or install CUDA and nvcc on the login node, then it should compile. Ideally we'd provide binary wheels so users don't have to compile, and there's a PR on that, but I've been focusing on fixing some other things before tackling binary wheels.
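A minimal cross-compile invocation along those lines (a sketch; the CUDA install prefix below is a placeholder for wherever the toolkit actually lives) would be:

    # No GPU is needed at build time; only nvcc has to be present.
    export CUDA_HOME=/path/to/cuda_11.7_home    # placeholder install prefix
    export TORCH_CUDA_ARCH_LIST="8.0"           # A100 = compute capability 8.0
    # Check that PyTorch's extension builder can locate the toolkit:
    python -c "from torch.utils.cpp_extension import CUDA_HOME; print(CUDA_HOME)"
    pip install .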

jamesharrisivi commented 1 year ago

I have nvcc, as I am able to do module load CUDA/11.7.

When I run

    TORCH_CUDA_ARCH_LIST="8.0" CUDA_HOME='/path/to/cuda_11.7_home' pip install .

I get:

No CUDA runtime is found, using CUDA_HOME='/path/to/cuda_11.7_home'

      Warning: Torch did not find available GPUs on this system.
       If your intention is to cross-compile, this is not an error.
      By default, Apex will cross-compile for Pascal (compute capabilities 6.0, 6.1, 6.2),
      Volta (compute capability 7.0), Turing (compute capability 7.5),
      and, if the CUDA version is >= 11.0, Ampere (compute capability 8.0).
      If you wish to cross-compile for a single specific architecture,
      export TORCH_CUDA_ARCH_LIST="compute capability" before running setup.py.

      torch.__version__  = 2.0.1+cu117

      running bdist_wheel
      running build
      running build_py
      creating build
      creating build/lib.linux-x86_64-cpython-38
      creating build/lib.linux-x86_64-cpython-38/flash_attn
      copying flash_attn/flash_attn_interface.py -> build/lib.linux-x86_64-cpython-38/flash_attn
      ...

      running build_ext
      building 'flash_attn_2_cuda' extension
      creating /home/user1/path/to/flash-attention/build/temp.linux-x86_64-cpython-38
      creating /home/user1/path/to/flash-attention/build/temp.linux-x86_64-cpython-38/csrc
      creating /home/user1/path/to/flash-attention/build/temp.linux-x86_64-cpython-38/csrc/flash_attn
      creating /home/user1/path/to/flash-attention/build/temp.linux-x86_64-cpython-38/csrc/flash_attn/src
      Emitting ninja build file /home/user1/path/to/flash-attention/build/temp.linux-x86_64-cpython-38/build.ninja...
      Compiling objects...
      Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
      [1/33] c++ -MMD -MF /home/user1/path/to/flash-attention/build/temp.linux-x86_64-cpython-38/csrc/flash_attn/flash_api.o.d -pthread -B /path/to/python/env/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/home/user1/path/to/flash-attention/csrc/flash_attn -I/home/user1/path/to/flash-attention/csrc/flash_attn/src -I/home/user1/path/to/flash-attention/csrc/cutlass/include -I/path/to/python/lib/python3.8/site-packages/torch/include -I/path/to/python/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -I/path/to/python/lib/python3.8/site-packages/torch/include/TH -I/path/to/python/lib/python3.8/site-packages/torch/include/THC -I/path/to/cuda/include -I/path/to/python/include/python3.8 -c -c /home/user1/path/to/flash-attention/csrc/flash_attn/flash_api.cpp -o /home/user1/path/to/flash-attention/build/temp.linux-x86_64-cpython-38/csrc/flash_attn/flash_api.o -O3 -std=c++17 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=flash_attn_2_cuda -D_GLIBCXX_USE_CXX11_ABI=0
      cc1plus: warning: command-line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
      /home/user1/path/to/flash-attention/csrc/flash_attn/flash_api.cpp: In function ‘void set_params_fprop(Flash_fwd_params&, size_t, size_t, size_t, size_t, size_t, size_t, size_t, size_t, size_t, at::Tensor, at::Tensor, at::Tensor, at::Tensor, void*, void*, void*, void*, float, float, bool)’:
      /home/user1/path/to/flash-attention/csrc/flash_attn/flash_api.cpp:42:11: warning: ‘void* memset(void*, int, size_t)’ clearing an object of non-trivial type ‘struct Flash_fwd_params’; use assignment or value-initialization instead [-Wclass-memaccess]
         42 |     memset(&params, 0, sizeof(params));
            |     ~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~
      In file included from /home/user1/path/to/flash-attention/csrc/flash_attn/flash_api.cpp:11:
      /home/user1/path/to/flash-attention/csrc/flash_attn/src/flash.h:52:8: note: ‘struct Flash_fwd_params’ declared here

Would Docker help with this even if the GPU isn't available?
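For example, something like this (a sketch; the image tag is an assumption, and any CUDA 11.7 devel image that ships nvcc should do):

    # Build inside a CUDA devel image; no GPU is needed at compile time.
    # Run from the flash-attention source checkout.
    docker run --rm -v "$(pwd)":/workspace -w /workspace \
        nvidia/cuda:11.7.1-devel-ubuntu20.04 \
        bash -c 'apt-get update && apt-get install -y python3-pip && \
                 pip3 install torch==2.0.1 && \
                 TORCH_CUDA_ARCH_LIST="8.0" pip3 install .'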

tridao commented 1 year ago

That looks like it's compiling; what's the issue?