Dao-AILab / flash-attention

Fast and memory-efficient exact attention

Errors when compiling from source #172

Open wuliJerry opened 1 year ago

wuliJerry commented 1 year ago

Hi there, I am trying to compile flash-attention from source using python setup.py install.
However, the build fails with these error messages:

FAILED: /mnt/disk/flash-attention/build/temp.linux-x86_64-cpython-39/csrc/flash_attn/src/fmha_fwd_hdim128.o    
FAILED: /mnt/disk/flash-attention/build/temp.linux-x86_64-cpython-39/csrc/flash_attn/src/fmha_block_fprop_fp16_kernel.sm80.o 
FAILED: /mnt/disk/flash-attention/build/temp.linux-x86_64-cpython-39/csrc/flash_attn/src/fmha_fwd_hdim32.o 
FAILED: /mnt/disk/flash-attention/build/temp.linux-x86_64-cpython-39/csrc/flash_attn/src/fmha_bwd_hdim128.o 
FAILED: /mnt/disk/flash-attention/build/temp.linux-x86_64-cpython-39/csrc/flash_attn/src/fmha_block_dgrad_fp16_kernel_loop.sm80.o 
FAILED: /mnt/disk/flash-attention/build/temp.linux-x86_64-cpython-39/csrc/flash_attn/src/fmha_fwd_hdim64.o 
FAILED: /mnt/disk/flash-attention/build/temp.linux-x86_64-cpython-39/csrc/flash_attn/src/fmha_bwd_hdim32.o 
FAILED: /mnt/disk/flash-attention/build/temp.linux-x86_64-cpython-39/csrc/flash_attn/src/fmha_bwd_hdim64.o 

Before these errors, there were some warning messages like:

/mnt/disk/anaconda3/envs/flashattn/lib/python3.9/site-packages/torch/include/c10/util/reverse_iterator.h:64:38: warning: ‘template<class _Category, class _Tp, class _Distance, class _Pointer, class _Reference> struct std::iterator’ is deprecated [-Wdeprecated-declarations]
   64 | class reverse_iterator
      |                                      ^       
/usr/include/c++/12.2.1/bits/stl_iterator_base_types.h:127:27: note: declared here
  127 |     struct _GLIBCXX17_DEPRECATED iterator
      |                           ^~~~~~~~
/mnt/disk/anaconda3/envs/flashattn/lib/python3.9/site-packages/torch/include/c10/util/irange.h:19:39: warning: ‘template<class _Category, class _Tp, class _Distance, class _Pointer, class _Reference> struct std::iterator’ is deprecated [-Wdeprecated-declarations]
   19 | struct integer_iterator : std::iterator<std::input_iterator_tag, I> {
      |                                       ^~~~~~~~
/usr/include/c++/12.2.1/bits/stl_iterator_base_types.h:127:27: note: declared here
  127 |     struct _GLIBCXX17_DEPRECATED iterator
      |                           ^~~~~~~~
/usr/include/c++/12.2.1/bits/locale_facets_nonio.tcc: In member function ‘_InIter std::time_get<_CharT, _InIter>::get(iter_type, iter_type, std::ios_base&, std::ios_base::iostate&, tm*, const char_type*, const char_type*) const’:
/usr/include/c++/12.2.1/bits/locale_facets_nonio.tcc:1477:77: error: invalid type argument of unary ‘*’ (have ‘int’)
 1477 |       if ((void*)(this->*(&time_get::do_get)) == (void*)(&time_get::do_get))
      |                                                                             ^   
/usr/include/c++/12.2.1/bits/stl_map.h: In member function ‘std::pair<typename std::_Rb_tree<_Key, std::pair<const _Key, _Val>, std::_Select1st<std::pair<const _Key, _Val> >, _Compare, typename __gnu_cxx::__alloc_traits<_Allocator>::rebind<std::pair<const _Key, _Val> >::other>::iterator, bool> std::map<_Key, _Tp, _Compare, _Alloc>::emplace(_Args&& ...)’:
/usr/include/c++/12.2.1/bits/stl_map.h:593:29: error: parameter packs not expanded with ‘...’:
  593 |                 if constexpr (__usable_key<decltype(__a)>)
      |                             ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~                             
/usr/include/c++/12.2.1/bits/stl_map.h:593:29: note:         ‘_Args’

The CUDA version I used is 11.7. I tried both PyTorch 1.13.1 and 2.0.0, and the results were the same. It might be worth mentioning that compiling FlashAttention enforces the following requirement:

RuntimeError: The current installed version of /usr/bin/g++ (12.2.1) is greater than the maximum required version by CUDA 11.7. Please make sure to use an adequate version of /usr/bin/g++ (>=6.0.0, <12.0).

So I used export CXX=g++-11 to set g++-11 (11.3) as the compiler, yet the log shows that g++ 12.2.1's header files are still being used. That might be the problem.
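Setting CXX alone apparently never reaches nvcc. A sketch of what I expected to happen, assuming CUDA >= 11.5 (which honors the NVCC_PREPEND_FLAGS environment variable) and g++-11 installed at /usr/bin/g++-11:

# Point both setuptools and nvcc at g++-11 before building.
export CC=gcc-11
export CXX=g++-11
# nvcc chooses its host compiler via -ccbin; inject it for every nvcc call.
export NVCC_PREPEND_FLAGS='-ccbin /usr/bin/g++-11'
python setup.py install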

The full log file is attached.

log.txt

wuliJerry commented 1 year ago

Also, I tried manually adding the path to g++-11's header files in setup.py:

include_dirs=[
    Path(this_dir) / 'csrc' / 'flash_attn',
    Path(this_dir) / 'csrc' / 'flash_attn' / 'src',
    Path(this_dir) / 'csrc' / 'flash_attn' / 'cutlass' / 'include',
    '/usr/lib/gcc/x86_64-pc-linux-gnu/11.3.0/include/c++',  # Add this line
],

It produced output like this:

/usr/lib/gcc/x86_64-pc-linux-gnu/11.3.0/include/c++/cfenv(66): error: the global scope has no "fegetexceptflag"

/usr/lib/gcc/x86_64-pc-linux-gnu/11.3.0/include/c++/cfenv(67): error: the global scope has no "feraiseexcept"

/usr/lib/gcc/x86_64-pc-linux-gnu/11.3.0/include/c++/cfenv(68): error: the global scope has no "fesetexceptflag"

/usr/lib/gcc/x86_64-pc-linux-gnu/11.3.0/include/c++/cfenv(69): error: the global scope has no "fetestexcept"

/usr/lib/gcc/x86_64-pc-linux-gnu/11.3.0/include/c++/cfenv(71): error: the global scope has no "fegetround"

/usr/lib/gcc/x86_64-pc-linux-gnu/11.3.0/include/c++/cfenv(72): error: the global scope has no "fesetround"

/usr/lib/gcc/x86_64-pc-linux-gnu/11.3.0/include/c++/cfenv(74): error: the global scope has no "fegetenv"

/usr/lib/gcc/x86_64-pc-linux-gnu/11.3.0/include/c++/cfenv(75): error: the global scope has no "feholdexcept"

/usr/lib/gcc/x86_64-pc-linux-gnu/11.3.0/include/c++/cfenv(76): error: the global scope has no "fesetenv"

/usr/lib/gcc/x86_64-pc-linux-gnu/11.3.0/include/c++/cfenv(77): error: the global scope has no "feupdateenv"

Could a missing file on my machine be causing this?

tridao commented 1 year ago

Thanks for the report. I didn't know about this, but it seems like CUDA only supports gcc up to some maximum version. Can you try following this SO answer and symlinking gcc and g++ into /usr/local/cuda/bin?
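Roughly, that symlink approach looks like the sketch below. Paths assume g++-11 lives under /usr/bin and CUDA under /usr/local/cuda; it works because /usr/local/cuda/bin normally precedes /usr/bin on PATH, so nvcc finds these links ahead of the system compiler.

# Make nvcc pick up a supported host compiler ahead of the system gcc/g++.
sudo ln -s /usr/bin/gcc-11 /usr/local/cuda/bin/gcc
sudo ln -s /usr/bin/g++-11 /usr/local/cuda/bin/g++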

wuliJerry commented 1 year ago

> Thanks for the report. I didn't know about this, but it seems like CUDA only supports gcc up to some maximum version. Can you try following this SO answer and symlinking gcc and g++ into /usr/local/cuda/bin?

Thanks for your reply. I checked that in /path/to/cuda/bin, gcc and g++ are symlinked to the appropriate versions. However, the problem is now resolved: I switched to clang and recompiled PyTorch with it. Maybe it was caused by a missing file in my gcc environment. Thank you again for your advice! I will close this issue.
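For anyone hitting the same thing, the clang route boiled down to roughly this sketch (assuming clang and clang++ are on PATH and PyTorch itself was rebuilt with the same toolchain):

# Build the extension with clang as both the C and C++ host compiler.
export CC=clang
export CXX=clang++
python setup.py install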

mdegans commented 1 year ago

Can this be reopened? "Just use clang" is nice and all, but it still won't build with gcc-11. Clang does not work for me. Even with CC=/usr/bin/clang-11 CXX=/usr/bin/clang++-11 the build still fails. Perhaps the compiler version detection code in torch is broken?

The current installed version of clang++-11 (0.0.0) is less than the minimum required version by CUDA 11.5 (6.0.0). Please make sure to use an adequate version of clang++-11 (>=6.0.0, <=12.0.0).

It seems the version string is not parsed properly. It could be because:

$ clangd-11 --version
Ubuntu clangd version 11.1.0-6build1

It would be nice if compiler developers returned a simple semver for --version.
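For illustration only, a bare semver can be recovered from a distro-decorated string like that with a stricter match (a sketch; it shows why a naive parse trips, not what torch actually does internally):

# "Ubuntu clangd version 11.1.0-6build1" -> "11.1.0"
clang++-11 --version | grep -oE '[0-9]+\.[0-9]+\.[0-9]+' | head -n1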

tridao commented 1 year ago

I don't have experience building with Clang. I've been using the nvidia pytorch docker image (which comes with gcc 9), and that works fine.
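If it helps, the container route is roughly this sketch (the tag is illustrative; pick a current one from the NGC catalog):

# Build inside NVIDIA's PyTorch container, which ships a matched gcc/CUDA toolchain.
docker run --gpus all -it --rm -v "$PWD":/workspace/flash-attention \
    nvcr.io/nvidia/pytorch:22.12-py3 \
    bash -c 'cd /workspace/flash-attention && python setup.py install'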

I believe gcc 11 should work, but I don't have an environment to test that.

wuliJerry commented 1 year ago

Yes, I built PyTorch successfully on a server using GCC 11. GCC 11 should work.