Vezora-Corp opened this issue 1 year ago
I have the same error.
The final error, at the end of a long string of template instantiations, is:

C:/MachineLearning/MiniSDXL/flash-attention/csrc/cutlass/include\cute/algorithm/functional.hpp(104): error: no instance of overloaded function "cute::abs" matches the argument list
argument types are: (const cute::_1)
detected during:
instantiation of "decltype(auto) cute::abs_fn::operator()(T &&) const [with T=const cute::_1 &]"
C:/MachineLearning/MiniSDXL/flash-attention/csrc/cutlass/include\cute/algorithm/tuple_algorithms.hpp(283): here
instantiation of "auto cute::transform_leaf(const T &, F &&) [with T=cute::_1, F=cute::abs_fn &]"
C:/MachineLearning/MiniSDXL/flash-attention/csrc/cutlass/include\cute/algorithm/tuple_algorithms.hpp(281): here
instantiation of function "lambda [](const auto &)->auto [with <auto-1>=cute::_1]"
C:/MachineLearning/MiniSDXL/flash-attention/csrc/cutlass/include\cute/algorithm/tuple_algorithms.hpp(114): here
instantiation of "auto cute::detail::tapply(T &&, F &&, G &&, cute::seq<I...>) [with T=const cute::tuple<cute::C<1>, int> &, F=lambda [](const auto &)->auto &, G=lambda [](const auto &...)->auto, I=<0, 1>]"
C:/MachineLearning/MiniSDXL/flash-attention/csrc/cutlass/include\cute/algorithm/tuple_algorithms.hpp(236): here
instantiation of "auto cute::transform(const T &, F &&) [with T=cute::tuple<cute::C<1>, int>, F=lambda [](const auto &)->auto]"
C:/MachineLearning/MiniSDXL/flash-attention/csrc/cutlass/include\cute/algorithm/tuple_algorithms.hpp(281): here
[ 6 instantiation contexts not shown ]
instantiation of "auto cute::tile_to_shape(const cute::ComposedLayout<A, O, B> &, const Shape &, const ModeOrder &) [with A=cute::Swizzle<2, 3, 3>, O=cute::C<0>, B=cute::Layout<cute::tuple<cute::C<8>, cute::C<32>>, cute::tuple<cute::_32, cute::_1>>, Shape=cute::tuple<cute::C<64>, cute::C<96>>, ModeOrder=cute::GenColMajor]"
C:\MachineLearning\MiniSDXL\flash-attention\csrc\flash_attn\src\kernel_traits.h(239): here
instantiation of class "Flash_bwd_kernel_traits<kHeadDim_, kBlockM_, kBlockN_, kNWarps_, AtomLayoutMSdP_, AtomLayoutNdKV, AtomLayoutMdQ, Is_V_in_regs_, No_double_buffer_, elem_type, Base> [with kHeadDim_=96, kBlockM_=64, kBlockN_=128, kNWarps_=8, AtomLayoutMSdP_=2, AtomLayoutNdKV=4, AtomLayoutMdQ=4, Is_V_in_regs_=false, No_double_buffer_=false, elem_type=cutlass::half_t, Base=Flash_kernel_traits<96, 64, 128, 8, cutlass::half_t>]"
C:\MachineLearning\MiniSDXL\flash-attention\csrc\flash_attn\src\flash_bwd_launch_template.h(49): here
instantiation of "void run_flash_bwd_seqk_parallel<Kernel_traits,Is_dropout>(Flash_bwd_params &, cudaStream_t, __nv_bool) [with Kernel_traits=Flash_bwd_kernel_traits<96, 64, 128, 8, 2, 4, 4, false, false, cutlass::half_t, Flash_kernel_traits<96, 64, 128, 8, cutlass::half_t>>, Is_dropout=true]"
C:\MachineLearning\MiniSDXL\flash-attention\csrc\flash_attn\src\flash_bwd_launch_template.h(135): here
instantiation of "void run_flash_bwd<Kernel_traits,Is_dropout>(Flash_bwd_params &, cudaStream_t, __nv_bool) [with Kernel_traits=Flash_bwd_kernel_traits<96, 64, 128, 8, 2, 4, 4, false, false, cutlass::half_t, Flash_kernel_traits<96, 64, 128, 8, cutlass::half_t>>, Is_dropout=true]"
C:\MachineLearning\MiniSDXL\flash-attention\csrc\flash_attn\src\flash_bwd_launch_template.h(211): here
instantiation of "void run_mha_bwd_hdim96<T>(Flash_bwd_params &, cudaStream_t, __nv_bool) [with T=cutlass::half_t]"
C:\MachineLearning\MiniSDXL\flash-attention\csrc\flash_attn\src\flash_bwd_hdim96_fp16_sm80.cu(9): here
This then led to error types being generated and many other arithmetic template instantiations failing downstream.
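For readers trying to make sense of the trace, here is a minimal, self-contained C++ sketch of the failure pattern, assuming the root cause is an overload set that has no candidate for a compile-time integer constant type. The names StaticInt, my_abs, AbsFn, and transform_leaf below are hypothetical stand-ins, not the actual cute:: implementation:

```cpp
// Illustrative sketch only: an overload set that covers runtime integers but
// not a compile-time constant type fails at the tuple leaf, and the error
// surfaces through every enclosing template instantiation, much like the
// trace above.
#include <tuple>
#include <utility>

template <int N>
struct StaticInt {};   // stand-in for a static constant like cute::_1

// The overload set only covers runtime integers...
int my_abs(int x) { return x < 0 ? -x : x; }
// ...so there is no my_abs(StaticInt<N>) overload.

struct AbsFn {   // stand-in for an abs functor such as cute::abs_fn
    template <class T>
    decltype(auto) operator()(T&& t) const { return my_abs(static_cast<T&&>(t)); }
};

// Apply f to every element of a flat tuple, roughly what a transform_leaf does.
template <class Tuple, class F, std::size_t... I>
auto transform_leaf(const Tuple& t, F&& f, std::index_sequence<I...>) {
    return std::make_tuple(f(std::get<I>(t))...);
}

int main() {
    std::tuple<StaticInt<1>, int> shape{StaticInt<1>{}, -3};
    // Uncommenting the next line reproduces the shape of the reported error:
    //   "no instance of overloaded function ... matches the argument list
    //    argument types are: (const StaticInt<1>)"
    // auto result = transform_leaf(shape, AbsFn{}, std::make_index_sequence<2>{});
    (void)shape;
    return 0;
}
```

The sketch only mirrors the shape of the diagnostic; whether the real problem lies in the cute:: headers themselves or in how MSVC/nvcc resolves this overload on Windows is exactly what the trace above points at.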
I got the same error. Does anyone know how to fix it?
Trying to run: pip install flash-attn --no-build-isolation

System:
- CUDA: Build cuda_11.8.r11.8/compiler.31833905_0
- OS: Windows 11
- GPU: 3090
- Python 3.11.4
- PyTorch 2.0.1+cu117
Installing a build without FlashAttention 2 does work, e.g. pip install "flash-attn<2"; I tried "pip install flash-attn===1.0.4 --no-build-isolation" with success. I pasted as much of the error output as wasn't cut off due to length in the attached file ErrorMessage.txt.