Open michael-heinrich opened 1 month ago
Fall back to LlamaAttention instead of LlamaFlashAttention2 when flash attention is not supported by the GPU architecture. PR for #41
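A minimal sketch of the selection logic this PR describes. The function name, the string return values, and the capability threshold (FlashAttention 2 kernels generally require Ampere-class GPUs, i.e. compute capability 8.0 or newer) are assumptions for illustration, not the PR's actual implementation:

```python
def pick_attention_class(compute_capability: tuple[int, int],
                         flash_available: bool) -> str:
    """Pick the attention implementation for the detected GPU.

    compute_capability: (major, minor) as reported by the CUDA runtime.
    flash_available: whether the flash-attn package is importable.
    """
    major, _minor = compute_capability
    # Assumption: FlashAttention 2 requires compute capability >= 8.0
    # (Ampere or newer); older architectures fall back to the plain
    # PyTorch attention implementation.
    if flash_available and major >= 8:
        return "LlamaFlashAttention2"
    return "LlamaAttention"
```

For example, a Turing GPU (compute capability 7.5) would get `LlamaAttention` even with flash-attn installed, while an A100 (8.0) would get `LlamaFlashAttention2`.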