bilibili / Index-1.9B

A SOTA lightweight multilingual LLM
Apache License 2.0

Can Index use flash attention or xformers? #10

Open Koishi-Star opened 2 weeks ago

Koishi-Star commented 2 weeks ago

I want to know if it can use flash attention. Thanks.

mayokaze commented 2 weeks ago

During training we used FlashAttention-2; however, because it pulls in many other dependencies, the training codebase is not well suited to being open-sourced. If you are after better inference performance, we strongly recommend our newly released GGUF quantized version, or other open-source options such as vLLM and TensorRT.
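
For reference, here is a minimal sketch of what running the model under vLLM might look like; the model ID `IndexTeam/Index-1.9B-Chat` and the prompt are assumptions for illustration, not taken from this thread:

```python
# Minimal vLLM inference sketch (assumed model ID: IndexTeam/Index-1.9B-Chat).
from vllm import LLM, SamplingParams

# trust_remote_code is typically required for models shipping custom modeling code.
llm = LLM(model="IndexTeam/Index-1.9B-Chat", trust_remote_code=True)

params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=256)
outputs = llm.generate(["Introduce yourself briefly."], params)

# Each result pairs the prompt with its generated completion(s).
for out in outputs:
    print(out.outputs[0].text)
```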

Koishi-Star commented 2 weeks ago

Well, I will try vLLM. Thanks.