Open Koishi-Star opened 2 weeks ago
During our actual training process we used the FlashAttention-2 mechanism; however, because it relies on many additional dependencies, the training codebase is not in a state suitable for open-sourcing. If you are looking for better inference performance, we strongly recommend our newly released GGUF quantized version, or open-source inference frameworks such as vLLM and TensorRT, among others.
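As a rough sketch of the vLLM route (note: the model id below is a placeholder for whichever checkpoint you are serving, and vLLM needs a CUDA-capable GPU, so treat this as an outline rather than a drop-in script):

```python
# Minimal offline-inference sketch with vLLM.
# "your-org/your-model" is a PLACEHOLDER -- replace it with the actual
# checkpoint you want to serve. vLLM manages its own attention kernels
# internally, so no separate flash-attention setup is needed here.
from vllm import LLM, SamplingParams

llm = LLM(model="your-org/your-model")  # placeholder model id
params = SamplingParams(temperature=0.7, max_tokens=128)

outputs = llm.generate(["Hello, how are you?"], params)
for out in outputs:
    print(out.outputs[0].text)
```

vLLM also ships an OpenAI-compatible HTTP server (`python -m vllm.entrypoints.openai.api_server --model ...`) if you prefer serving over a network instead of the offline `LLM` class shown above.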
Well, I will try vLLM. Thanks.
I want to know if it can use flash attention. Thanks.