Open AceCHQ opened 3 months ago
Hello, I'm having a problem with flash-attn. Can I drop it when I fine-tune DeepSeek-Math? Will that hurt the model's performance? Thank you.