Hello,
First and foremost, congratulations on your excellent work and paper. I've been trying to run Evo locally on T4 GPUs, but I ran into the issue that FlashAttention 2 does not support T4 (Turing) GPUs yet. I have a few questions regarding this:
Do you have any plans to support T4 GPUs in the near future?
Will a single 16 GB T4 GPU be sufficient for inference? If not, could we apply some optimizations (e.g., with DeepSpeed) to the Hugging Face model?
Is there a way to use a FlashAttention 1.x version, or can we disable FlashAttention entirely?
Is it possible to use float16 rather than bfloat16?
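For background on the float16 question: my concern is numeric range, not just hardware support. Here is a quick pure-Python sketch (no torch needed; the helper names are my own) of why casting bfloat16-trained weights or activations to float16 can overflow:

```python
import math
import struct

def to_float16(x: float) -> float:
    """Round-trip a Python float through IEEE 754 half precision (fp16)."""
    try:
        # struct's 'e' format is IEEE 754 binary16; it raises OverflowError
        # for magnitudes beyond fp16's max finite value of 65504.
        return struct.unpack('<e', struct.pack('<e', x))[0]
    except OverflowError:
        return math.inf if x > 0 else -math.inf

def to_bfloat16(x: float) -> float:
    """Truncate a float to bfloat16: float32 with the low 16 mantissa bits dropped.

    bfloat16 keeps float32's 8-bit exponent, so its range (~3.4e38) matches
    float32; only precision is reduced.
    """
    bits = struct.unpack('<I', struct.pack('<f', x))[0]
    return struct.unpack('<f', struct.pack('<I', bits & 0xFFFF0000))[0]

# A magnitude that is unremarkable for bfloat16 tensors:
x = 1.0e5
print(to_bfloat16(x))  # stays finite and close to 1e5
print(to_float16(x))   # overflows to inf: fp16's max finite value is 65504
```

So if the checkpoint was trained in bfloat16, a plain cast to float16 risks infs wherever values exceed 65504, which is why I am asking whether float16 inference is actually viable here.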
Thank you,