Open joerunde opened 6 months ago
This is a small change to allow llama and bigcode models to work with paged attention on a single shard. Currently, if `FLASH_ATTENTION` is not also set, it will raise an error.
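For reference, here is a minimal sketch of the kind of guard being described. The flag names `FLASH_ATTENTION` and `PAGED_ATTENTION` come from this thread, but the environment-variable parsing and the exact exception type are assumptions; the real code in the repo may differ:

```python
import os

# Assumed env-var-style feature flags; the actual code may read these differently.
FLASH_ATTENTION = os.getenv("FLASH_ATTENTION", "false").lower() in ("1", "true")
PAGED_ATTENTION = os.getenv("PAGED_ATTENTION", "false").lower() in ("1", "true")


def check_attention_flags() -> None:
    # Today, enabling paged attention without flash attention raises;
    # this change relaxes that so llama/bigcode models can use paged
    # attention on a single shard.
    if PAGED_ATTENTION and not FLASH_ATTENTION:
        raise NotImplementedError(
            "PAGED_ATTENTION currently requires FLASH_ATTENTION to also be set"
        )
```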
Not 100% sure, but I think we do actually want `FLASH_ATTENTION` to be set in addition to `PAGED_ATTENTION`. I can't remember why exactly... going to look into it.
@tdoublep ah, I was assuming they were mutually exclusive; if they both need to be set, let me know if you find out why!