Closed by ivelet 5 months ago
Fixed the attention implementation so that it yields the same performance with flash-attn=False and flash-attn=True.
Automatically checks for the flash-attn package and the available device, so inference and evaluation can run without a GPU or without flash-attn installed.
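The availability check described above can be sketched as follows. This is a minimal illustration, not the repository's actual code: the helper name `use_flash_attn` and the exact fallback logic are assumptions, but the idea is to enable flash-attn only when both the package and a CUDA device are present, and otherwise fall back to the standard attention path.

```python
import importlib.util


def use_flash_attn() -> bool:
    """Hypothetical helper: enable flash-attn only if the package is
    installed AND a CUDA device is available; otherwise fall back to the
    plain attention implementation so CPU-only inference still works."""
    # Check for the flash-attn package without importing it.
    if importlib.util.find_spec("flash_attn") is None:
        return False
    try:
        import torch  # only needed for the device check
        return torch.cuda.is_available()
    except ImportError:
        # No torch at all -> no GPU path either.
        return False


# The model code would then branch on this flag, e.g.:
# attn_impl = "flash" if use_flash_attn() else "eager"
```

Because the branch is computed at runtime, the same checkpoint runs unchanged on a CPU-only machine and on a GPU box with flash-attn installed.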