Jacck opened 5 months ago
2080 (Turing) is not supported in the latest version.
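FlashAttention-2 requires an Ampere-or-newer GPU (compute capability 8.0+); the RTX 2080 is Turing (7.5), which is why it is reported as unsupported. A minimal sketch of that check (the helper function is hypothetical, not part of any library; on a real machine you would feed it `torch.cuda.get_device_capability()`):

```python
def supports_flash_attn2(major: int, minor: int) -> bool:
    """Return True if a GPU with this compute capability can run
    FlashAttention-2 (requires SM 8.0 / Ampere or newer)."""
    return (major, minor) >= (8, 0)

# On a real system, query the capability via PyTorch:
#   import torch
#   major, minor = torch.cuda.get_device_capability()
print(supports_flash_attn2(7, 5))  # RTX 2080 (Turing) -> False
print(supports_flash_attn2(8, 0))  # A100 (Ampere)     -> True
```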
I hit the same problem while running the mini-internvl-4b pretrained model: I get the warning "You are not running the flash-attention implementation, expect numerical differences." This is on an A100 server. torch version: 2.1.0a0+4136153, flash-attn version: 2.3.6, transformers version: 4.41.2.
I get the warning "You are not running the flash-attention implementation, expect numerical differences." when I run basic inference with the model Microsoft Phi-3-mini-128k-instruct on CUDA. My setup: NVIDIA GeForce RTX 2080, Driver Version: 546.12, CUDA Version: 12.3, bitsandbytes version: 0.43.1. In addition, I get the warning: "Current `flash-attenton` does not support `window_size`. Either upgrade or use `attn_implementation='eager'`." How can I resolve this? Thanks.
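As the second warning suggests, the usual workaround on unsupported GPUs is to pass `attn_implementation="eager"` to `from_pretrained`. A sketch under that assumption (the `attn_kwargs` helper is hypothetical, just to show the backend choice; the actual model load is commented out because it needs transformers and a model download):

```python
def attn_kwargs(has_flash_attn2_gpu: bool) -> dict:
    # Pick the attention backend: flash_attention_2 on supported GPUs
    # (Ampere+), eager elsewhere, as the warning recommends.
    impl = "flash_attention_2" if has_flash_attn2_gpu else "eager"
    return {"attn_implementation": impl}

kwargs = attn_kwargs(has_flash_attn2_gpu=False)  # e.g. RTX 2080

# Hypothetical usage (requires transformers >= 4.38 and network access):
#   from transformers import AutoModelForCausalLM
#   model = AutoModelForCausalLM.from_pretrained(
#       "microsoft/Phi-3-mini-128k-instruct",
#       torch_dtype="auto",
#       **kwargs,
#   )
print(kwargs["attn_implementation"])
```

With `eager` attention the `window_size` warning should disappear; the trade-off is slower inference and higher memory use than FlashAttention-2.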