flash-attn related question

FoundationVision / VAR

[GPT beats diffusion🔥] [scaling laws in visual generation📈] Official impl. of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". An *ultra-simple, user-friendly yet state-of-the-art* codebase for autoregressive image generation!

MIT License


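Hello, I have a question about flash-attn. When I try to enable flash-attn, I find that the branch shown in Figure 1 is never entered. To investigate, I printed self.using_flash, attn_bias, and qkv.dtype, and found that attn_bias is never None (Figure 2).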

So I changed the code as follows. I changed

    using_flash = self.using_flash and attn_bias is None and qkv.dtype != torch.float32

to

    using_flash = self.using_flash and qkv.dtype != torch.float32

and I changed

    assert attn_bias is None and qkv.dtype != torch.float32

to

    assert qkv.dtype != torch.float32

After these changes, however, I got the error shown in Figure 3.
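For readers following the condition being edited above, here is a minimal sketch of the dispatch decision, written in plain Python so it runs without PyTorch or flash-attn installed. The names `AttnCall` and `choose_attention_path` are hypothetical and not part of the VAR codebase, and dtypes are represented as strings purely for illustration. It mirrors the original three-part condition; the reason for the `attn_bias is None` check is presumably that, as far as I know, `flash_attn_func` has no argument for an arbitrary additive mask, so dropping the check would silently ignore the mask.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AttnCall:
    """Hypothetical stand-in for the values printed in the issue."""
    using_flash: bool            # whether flash-attn was requested and imported
    attn_bias: Optional[object]  # attention mask/bias, or None if there is none
    qkv_dtype: str               # e.g. "float16", "bfloat16", "float32"


def choose_attention_path(call: AttnCall) -> str:
    """Return "flash" only when all three original conditions hold.

    Assumption: the flash kernel cannot apply an arbitrary bias and
    does not accept float32 inputs, so either case must fall back.
    """
    if call.using_flash and call.attn_bias is None and call.qkv_dtype != "float32":
        return "flash"
    return "fallback"


# A call with a mask present (as observed in the issue) takes the fallback path.
masked = AttnCall(using_flash=True, attn_bias=object(), qkv_dtype="float16")
# A call with no mask and half-precision inputs is eligible for flash-attn.
unmasked = AttnCall(using_flash=True, attn_bias=None, qkv_dtype="float16")

print(choose_attention_path(masked))
print(choose_attention_path(unmasked))
```

Under this reading, a run in which attn_bias is never None would be expected to stay on the fallback path rather than indicate a bug in the condition itself, though only the maintainers can confirm the intent.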

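I then printed the dtypes of the input q, k, and v (Figure 4). In the end, it only worked after I added further logic to the code (shown in a screenshot in the original post). Is this a known bug? Could you please check, or tell me if I am doing something wrong? For reference, this is the command I ran:

    torchrun --nproc_per_node=8 --nnodes=8 --node_rank=1 train.py --depth=16 --bs=384 --ep=200 --fp16=1 --alng=1e-3 --wpe=0.1 --afuse=False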


flash-attn related question #27