This error occurs when sampling with a pretrained model:

```
File "/xxx/VAR/models/basic_var.py", line 113, in forward
    oup = flash_attn_func(q, k, v, dropout_p=dropout_p, softmax_scale=self.scale).view(B, L, C)
RuntimeError: FlashAttention only support fp16 and bf16 data type.
```
The problem is that while qkv starts out as fp16, the scale_mul in line 101 of basic_var.py is fp32, so the multiplication promotes q and k to fp32.

Update: F.normalize(q, dim=-1) also changes the dtype of q to fp32.
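A minimal sketch of the promotion and one possible workaround, using plain PyTorch (the shapes and the cast-back fix are illustrative assumptions, not the repo's official patch):

```python
import torch

# Illustrative shapes (an assumption, not the actual ones in basic_var.py).
B, L, H, C = 2, 16, 8, 64
q = torch.randn(B, L, H, C, dtype=torch.float16)  # qkv starts out as fp16

# An fp32 buffer like scale_mul: fp16 * fp32 promotes the result to fp32
# under PyTorch's type-promotion rules, which FlashAttention then rejects.
scale_mul = torch.ones(1, 1, H, 1, dtype=torch.float32)
q_scaled = q * scale_mul
print(q_scaled.dtype)  # torch.float32

# One possible fix: cast back to the original dtype right before
# calling flash_attn_func (same idea applies after F.normalize).
q_fixed = q_scaled.to(q.dtype)
print(q_fixed.dtype)  # torch.float16
```

Alternatively, registering scale_mul as an fp16 buffer avoids the promotion in the first place, at the cost of doing the scaling in lower precision.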