RuntimeError: FlashAttention backward for head dim > 64 requires A100 or H100 GPUs as the implementation needs a large amount of shared memory.
Hi, MOSS 7B's head dim is 128, and flash-attn does not support that on a 3090. You can disable it by setting:
config.use_flash=False
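A minimal sketch of applying that setting when loading the model with Hugging Face transformers (the model id below is illustrative, not taken from this thread; only the `use_flash` attribute comes from the comment above):

```python
from transformers import AutoConfig, AutoModelForCausalLM

# Load the config first and turn the FlashAttention path off before building the model.
# "fnlp/moss-base-7b" is an assumed/illustrative checkpoint name.
config = AutoConfig.from_pretrained("fnlp/moss-base-7b", trust_remote_code=True)
config.use_flash = False  # fall back to standard attention on non-A100/H100 GPUs

model = AutoModelForCausalLM.from_pretrained(
    "fnlp/moss-base-7b",
    config=config,
    trust_remote_code=True,
)
```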
Hi, doesn't FlashAttention choose its block size according to the size of the L1 cache / shared memory? In theory the V100 should be able to support it too, right?
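FlashAttention does tile Q/K/V blocks to fit in on-chip shared memory, but for head dim > 64 the backward pass needs a tile larger than what a V100 (~96 KB of shared memory per SM) or a 3090 (~100 KB) can hold; only A100 (~164 KB) and H100 (~228 KB) have enough, hence the hard check behind the error above. A rough sketch of that gate (an approximation of what the error implies, not the library's exact source):

```python
import torch

def flash_backward_head_dim_ok(head_dim: int, device: int = 0) -> bool:
    """Approximate the restriction in the error above: the backward pass with
    head_dim > 64 needs the large shared memory of an A100 (sm_80) or H100 (sm_90),
    so an RTX 3090 (sm_86) or V100 (sm_70) is rejected even though FlashAttention
    tiles blocks to fit shared memory -- the required tile simply does not fit there.
    Sketch only."""
    major, minor = torch.cuda.get_device_capability(device)
    if head_dim <= 64:
        return True  # smaller head dims have a less restrictive requirement
    return (major == 8 and minor == 0) or major == 9  # A100 or H100 only

# MOSS 7B uses head_dim = 128, so on a 3090 or V100 this returns False
# and the flash path should stay disabled (config.use_flash = False).
print(flash_backward_head_dim_ok(head_dim=128))
```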