JlexZhong opened this issue 2 months ago
RuntimeError: FlashAttention backward for head dim > 64 requires A100 or H100 GPUs as the implementation needs a large amount of shared memory. Is this referring to the head dimension of vicuna-7b being more than 64?
My environment: 8× A40 GPUs.
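For reference, vicuna-7b follows the LLaMA-7B architecture (hidden size 4096, 32 attention heads), so its head dimension is 4096 / 32 = 128, which does exceed the 64 limit named in the error. A minimal sketch for confirming this from the model config, assuming the transformers library is installed (the Hub ID below is only an example; substitute your local checkpoint path):

```python
from transformers import AutoConfig

# Example Hub ID for a vicuna-7b checkpoint; replace with your local model path if needed.
config = AutoConfig.from_pretrained("lmsys/vicuna-7b-v1.5")

# Head dimension = hidden size divided by the number of attention heads.
head_dim = config.hidden_size // config.num_attention_heads
print(f"hidden_size={config.hidden_size}, "
      f"num_heads={config.num_attention_heads}, "
      f"head_dim={head_dim}")
# A LLaMA-7B-style model reports hidden_size=4096, num_heads=32, head_dim=128,
# which is above the 64 limit mentioned in the error message.
```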
I used an A6000 for training and didn't run into this issue. Which flash-attention version are you using?
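In case it helps when comparing setups, here is a quick way to report the installed flash-attn build and the visible GPU (a minimal sketch; it assumes flash-attn and PyTorch are importable in the training environment):

```python
# Quick environment check: print the installed flash-attn build and the active GPU.
import flash_attn
import torch

print("flash-attn version:", flash_attn.__version__)
print("GPU:", torch.cuda.get_device_name(0) if torch.cuda.is_available() else "none")
```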