VITA-Group / LLaGA

[ICML2024] "LLaGA: Large Language and Graph Assistant", Runjin Chen, Tong Zhao, Ajay Jaiswal, Neil Shah, Zhangyang Wang
Apache License 2.0

head dim >64 #16

Open JlexZhong opened 2 months ago

JlexZhong commented 2 months ago

I got this error: `RuntimeError: FlashAttention backward for head dim > 64 requires A100 or H100 GPUs as the implementation needs a large amount of shared memory.` Is it referring to the head dimension of vicuna-7b being more than 64?
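
For reference, the head dimension can be read directly from the model config. A minimal sketch, assuming the checkpoint is `lmsys/vicuna-7b-v1.5` (any LLaMA-7B-family config gives the same numbers):

```python
# Compute the attention head dimension from the model config.
# Assumes `transformers` is installed; the checkpoint ID below is an
# example and may differ from the one used in this repo.
from transformers import AutoConfig

cfg = AutoConfig.from_pretrained("lmsys/vicuna-7b-v1.5")
head_dim = cfg.hidden_size // cfg.num_attention_heads  # 4096 // 32 = 128
print(head_dim)  # 128 > 64, which is what trips the FlashAttention check
```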

JlexZhong commented 2 months ago

My environment is 8× A40 GPUs.

ChenRunjin commented 2 months ago

I used an A6000 for training and didn't have this issue. Which flash-attention version are you using?
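
A quick way to check is shown below. One assumption worth verifying: the "head dim > 64 requires A100/H100" restriction on the backward pass comes from flash-attn 1.x, and flash-attn 2.x relaxes it for head dim 128 on Ampere GPUs such as the A40, so upgrading may resolve the error.

```python
# Print the installed flash-attn version to see whether you are on
# the 1.x series (which has the head dim > 64 backward restriction)
# or the 2.x series.
import flash_attn

print(flash_attn.__version__)
```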