VITA-Group / LLaGA

[ICML2024] "LLaGA: Large Language and Graph Assistant", Runjin Chen, Tong Zhao, Ajay Jaiswal, Neil Shah, Zhangyang Wang
Apache License 2.0

head dim >64 #16

JlexZhong opened this issue 2 weeks ago

JlexZhong commented 2 weeks ago

RuntimeError: FlashAttention backward for head dim > 64 requires A100 or H100 GPUs as the implementation needs a large amount of shared memory.

Is it referring to the head dimension of vicuna-7b being more than 64?
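For what it's worth, the head dimension can be read straight off the model config. A minimal sketch, assuming the `lmsys/vicuna-7b-v1.5` checkpoint on the Hugging Face hub (any Vicuna-7B variant should report the same numbers):

```python
# Sketch: compute Vicuna-7B's attention head dimension from its config.
# Model id is an assumption; substitute whichever checkpoint you train with.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("lmsys/vicuna-7b-v1.5")
head_dim = config.hidden_size // config.num_attention_heads
print(head_dim)  # 4096 // 32 = 128, i.e. well above 64
```

So yes, Vicuna-7B's head dimension (128) exceeds the 64 limit mentioned in the error.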

JlexZhong commented 2 weeks ago

My environment is 8x A40 GPUs.

ChenRunjin commented 2 weeks ago

I used an A6000 for training but didn't have this issue. Which flash-attention version are you using?
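(For anyone hitting the same error: a quick way to check the installed version, assuming `flash-attn` is importable in the training environment:

```python
# Print the installed flash-attn version; the "head dim > 64 requires
# A100 or H100" backward restriction is a flash-attn 1.x limitation.
import flash_attn
print(flash_attn.__version__)
```

If this reports a 1.x version, upgrading may resolve it: flash-attn 2.x supports backward for larger head dimensions on Ampere cards like the A40 and A6000, whereas 1.x restricted backward to head dim <= 64 on GPUs other than the A100/H100.)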