LeapLabTHU / Agent-Attention

Official repository of Agent Attention (ECCV2024)

Applying agent attention to auto-regressive models #8

Closed yaoxingcheng closed 8 months ago

yaoxingcheng commented 8 months ago

Thank you for your inspiring work. I'm eager to apply ideas like agent attention to train prevailing auto-regressive models like GPT. However, pooling Q to obtain the agent matrix A causes information leakage when training auto-regressive models with teacher forcing, since each agent token aggregates queries from future positions. I haven't found a related discussion in your paper. Is there a straightforward extension or variation of agent attention that adapts it to auto-regressive models?
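
To make the concern concrete, here is a minimal sketch of the leakage (my own illustration with made-up shapes, not the repository's code): if the agent tokens are pooled from the queries over the full sequence, the output at position t depends on queries at positions > t, so no causal mask on the two attention maps alone can restore auto-regressive training.

```python
import torch
import torch.nn.functional as F

B, N, C, n_agents = 2, 16, 32, 4  # hypothetical batch / length / dim / agent count

Q = torch.randn(B, N, C)
K = torch.randn(B, N, C)
V = torch.randn(B, N, C)

# Agent tokens: pooling over the *whole* sequence dimension.
# Each agent token mixes queries from future positions.
A = F.adaptive_avg_pool1d(Q.transpose(1, 2), n_agents).transpose(1, 2)  # (B, n_agents, C)

# Agent aggregation: agents attend to all keys/values.
V_A = F.softmax(A @ K.transpose(1, 2) / C**0.5, dim=-1) @ V  # (B, n_agents, C)

# Agent broadcast: queries attend to agent tokens.
# The output at position t now depends on Q at positions > t via A.
out = F.softmax(Q @ A.transpose(1, 2) / C**0.5, dim=-1) @ V_A  # (B, N, C)
```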

tian-qing001 commented 8 months ago

Hi @yaoxingcheng, thanks for your interest in our work. Since agent attention is a form of generalized linear attention, auto-regressive models employing it can be trained following the methodology established for linear attention. For a comprehensive understanding of this training approach, I recommend referencing Paper 1 and Paper 2.
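
For reference, the standard causal-training recipe for linear attention replaces the full attention product with prefix sums over keys and values, so position t only attends to s ≤ t. Below is a minimal sketch of that standard recipe (my own illustration, not this repository's code); q and k are assumed to already be feature-mapped, e.g. elu(x) + 1:

```python
import torch

def causal_linear_attention(q, k, v, eps=1e-6):
    """Causal linear attention via prefix sums.

    q, k: (B, N, D) feature-mapped queries/keys
    v:    (B, N, C) values
    Position t attends only to positions s <= t.
    """
    # Running sum of outer products k_s v_s^T -> (B, N, D, C)
    kv = torch.cumsum(k.unsqueeze(-1) * v.unsqueeze(-2), dim=1)
    # Running sum of keys for the normalizer -> (B, N, D)
    k_sum = torch.cumsum(k, dim=1)
    # Numerator: q_t . (sum_{s<=t} k_s v_s^T) -> (B, N, C)
    num = torch.einsum('bnd,bndc->bnc', q, kv)
    # Denominator: q_t . (sum_{s<=t} k_s) -> (B, N, 1)
    den = torch.einsum('bnd,bnd->bn', q, k_sum).unsqueeze(-1)
    return num / (den + eps)
```

In agent attention's linear-attention view, the agent aggregation step would be constrained in the same causal fashion; the referenced papers spell out the exact training procedure.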

tian-qing001 commented 8 months ago

We believe our agent attention has great potential to handle extremely long sequences in large language models and multi-modal models.

yaoxingcheng commented 8 months ago

Thank you so much! The references really help. Can't wait to see how agent attention combined with methods like LAVO can boost long-context LLMs.