LeapLabTHU / Agent-Attention

Official repository of Agent Attention (ECCV2024)

Out Of Memory #34

Closed · Calendula597 closed this issue 1 month ago

Calendula597 commented 3 months ago

Hello, I must commend your work; it's truly impressive. However, I've run into an issue: with the same batch size configurations that I use successfully with the standard Swin Transformer, AgentSwin runs out of GPU memory. Is this expected behavior, or are there adjustments or optimizations I could consider to alleviate it?

tian-qing001 commented 2 months ago

Hi @Calendula597, thanks for your interest in our work. In cases where the sequence length $n$ is relatively small, agent attention may occupy more GPU memory than Softmax attention. You could try turning on --amp or using a smaller batch size.
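
For intuition on why this can happen, here is a minimal single-head sketch (hypothetical sizes, not the repository's implementation or configs): Softmax attention materializes one $n \times n$ map, while agent attention materializes two maps of shapes $a \times n$ and $n \times a$, so its attention-map memory exceeds Softmax's whenever the number of agent tokens $a > n/2$.

```python
import torch
import torch.nn.functional as F

# Hypothetical sizes for illustration only (not taken from AgentSwin configs).
n, d, a = 49, 64, 49  # sequence length, head dim, number of agent tokens

q, k, v = (torch.randn(n, d) for _ in range(3))

# Softmax attention: one (n, n) attention map -> n**2 stored entries.
attn = F.softmax(q @ k.transpose(-2, -1) / d**0.5, dim=-1)
out_softmax = attn @ v

# Agent attention (sketch): agent tokens pooled from the queries, then two
# softmax attentions with maps of shape (a, n) and (n, a) -> 2*a*n entries.
agent = F.adaptive_avg_pool1d(q.t().unsqueeze(0), a).squeeze(0).t()   # (a, d)
agent_attn = F.softmax(agent @ k.transpose(-2, -1) / d**0.5, dim=-1)  # (a, n)
agent_v = agent_attn @ v                                              # (a, d)
q_attn = F.softmax(q @ agent.transpose(-2, -1) / d**0.5, dim=-1)      # (n, a)
out_agent = q_attn @ agent_v                                          # (n, d)

# With n = 49 and a = 49: n**2 = 2401 vs 2*a*n = 4802, i.e. the two agent
# maps take twice the memory of the single Softmax map when a = n; the
# break-even point is a = n / 2. (The full module also keeps additional
# intermediates, which add further overhead.)
print(n * n, 2 * a * n)
```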
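
On the --amp suggestion: assuming the flag enables PyTorch's automatic mixed precision (a guess about the training script; the snippet below uses native torch.cuda.amp with a placeholder model rather than the repository's code), running the forward pass under autocast stores activations such as the attention maps in half precision, roughly halving their memory footprint:

```python
import torch
import torch.nn as nn

# Generic native-AMP sketch (requires a CUDA device); the tiny model and
# random batch below are placeholders, not the repository's training setup.
model = nn.Linear(128, 10).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
criterion = nn.CrossEntropyLoss()
scaler = torch.cuda.amp.GradScaler()

images = torch.randn(8, 128, device="cuda")
targets = torch.randint(0, 10, (8,), device="cuda")

optimizer.zero_grad()
with torch.cuda.amp.autocast():  # forward pass runs in reduced precision
    loss = criterion(model(images), targets)
scaler.scale(loss).backward()    # scale the loss to avoid fp16 underflow
scaler.step(optimizer)
scaler.update()
```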