feifeibear / long-context-attention

USP: Unified (a.k.a. Hybrid, 2D) Sequence Parallel Attention for Long-Context Transformer Model Training and Inference
Apache License 2.0

GPU Memory Usage #66

Closed guanzhchen closed 2 months ago

guanzhchen commented 3 months ago

Hi, thanks for your awesome work. In my test on 8xA800, why does using USP with ulysses_degree=8 and ring_degree=1 take more GPU memory than naive Ulysses?
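
For reference, here is a minimal sketch of how the gap could be quantified with standard PyTorch memory statistics; `run_usp_step` and `run_ulysses_step` are hypothetical stand-ins for one attention forward/backward step in each configuration, not functions from this repo:

```python
import torch

def measure_peak_memory(run_attention_step):
    """Run one attention step and report peak allocated GPU memory in MiB."""
    torch.cuda.empty_cache()
    torch.cuda.reset_peak_memory_stats()
    run_attention_step()  # placeholder: one USP or Ulysses forward/backward
    torch.cuda.synchronize()
    return torch.cuda.max_memory_allocated() / 1024**2

# Hypothetical usage:
# usp_mib = measure_peak_memory(run_usp_step)
# ulysses_mib = measure_peak_memory(run_ulysses_step)
# print(f"USP: {usp_mib:.1f} MiB, Ulysses: {ulysses_mib:.1f} MiB")
```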

feifeibear commented 3 months ago

All2All needs some temporary buffers for the async P2P communication. Could you post the memory difference? In my experience, it is very small.
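
For illustration only (this is not the library's actual code): torch.distributed.all_to_all_single takes a preallocated receive buffer, so each All2All call transiently holds an extra copy of the sharded tensor, which is one plausible source of a small memory overhead:

```python
import torch
import torch.distributed as dist

def all_to_all_with_buffer(x: torch.Tensor) -> torch.Tensor:
    """Assumes an initialized process group; x is split evenly across ranks."""
    # Extra allocation: the receive buffer must exist before the collective runs,
    # so roughly one additional copy of x is live for the duration of the call.
    out = torch.empty_like(x)
    dist.all_to_all_single(out, x)
    return out
```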