dingo-actual / infini-transformer

PyTorch implementation of Infini-Transformer from "Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention" (https://arxiv.org/abs/2404.07143)

q, k, v projection inside the loop #5

Closed · usryokousha closed this issue 6 months ago

usryokousha commented 6 months ago

Isn't it unnecessary to perform the q, k, v projections inside the loop? Because of the data copying it forces during the computation, it makes the whole attention operation slower. Ideally this whole thing would be done with a fused kernel … future personal project of mine.
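
For reference, a minimal sketch of the change being discussed (the names `proj_q`, `proj_k`, `proj_v`, `segment_len`, and the loop structure are illustrative assumptions, not the repository's actual code): projecting the full sequence once and slicing per segment avoids re-launching three small projection matmuls, with their intermediate copies, on every iteration of the segment loop.

```python
# Illustrative sketch only; identifiers and shapes are assumptions,
# not the actual infini-transformer implementation.
import torch
import torch.nn as nn

batch, seq_len, dim, segment_len = 2, 2048, 512, 256
x = torch.randn(batch, seq_len, dim)
proj_q = nn.Linear(dim, dim)
proj_k = nn.Linear(dim, dim)
proj_v = nn.Linear(dim, dim)

# Before: three projection matmuls launched per segment, inside the loop.
for start in range(0, seq_len, segment_len):
    seg = x[:, start:start + segment_len]
    q, k, v = proj_q(seg), proj_k(seg), proj_v(seg)
    # ... per-segment attention and compressive-memory update ...

# After: project the full sequence once, then slice views per segment.
q_all, k_all, v_all = proj_q(x), proj_k(x), proj_v(x)
for start in range(0, seq_len, segment_len):
    q = q_all[:, start:start + segment_len]
    k = k_all[:, start:start + segment_len]
    v = v_all[:, start:start + segment_len]
    # ... per-segment attention and compressive-memory update ...
```

The per-segment work (local attention plus the memory update) still has to run sequentially, since each segment's memory state depends on the previous one, but the projections themselves have no such dependency, so hoisting them out of the loop is safe.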

dingo-actual commented 6 months ago

Good catch! I'll make the change.

dingo-actual commented 6 months ago

Implemented.