jorgeantonio21 opened 3 weeks ago
In order to further guide the work on this issue, we suggest the following:
Hey @jorgeantonio21! Shall I take up this issue? I will follow every step as shown above
@RajeshRk18 please feel free to tackle this one. Also, if any questions or issues arise while working on it, feel free to ping me.
Hello, I would love to work on this too.
Hi @jorgeantonio21. I would love to work on this.
Please assign it to me if it's available.
Hi @jorgeantonio21 ... I'd like to work on this
Following the PagedAttention paper, add CUDA kernels for the Llama model. CUDA kernels for the Llama architecture have been widely implemented in the open source community.
Moreover, paged attention kernels can be found in the original vLLM implementation here. Of special interest for us is the CUDA implementation in attention_kernels.cu, which implements the attention logic on top of the paging algorithm described in the reference paper above.
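To make the core idea concrete before diving into attention_kernels.cu: in paged attention, the KV cache is split into fixed-size blocks, and a per-sequence block table maps logical token positions to physical blocks, so keys and values for one sequence need not be contiguous in memory. Below is a minimal single-head Python sketch of that indirection; this is NOT the vLLM code, and `BLOCK_SIZE`, the dict-based caches, and all function names are illustrative assumptions:

```python
import math

BLOCK_SIZE = 4  # tokens per KV-cache block (illustrative; vLLM uses e.g. 16)

def paged_attention(query, k_cache, v_cache, block_table, seq_len):
    """Single-head attention over a paged KV cache.

    k_cache / v_cache: dict mapping physical block id -> list of
        BLOCK_SIZE key/value vectors.
    block_table: dict mapping logical block index -> physical block id.
    """
    scale = 1.0 / math.sqrt(len(query))
    # Pass 1: scaled dot-product scores, fetching each key via the block table.
    scores = []
    for pos in range(seq_len):
        block = block_table[pos // BLOCK_SIZE]  # logical -> physical block
        key = k_cache[block][pos % BLOCK_SIZE]  # offset within the block
        scores.append(scale * sum(q * k for q, k in zip(query, key)))
    # Numerically stable softmax over the scores.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    denom = sum(exps)
    # Pass 2: weighted sum of values, again resolved through the block table.
    out = [0.0] * len(query)
    for pos, e in zip(range(seq_len), exps):
        block = block_table[pos // BLOCK_SIZE]
        value = v_cache[block][pos % BLOCK_SIZE]
        w = e / denom
        for d in range(len(out)):
            out[d] += w * value[d]
    return out
```

The CUDA kernel in attention_kernels.cu follows the same shape: each thread block handles one (sequence, head) pair and resolves every key/value read through the block table, which is what lets sequences grow without reallocating a contiguous cache.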
NOTE: Issue 1 must be completed before progressing to issues 2, 3, and 4.