atoma-network / atoma-paged-attention

Paged attention cuda kernels for the Atoma protocol

Add paged attention kernels for the Llama model architecture #1

Open jorgeantonio21 opened 3 weeks ago

jorgeantonio21 commented 3 weeks ago

Following the paged attention paper, add CUDA kernels for the Llama model. CUDA kernels for the Llama architecture have been widely implemented in the open source community.

Moreover, paged attention kernels can be found in the original vLLM implementation here. Of special interest for us is the CUDA implementation in attention_kernels.cu, which implements the attention logic on top of the paging algorithm described in the reference paper above.
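For orientation, here is a minimal, simplified sketch of the core idea those kernels rely on: the KV cache is stored in fixed-size physical blocks, and a per-sequence block table translates a token's logical position into a physical block before the query/key dot products are computed. This is not the vLLM kernel; the tensor layouts, names, and parameters below are assumptions chosen for illustration only.

```cuda
// Simplified sketch (illustration only, not the vLLM kernel).
// Assumed layouts:
//   q           : [num_seqs, num_heads, head_dim]                 query for the current decode step
//   k_cache     : [num_blocks, num_heads, block_size, head_dim]   paged key cache
//   block_table : [num_seqs, max_blocks_per_seq]                  logical -> physical block mapping
//   seq_lens    : [num_seqs]                                      tokens currently cached per sequence
//   scores      : [num_seqs, num_heads, max_seq_len]              unnormalized attention logits (output)
// Launch: one thread block per (sequence, head), e.g.
//   paged_attention_scores<<<dim3(num_seqs, num_heads), 128>>>(...);
__global__ void paged_attention_scores(
    const float* __restrict__ q,
    const float* __restrict__ k_cache,
    const int*   __restrict__ block_table,
    const int*   __restrict__ seq_lens,
    float*       __restrict__ scores,
    int num_heads, int head_dim, int block_size,
    int max_blocks_per_seq, int max_seq_len, float scale)
{
    const int seq     = blockIdx.x;
    const int head    = blockIdx.y;
    const int seq_len = seq_lens[seq];

    const float* q_vec = q + (seq * num_heads + head) * head_dim;

    // Each thread handles a strided subset of the cached tokens.
    for (int tok = threadIdx.x; tok < seq_len; tok += blockDim.x) {
        // Paged lookup: translate the logical block index into a physical one.
        const int logical_block  = tok / block_size;
        const int block_offset   = tok % block_size;
        const int physical_block = block_table[seq * max_blocks_per_seq + logical_block];

        const float* k_vec = k_cache
            + ((physical_block * num_heads + head) * block_size + block_offset) * head_dim;

        // q . k for this token, scaled by the caller-provided factor (typically 1/sqrt(head_dim)).
        float dot = 0.f;
        for (int d = 0; d < head_dim; ++d) {
            dot += q_vec[d] * k_vec[d];
        }
        scores[(seq * num_heads + head) * max_seq_len + tok] = dot * scale;
    }
}
```

The real kernels additionally perform the softmax and the value aggregation with shared-memory reductions and vectorized loads; the sketch only shows how the block-table indirection replaces a contiguous KV layout.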

NOTE: Issue 1 must be completed in order to progress to issues 2, 3, and 4.

jorgeantonio21 commented 3 weeks ago

In order to further guide the work on this issue, we suggest the following:

RajeshRk18 commented 2 weeks ago

Hey @jorgeantonio21! Shall I take up this issue? I will follow every step as shown above

jorgeantonio21 commented 2 weeks ago

@RajeshRk18 please feel free to take this one on. Also, if any questions or issues arise while working on it, feel free to ping me

g4titanx commented 2 weeks ago

hello, I would love to work on this too

fishonamos commented 2 weeks ago

Hi @jorgeantonio21. I would love to work on this.

Josh-121 commented 2 weeks ago

Pls assign if available

No-bodyq commented 2 weeks ago

Hi @jorgeantonio21 ... I'd like to work on this