cuda-mode / ring-attention
ring-attention experiments
Apache License 2.0 · 89 stars · 10 forks
Issues
#16 · Updating `qkv` after padding · open · shan18 · 5 months ago · 1 comment
#15 · Issue for ring-llama/test.ipynb · open · fayejf · 5 months ago · 1 comment
#14 · The performance comparison between flash attn and ring flash attn · open · GeneZC · 6 months ago · 0 comments
#13 · few more versions of sampling · closed · melvinebenezer · 6 months ago · 3 comments
#12 · WIP: sampling top-k top-p and greedy · closed · melvinebenezer · 6 months ago · 1 comment
#11 · Compare ring-flash-attention & ring-attention-pytorch · open · andreaskoepf · 6 months ago · 2 comments
#10 · [info] flash attention benchmark · closed · Iron-Bound · 6 months ago · 1 comment
#9 · [info] test results for ring-flash-attention · closed · Iron-Bound · 6 months ago · 0 comments
#8 · fix the dummy-nb · closed · lancerts · 7 months ago · 0 comments
#7 · WIP - Converted code to pytorch · closed · Iron-Bound · 6 months ago · 0 comments
#6 · Bug in DummyRingAttentionImpl.ipynb (delta too high) · closed · andreaskoepf · 7 months ago · 0 comments
#5 · Add dummy ring attention impl notebook · closed · ericauld · 7 months ago · 0 comments
#4 · Extend educational naive flash-attn impl to allow partial kv-block processing (create naive ring-attn) · closed · andreaskoepf · 6 months ago · 2 comments
#3 · housekeeping: added notebooks dir, .vscode in gitignore · closed · melvinebenezer · 7 months ago · 1 comment
#2 · Analyze existing ring-attention implementations · closed · andreaskoepf · 6 months ago · 1 comment
#1 · Analyze overlapped P2P memory transfer and computing · closed · andreaskoepf · 6 months ago · 2 comments