cuda-mode / ring-attention
ring-attention experiments
Apache License 2.0 · 89 stars · 10 forks
Issues
#16 · Updating `qkv` after padding · open · shan18 · 5 months ago · 1 comment
#15 · Issue for ring-llama/test.ipynb · open · fayejf · 5 months ago · 1 comment
#14 · The performance comparison between flash attn and ring flash attn · open · GeneZC · 6 months ago · 0 comments
#13 · few more versions of sampling · closed · melvinebenezer · 6 months ago · 3 comments
#12 · WIP: sampling top-k top-p and greedy · closed · melvinebenezer · 6 months ago · 1 comment
#11 · Compare ring-flash-attention & ring-attention-pytorch · open · andreaskoepf · 6 months ago · 2 comments
#10 · [info] flash attention benchmark · closed · Iron-Bound · 6 months ago · 1 comment
#9 · [info] test results for ring-flash-attention · closed · Iron-Bound · 6 months ago · 0 comments
#8 · fix the dummy-nb · closed · lancerts · 7 months ago · 0 comments
#7 · WIP - Converted code to pytorch · closed · Iron-Bound · 6 months ago · 0 comments
#6 · Bug in DummyRingAttentionImpl.ipynb (delta too high) · closed · andreaskoepf · 7 months ago · 0 comments
#5 · Add dummy ring attention impl notebook · closed · ericauld · 7 months ago · 0 comments
#4 · Extend educational naive flash-attn impl to allow partial kv-block processing (create naive ring-attn) · closed · andreaskoepf · 6 months ago · 2 comments
#3 · housekeeping: added notebooks dir, .vscode in gitignore · closed · melvinebenezer · 7 months ago · 1 comment
#2 · Analyze existing ring-attention implementations · closed · andreaskoepf · 6 months ago · 1 comment
#1 · Analyze overlapped P2P memory transfer and computing · closed · andreaskoepf · 6 months ago · 2 comments