feifeibear / long-context-attention

USP: Unified (a.k.a. Hybrid, 2D) Sequence Parallel Attention for Long Context Transformers Model Training and Inference

Apache License 2.0 · 307 stars · 18 forks
Issues
| # | Title | Author | Closed | Comments |
|---|-------|--------|--------|----------|
| #79 | remove useless workflow | feifeibear | 1 day ago | 0 |
| #78 | version 0.3.2 | feifeibear | 2 days ago | 0 |
| #77 | auto publish python package when release on github | feifeibear | 2 days ago | 0 |
| #76 | remove amd installation to an individual doc | feifeibear | 2 days ago | 0 |
| #75 | 0914v2 | feifeibear | 2 days ago | 0 |
| #74 | use extract_local for test_hybrid_attn.py | feifeibear | 2 days ago | 0 |
| #73 | At the same parallelism degree, how does USP's performance compare with Ulysses and ring-SP? | hb-jw | 1 week ago | 1 |
| #72 | improve readability and potential numerical stability of ring attention | feifeibear | 2 weeks ago | 0 |
| #71 | [AMD GPU] Add AMD GPU support | yiakwy-xpu-ml-framework-team | 1 week ago | 3 |
| #70 | feat: add support for flash_attn>=2.6.0 | Eigensystem | 2 weeks ago | 0 |
| #69 | add license | feifeibear | 3 weeks ago | 0 |
| #68 | What does figure 5 mean? | takfate | 3 weeks ago | 3 |
| #67 | LICENSE problem | pustar | 3 weeks ago | 1 |
| #66 | GPU Memory Usage | guanzhchen | 3 weeks ago | 1 |
| #65 | Is there an example of how to use hybrid-SP in Megatron-LM? | xs1997zju | 3 weeks ago | 1 |
| #64 | flash_attn version dependency | Eigensystem | 2 weeks ago | 1 |
| #63 | support multiple nodes | feifeibear | 2 months ago | 0 |
| #62 | add reference to this project | feifeibear | 2 months ago | 0 |
| #61 | release V0.2 | feifeibear | 2 months ago | 0 |
| #60 | add Tesla support for Ulysses | feifeibear | 2 months ago | 0 |
| #59 | Ulysses does not use flash_attn on T4 GPU | feifeibear | 2 months ago | 0 |
| #58 | add loss curve | feifeibear | 3 months ago | 0 |
| #57 | Fix r_rank/u_rank in extract local to be compatible with DP | ShomyLiu | 3 months ago | 0 |
| #56 | Question about GPU memory usage | realgump | 3 months ago | 1 |
| #55 | update readme | feifeibear | 4 months ago | 0 |
| #54 | update readme | feifeibear | 4 months ago | 0 |
| #53 | Question about data sharding for zigzag ring attention in Hybrid | YouYouCoding | 4 months ago | 3 |
| #52 | add patches | feifeibear | 4 months ago | 0 |
| #51 | add megatron-deepspeed patch | feifeibear | 4 months ago | 0 |
| #50 | set_seq_parallel_pg compatible with DP process group | feifeibear | 4 months ago | 0 |
| #49 | hotfix import error | feifeibear | 5 months ago | 0 |
| #48 | update comprehensive benchmark results | feifeibear | 5 months ago | 0 |
| #47 | fix qkvpacked stripe and zigzag test error | feifeibear | 5 months ago | 0 |
| #46 | convert input tensor layout inside stripe and zigzag ring attention | feifeibear | 5 months ago | 0 |
| #45 | remove useless code | feifeibear | 5 months ago | 0 |
| #44 | async hybrid attention forward only | feifeibear | 5 months ago | 0 |
| #43 | initial async ulysses attn | feifeibear | 5 months ago | 0 |
| #42 | move all_to_all to ./comm | feifeibear | 5 months ago | 0 |
| #41 | add torch profiler | feifeibear | 5 months ago | 0 |
| #40 | Comparing Ulysses and Ring with torch profiler | feifeibear | 3 months ago | 4 |
| #39 | What are the advantages of combining these two approaches? Is the technical motivation documented anywhere? | nullnonenilNULL | 4 months ago | 1 |
| #38 | add GQA support and benchmark results in readme | feifeibear | 5 months ago | 0 |
| #37 | About data splitting and merging | Kwen-Chen | 4 months ago | 12 |
| #36 | warmup longctx no pack benchmark | feifeibear | 5 months ago | 0 |
| #35 | use global process group | feifeibear | 5 months ago | 0 |
| #34 | 0409v3 | feifeibear | 5 months ago | 0 |
| #33 | process group as global vars | feifeibear | 5 months ago | 0 |
| #32 | reorganize the directories | feifeibear | 5 months ago | 0 |
| #31 | fix ulysses bugs | feifeibear | 5 months ago | 0 |
| #30 | update benchmark scripts | feifeibear | 5 months ago | 0 |