Zyphra
/
Megatron-LM
Ongoing research training transformer models at scale
Other
issues
Merge with upstream
#48
Quentin-Anthony
opened
9 months ago
0
Merge latest main into pyramid moe
#47
Quentin-Anthony
closed
11 months ago
0
Print token count during dataset load
#46
yury-tokpanov
closed
11 months ago
0
Update time profiling to latest main
#45
Quentin-Anthony
closed
11 months ago
0
Update eval_train to latest
#44
Quentin-Anthony
closed
11 months ago
0
Varying experts per layer
#43
pglorio
opened
11 months ago
0
Improved sinkhorn algorithm
#42
pglorio
closed
11 months ago
0
Top1 softmax
#41
pglorio
closed
11 months ago
0
Time and memory profiling
#40
pglorio
opened
11 months ago
0
Merging MLP expansion factor branch
#39
BerenMillidge
closed
11 months ago
0
Incorporate evaluation harness into training loop
#38
yury-tokpanov
closed
11 months ago
0
[BUG] CUBLAS Error with small number of experts
#37
Quentin-Anthony
opened
11 months ago
0
Slurm
#36
pglorio
closed
11 months ago
0
Pulled from slurm branch
#35
pglorio
closed
11 months ago
0
Merge latest main into slurm
#34
Quentin-Anthony
closed
11 months ago
0
Balancing loss2
#33
pglorio
closed
11 months ago
0
Remove wandb test log from earlier
#32
Quentin-Anthony
closed
11 months ago
0
Text generation
#31
BerenMillidge
opened
11 months ago
0
Evaluation
#30
BerenMillidge
closed
11 months ago
0
Added possibility of dropping --router-profiling-path
#29
pglorio
closed
11 months ago
0
Profiling through NVIDIA Nsight Systems
#28
pglorio
closed
11 months ago
0
Update balancing_loss2
#27
pglorio
closed
11 months ago
0
update slurm branch with latest main
#26
Quentin-Anthony
closed
11 months ago
0
Routing profiling
#25
pglorio
closed
11 months ago
0
[ENHANCEMENT] Test mup for moe
#24
Quentin-Anthony
opened
11 months ago
0
[ENHANCEMENT] Investigate fp8 Training
#23
Quentin-Anthony
opened
11 months ago
0
Fix typo in slurm script
#22
yury-tokpanov
closed
11 months ago
0
Routing profiling
#21
pglorio
closed
11 months ago
0
[ENHANCEMENT] Introduce Expert Interval
#20
Quentin-Anthony
opened
11 months ago
1
Slurm scripts
#19
yury-tokpanov
closed
11 months ago
0
[ENHANCEMENT] Introduce routing profiling
#18
Quentin-Anthony
closed
11 months ago
0
[ENHANCEMENT] Replace periodic validation with eval harness calls
#17
Quentin-Anthony
opened
11 months ago
0
Topk routing (without balancing)
#16
pglorio
closed
11 months ago
3
Merge into main the inf_lr_sched2 branch
#15
pglorio
closed
11 months ago
2
[ENHANCEMENT] Port new fused rotary embedding kernel into MLM
#14
Quentin-Anthony
opened
12 months ago
0
[QUESTION] Look into loading multiple datasets
#13
Quentin-Anthony
opened
12 months ago
0
[BUG] Fix the infinite LR schedule
#12
Quentin-Anthony
closed
11 months ago
1
[ENHANCEMENT] Create Evaluation and Generation Scripts
#11
Quentin-Anthony
closed
11 months ago
3
Support for HF tokenizers
#10
yury-tokpanov
closed
1 year ago
0
[ENHANCEMENT] Run evals on our MoE and Dense checkpoints
#9
Quentin-Anthony
closed
11 months ago
2
Merge latest changes from main branch into inf_lr_sched branch
#8
pglorio
closed
1 year ago
0
[ENHANCEMENT] MoE TopK Routing
#7
Quentin-Anthony
closed
11 months ago
2
[ENHANCEMENT] Create our own Dockerfile
#6
Quentin-Anthony
closed
11 months ago
4
[BUG] Fix ZeRO with MoE
#5
Quentin-Anthony
opened
1 year ago
0
[ENHANCEMENT] Test MegaBlocks
#4
Quentin-Anthony
opened
1 year ago
2
[BUG] Dataloader Overflow Errors
#3
Quentin-Anthony
opened
1 year ago
1
Infinite LR Schedules
#2
Quentin-Anthony
closed
1 year ago
0
[ENHANCEMENT] Infinite LR Schedules
#1
Quentin-Anthony
closed
11 months ago
0