Zyphra Megatron-LM issues


Ongoing research training transformer models at scale

Other

0 stars, 0 forks

Issues

#48 Merge with upstream (Quentin-Anthony, opened 9 months ago, 0 comments)
#47 Merge latest main into pyramid moe (Quentin-Anthony, closed 11 months ago, 0 comments)
#46 Print token count during dataset load (yury-tokpanov, closed 11 months ago, 0 comments)
#45 Update time profiling to latest main (Quentin-Anthony, closed 11 months ago, 0 comments)
#44 Update eval_train to latest (Quentin-Anthony, closed 11 months ago, 0 comments)
#43 Varying experts per layer (pglorio, opened 11 months ago, 0 comments)
#42 Improved sinkhorn algorithm (pglorio, closed 11 months ago, 0 comments)
#41 Top1 softmax (pglorio, closed 11 months ago, 0 comments)
#40 Time and memory profiling (pglorio, opened 11 months ago, 0 comments)
#39 Merging MLP expansion factor branch (BerenMillidge, closed 11 months ago, 0 comments)
#38 Incorporate evaluation harness into training loop (yury-tokpanov, closed 11 months ago, 0 comments)
#37 [BUG] CUBLAS Error with small number of experts (Quentin-Anthony, opened 11 months ago, 0 comments)
#36 Slurm (pglorio, closed 11 months ago, 0 comments)
#35 Pulled from slurm branch (pglorio, closed 11 months ago, 0 comments)
#34 Merge latest main into slurm (Quentin-Anthony, closed 11 months ago, 0 comments)
#33 Balancing loss2 (pglorio, closed 11 months ago, 0 comments)
#32 Remove wandb test log from earlier (Quentin-Anthony, closed 11 months ago, 0 comments)
#31 Text generation (BerenMillidge, opened 11 months ago, 0 comments)
#30 Evaluation (BerenMillidge, closed 11 months ago, 0 comments)
#29 Added possibility of dropping --router-profiling-path (pglorio, closed 11 months ago, 0 comments)
#28 Profiling through NVIDIA Nsight Systems (pglorio, closed 11 months ago, 0 comments)
#27 Update balancing_loss2 (pglorio, closed 11 months ago, 0 comments)
#26 Update slurm branch with latest main (Quentin-Anthony, closed 11 months ago, 0 comments)
#25 Routing profiling (pglorio, closed 11 months ago, 0 comments)
#24 [ENHANCEMENT] Test mup for moe (Quentin-Anthony, opened 11 months ago, 0 comments)
#23 [ENHANCEMENT] Investigate fp8 Training (Quentin-Anthony, opened 11 months ago, 0 comments)
#22 Fix typo in slurm script (yury-tokpanov, closed 11 months ago, 0 comments)
#21 Routing profiling (pglorio, closed 11 months ago, 0 comments)
#20 [ENHANCEMENT] Introduce Expert Interval (Quentin-Anthony, opened 11 months ago, 1 comment)
#19 Slurm scripts (yury-tokpanov, closed 11 months ago, 0 comments)
#18 [ENHANCEMENT] Introduce routing profiling (Quentin-Anthony, closed 11 months ago, 0 comments)
#17 [ENHANCEMENT] Replace periodic validation with eval harness calls (Quentin-Anthony, opened 11 months ago, 0 comments)
#16 Topk routing (without balancing) (pglorio, closed 11 months ago, 3 comments)
#15 Merge into main the inf_lr_sched2 branch (pglorio, closed 11 months ago, 2 comments)
#14 [ENHANCEMENT] Port new fused rotary embedding kernel into MLM (Quentin-Anthony, opened 12 months ago, 0 comments)
#13 [QUESTION] Look into loading multiple datasets (Quentin-Anthony, opened 12 months ago, 0 comments)
#12 [BUG] Fix the infinite LR schedule (Quentin-Anthony, closed 11 months ago, 1 comment)
#11 [ENHANCEMENT] Create Evaluation and Generation Scripts (Quentin-Anthony, closed 11 months ago, 3 comments)
#10 Support for HF tokenizers (yury-tokpanov, closed 1 year ago, 0 comments)
#9 [ENHANCEMENT] Run evals on our MoE and Dense checkpoints (Quentin-Anthony, closed 11 months ago, 2 comments)
#8 Merge latest changes from main branch into inf_lr_sched branch (pglorio, closed 1 year ago, 0 comments)
#7 [ENHANCEMENT] MoE TopK Routing (Quentin-Anthony, closed 11 months ago, 2 comments)
#6 [ENHANCEMENT] Create our own Dockerfile (Quentin-Anthony, closed 11 months ago, 4 comments)
#5 [BUG] Fix ZeRO with MoE (Quentin-Anthony, opened 1 year ago, 0 comments)
#4 [ENHANCEMENT] Test MegaBlocks (Quentin-Anthony, opened 1 year ago, 2 comments)
#3 [BUG] Dataloader Overflow Errors (Quentin-Anthony, opened 1 year ago, 1 comment)
#2 Infinite LR Schedules (Quentin-Anthony, closed 1 year ago, 0 comments)
#1 [ENHANCEMENT] Infinite LR Schedules (Quentin-Anthony, closed 11 months ago, 0 comments)