NVIDIA / Megatron-LM
Ongoing research training transformer models at scale
https://docs.nvidia.com/megatron-core/developer-guide/latest/user-guide/index.html#quick-start
9.23k stars · 2.08k forks
Issues (sorted by newest)
#846 Fix the bug where the optimizer doesn't actually use multi_tensor_applier under float16 (Gstdioh, closed 1 month ago, 0 comments)
#845 [QUESTION] How to configure the llama3 model (ltm920716, closed 4 weeks ago, 2 comments)
#844 [BUG] Wrong embedding gradients with distributed optimizer and shared embedding (li-plus, closed 1 month ago, 3 comments)
#843 Facilitated source in fractal 2030 (felipeliliti, opened 1 month ago, 0 comments)
#842 [BUG] (felipeliliti, opened 1 month ago, 0 comments)
#841 Configuring datasets using train-data-path, valid-data-path, and test-data-path results in training errors (Eisenhower, opened 1 month ago, 0 comments)
#840 Fix bug: configuring datasets with train-data-path, valid-data-path, test-data-path (Eisenhower, opened 1 month ago, 0 comments)
#839 [BUG] GroupedMLP calculation problem (Baibaifan, opened 1 month ago, 2 comments)
#838 [BUG] Can't continue training from the GPT-345M checkpoint with TransformerEngine: RuntimeError: Error(s) in loading state_dict for ParallelTransformer (arktoswb, closed 1 month ago, 5 comments)
#837 [QUESTION] Is FP32 supported in multi-node training? (JiwenJ, closed 3 weeks ago, 4 comments)
#836 Remove redundant host & device sync (alpha0422, closed 3 weeks ago, 1 comment)
#835 [BUG] Problems with bucket and shared_embedding (Baibaifan, opened 1 month ago, 2 comments)
#834 [BUG] Checkpoint saving is slow for zarr backend + distributed optimizer (chotzen, opened 1 month ago, 4 comments)
#833 [QUESTION] Why enable `non_blocking=True` when doing a synchronous D2H copy? (raywan-110, opened 1 month ago, 1 comment)
#832 [QUESTION] How to obtain computation model graphs in Megatron-LM? (fwyc0573, opened 1 month ago, 0 comments)
#831 [BUG] Modify FLOPs in the MFU calculation for the causal mask when using FlashAttention (Yuxin-CV, opened 1 month ago, 0 comments)
#830 Question about forward_backward_pipelining_without_interleaving in the Megatron-LM pipeline (Hongjie1Chu, opened 1 month ago, 0 comments)
#829 [QUESTION] Why not use PyTorch's tensor parallel APIs? (GuWei007, opened 1 month ago, 1 comment)
#828 [QUESTION] How to profile bubble time in pipeline parallelism? (starstream, opened 1 month ago, 1 comment)
#827 [BUG] (chrisgao7, opened 1 month ago, 0 comments)
#826 [BUG] The argument --no-position-embedding should be fixed (Hoonly, opened 1 month ago, 0 comments)
#825 [BUG] There is a small chance of getting stuck when running test_serialization.py many times (starkhu, opened 1 month ago, 0 comments)
#824 Does Megatron plan to support llama pre-training? (wen020, opened 1 month ago, 2 comments)
#823 [Fix] Assertion to check that `num_layers` is divisible by the pipeline size (kenkenpa2126, opened 1 month ago, 1 comment)
#822 Liliti stk 3.6.9 project: artificial intelligence 🤖 (felipeliliti, opened 1 month ago, 1 comment)
#821 Liliti stk 3.6.9 project: artificial intelligence (felipeliliti, opened 1 month ago, 0 comments)
#820 Liliti stk 3.6.9 project: multimodal artificial intelligence to bring world peace (felipeliliti, closed 1 month ago, 0 comments)
#819 Executive MBA | IIT Roorkee | Coursera (felipeliliti, closed 1 month ago, 0 comments)
#818 Megatron-LM for LLaMa3 (SDsly, opened 1 month ago, 8 comments)
#817 How to set up fp8 training (yangzhipeng1108, closed 1 month ago, 4 comments)
#816 [QUESTION] How does tensor_parallel cooperate with q/k_layernorm? (cryoco, opened 1 month ago, 1 comment)
#815 [BUG] Typo in drop_policy options in moe_utils.py (Malikeh97, opened 1 month ago, 2 comments)
#814 [bug] Fix Xavier uniform init for output layers (hjlee1371, opened 1 month ago, 0 comments)
#813 Liliti stk 3.6.9 project: artificial intelligence responds (felipeliliti, closed 1 month ago, 0 comments)
#812 [BUG] [MoE] Typo in the token drop policy's default value (passaglia, closed 1 month ago, 3 comments)
#811 [Bugfix] [MoE] Fix typo in the token drop policy's default value (passaglia, closed 1 month ago, 1 comment)
#810 [QUESTION] Why is expert parallelism not supported during fp16 training? (yutian-mt, opened 1 month ago, 1 comment)
#809 Liliti stk 3.6.9 project: multimodal artificial intelligence (felipeliliti, closed 1 month ago, 0 comments)
#808 Suppose I contribute to the project, but I am in Brazil and so far have earned nothing working as a data scientist; how can I earn some money to feed my family? (felipeliliti, closed 1 month ago, 0 comments)
#807 [Core dataset compilation error] (shamanez, opened 1 month ago, 0 comments)
#806 Support for Megatron-VLM training (1049451037, opened 1 month ago, 5 comments)
#805 Merge with the parent repo (vlad-karpuhin, closed 1 month ago, 0 comments)
#804 Fixed traceback.format_exception call in StragglerDetector.__exit__ (szmigacz, closed 1 month ago, 2 comments)
#803 [QUESTION] Does Megatron-Core support LLaMA models? (noob-ctrl, opened 1 month ago, 5 comments)
#802 Add dataset packing (shamanez, opened 1 month ago, 0 comments)
#801 Added the dataset packing (shamanez, closed 1 month ago, 0 comments)
#800 [QUESTION] bf16 parameters and fp32 gradients (pluiez, opened 2 months ago, 0 comments)
#799 Why doesn't M-Core use flash attention? (Life-0-1, closed 2 months ago, 0 comments)
#798 Fix finalize_model_grads when sequence parallelism is on (zhaoyinglia, opened 2 months ago, 1 comment)
#797 Speed up the creation of the attention mask (yuantailing, opened 2 months ago, 1 comment)