NVIDIA / Megatron-LM
Ongoing research training transformer models at scale
https://docs.nvidia.com/megatron-core/developer-guide/latest/user-guide/index.html#quick-start
9.23k stars · 2.08k forks
Issues (sorted by newest)
#846 Fix the bug where the optimizer doesn't actually use multi_tensor_applier under float16 (Gstdioh, closed 1 month ago, 0 comments)
#845 [QUESTION] How to configure the llama3 model (ltm920716, closed 4 weeks ago, 2 comments)
#844 [BUG] Wrong embedding gradients with distributed optimizer and shared embedding (li-plus, closed 1 month ago, 3 comments)
#843 Facilitated source in fractal 2030 (felipeliliti, opened 1 month ago, 0 comments)
#842 [BUG] (felipeliliti, opened 1 month ago, 0 comments)
#841 Configuring datasets using train-data-path, valid-data-path, and test-data-path results in training errors (Eisenhower, opened 1 month ago, 0 comments)
#840 Fix bug: configuring datasets with train-data-path, valid-data-path, test-data-path (Eisenhower, opened 1 month ago, 0 comments)
#839 [BUG] GroupedMLP calculation problem (Baibaifan, opened 1 month ago, 2 comments)
#838 [BUG] Can't continue training from the GPT-345M checkpoint with TransformerEngine: RuntimeError: Error(s) in loading state_dict for ParallelTransformer (arktoswb, closed 1 month ago, 5 comments)
#837 [QUESTION] Is FP32 supported in multi-node training? (JiwenJ, closed 3 weeks ago, 4 comments)
#836 Remove redundant host & device sync (alpha0422, closed 3 weeks ago, 1 comment)
#835 [BUG] Problems with bucket and shared_embedding (Baibaifan, opened 1 month ago, 2 comments)
#834 [BUG] Checkpoint saving is slow for zarr backend + distributed optimizer (chotzen, opened 1 month ago, 4 comments)
#833 [QUESTION] Why enable `non_blocking=True` when doing a synchronous D2H copy? (raywan-110, opened 1 month ago, 1 comment)
#832 [QUESTION] How to obtain computation model graphs in Megatron-LM? (fwyc0573, opened 1 month ago, 0 comments)
#831 [BUG] Modify FLOPs in the MFU calculation for the causal mask when using FlashAttention (Yuxin-CV, opened 1 month ago, 0 comments)
#830 Question about forward_backward_pipelining_without_interleaving in the Megatron-LM pipeline (Hongjie1Chu, opened 1 month ago, 0 comments)
#829 [QUESTION] Why not use PyTorch's tensor parallel APIs? (GuWei007, opened 1 month ago, 1 comment)
#828 [QUESTION] How to profile bubble time in pipeline parallelism? (starstream, opened 1 month ago, 1 comment)
#827 [BUG] (chrisgao7, opened 1 month ago, 0 comments)
#826 [BUG] The argument --no-position-embedding should be fixed (Hoonly, opened 1 month ago, 0 comments)
#825 [BUG] There is a small chance of getting stuck when running test_serialization.py many times (starkhu, opened 1 month ago, 0 comments)
#824 Does Megatron plan to support llama pre-training? (wen020, opened 1 month ago, 2 comments)
#823 [Fix] Assertion to check that `num_layers` is divisible by the pipeline size (kenkenpa2126, opened 1 month ago, 1 comment)
#822 Liliti stk 3.6.9 project: artificial intelligence 🤖 (felipeliliti, opened 1 month ago, 1 comment)
#821 Liliti stk 3.6.9 project: artificial intelligence (felipeliliti, opened 1 month ago, 0 comments)
#820 Liliti stk 3.6.9 project: multimodal artificial intelligence to bring world peace (felipeliliti, closed 1 month ago, 0 comments)
#819 Executive MBA | IIT Roorkee | Coursera (felipeliliti, closed 1 month ago, 0 comments)
#818 Megatron-LM for LLaMa3 (SDsly, opened 1 month ago, 8 comments)
#817 How to set up fp8 training (yangzhipeng1108, closed 1 month ago, 4 comments)
#816 [QUESTION] How does tensor_parallel cooperate with q/k_layernorm? (cryoco, opened 1 month ago, 1 comment)
#815 [BUG] Typo in drop_policy options in moe_utils.py (Malikeh97, opened 1 month ago, 2 comments)
#814 [bug] Fix Xavier uniform init for output layers (hjlee1371, opened 1 month ago, 0 comments)
#813 Liliti stk 3.6.9 project: artificial intelligence responds (felipeliliti, closed 1 month ago, 0 comments)
#812 [BUG] [MoE] Typo in the token drop policy's default value (passaglia, closed 1 month ago, 3 comments)
#811 [Bugfix] [MoE] Fix typo in the token drop policy's default value (passaglia, closed 1 month ago, 1 comment)
#810 [QUESTION] Why is expert parallelism not supported during fp16 training? (yutian-mt, opened 1 month ago, 1 comment)
#809 Liliti stk 3.6.9 project: multimodal artificial intelligence (felipeliliti, closed 1 month ago, 0 comments)
#808 Suppose I contribute to the project, but I am in Brazil and so far have earned nothing working as a data scientist; how can I earn some money to feed my family? (felipeliliti, closed 1 month ago, 0 comments)
#807 [Core dataset compilation error] (shamanez, opened 1 month ago, 0 comments)
#806 Support for Megatron-VLM training (1049451037, opened 1 month ago, 5 comments)
#805 Merge with the parent repo (vlad-karpuhin, closed 1 month ago, 0 comments)
#804 Fixed traceback.format_exception call in StragglerDetector.__exit__ (szmigacz, closed 1 month ago, 2 comments)
#803 [QUESTION] Does Megatron-Core support LLaMA models? (noob-ctrl, opened 1 month ago, 5 comments)
#802 Add dataset packing (shamanez, opened 1 month ago, 0 comments)
#801 Added the dataset packing (shamanez, closed 1 month ago, 0 comments)
#800 [QUESTION] bf16 parameters and fp32 gradients (pluiez, opened 2 months ago, 0 comments)
#799 Why doesn't M-Core use flash attention? (Life-0-1, closed 2 months ago, 0 comments)
#798 Fix finalize_model_grads when sequence parallelism is on (zhaoyinglia, opened 2 months ago, 1 comment)
#797 Speed up the creation of the attention mask (yuantailing, opened 2 months ago, 1 comment)