NVIDIA / Megatron-LM
Ongoing research training transformer models at scale
https://docs.nvidia.com/megatron-core/developer-guide/latest/user-guide/index.html#quick-start
9.2k stars · 2.07k forks
Issues
[QUESTION] Has standalone_embedding_stage been supported yet in core? (#890 by JiwenJ, opened 10 hours ago, 0 comments)
[QUESTION] Why does the tokenizer of mamba-2-hybrid have two ids for the token 'Yes'? id 24639 and id 7298 (#889 by Mooler0410, opened 19 hours ago, 1 comment)
Fix typo in megatron/core/models/bert/bert_model.py (#888 by wplf, opened 2 days ago, 0 comments)
Fix typo in bert_model.py (#887 by wplf, opened 2 days ago, 0 comments)
Rename the correct variable of seed (#886 by FancyXun, opened 3 days ago, 0 comments)
OPTIM: get_batch traffic when context-parallel is enabled (#885 by Superkeyv, opened 4 days ago, 0 comments)
[QUESTION] Why is TELayerNormColumnParallelLinear used instead of TEColumnParallelLinear in gpt_layer_specs? (#884 by clarence-lee-sheng, opened 5 days ago, 1 comment)
[QUESTION] What's the internal difference for training between setting only "fp8-format" and setting "fp8-format" + "bf16"? (#883 by dong-liuliu, opened 5 days ago, 0 comments)
[BUG] Wrong lr multiplier (#882 by artyomtugaryov, opened 5 days ago, 0 comments)
[BUG] Pipeline Parallelism fails/hangs with Megatron Core example (#881 by schheda1, opened 6 days ago, 0 comments)
[BUG] @jit_fuser fails with Unknown type constructor Sequence (#880 by Edenzzzz, opened 6 days ago, 3 comments)
[QUESTION] --overlap-grad-allreduce failing as gradients coming through as None in param hook (#879 by jambo6, closed 1 week ago, 2 comments)
[QUESTION] OSError: [Errno 28] No space left on device (#878 by zhaoyz1017, closed 6 days ago, 4 comments)
[QUESTION] Gloo connectFullMesh fails when the number of nodes with "export GLOO_SOCKET_IFNAME=bond4" set exceeds 60 (#877 by Genlovy-Hoo, opened 1 week ago, 0 comments)
[QUESTION] When pretraining BERT, hit bug: cuBLAS Error: the requested functionality is not supported (#876 by shanyuaa, opened 1 week ago, 0 comments)
[BUG] RuntimeError: view size is not compatible with input tensor's size and stride (at least one dimension spans across two contiguous subspaces). Use .reshape(...) instead. (#875 by janelu9, opened 1 week ago, 1 comment)
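The title of #875 states its own fix; as a minimal, hypothetical PyTorch sketch (not the actual code path from the issue), this error typically appears when .view() is called on a non-contiguous tensor, and .reshape() avoids it:

```python
import torch

x = torch.randn(4, 6)
t = x.transpose(0, 1)   # transpose returns a non-contiguous view of x

# t.view(-1) would raise: "RuntimeError: view size is not compatible
# with input tensor's size and stride ..." because .view() never copies
# and needs a layout it can reinterpret in place.
flat = t.reshape(-1)    # .reshape() copies when necessary, so it succeeds
# Equivalent explicit form: t.contiguous().view(-1)
```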
Draft: BERT context parallelism support (#874 by JimmyZhang12, closed 1 week ago, 0 comments)
[BUG] pipeline_parallel is not available when pp_size > 2 (#873 by qby10, opened 1 week ago, 0 comments)
[BUG] Rank/world-size mismatch prevents TensorBoard from being set up (#872 by zainsarwar865, closed 4 days ago, 0 comments)
[QUESTION] How to time the code (#871 by Weifan1226, opened 1 week ago, 0 comments)
[BUG] megatron.training not found (#870 by windprak, closed 5 days ago, 3 comments)
[QUESTION] Question about Mixtral compatibility with Megatron-LM core 0.7.0 (#869 by wavy-jung, closed 1 week ago, 0 comments)
[QUESTION] Using segformer segmentation models (#868 by cporrasn, opened 1 week ago, 0 comments)
[ENHANCEMENT] Can we pass a tuple that includes all the tensors I'd like to pass between different pipeline stages? (#867 by janelu9, opened 1 week ago, 0 comments)
[BUG] The argument to parser.add_argument is wrong in tools/checkpoint/convert.py (#866 by adoda, opened 1 week ago, 0 comments)
[QUESTION] Why does the _p2p_ops function have condition branches on get_pipeline_model_parallel_rank()? (#865 by lichenlu, opened 1 week ago, 0 comments)
[QUESTION] Mamba-2-hybrid Weights (#864 by Mooler0410, closed 1 week ago, 4 comments)
Fix(memory optimization): in-place subtract on vocab_parallel_logits (#863 by Andy666G, opened 1 week ago, 0 comments)
[QUESTION] Loss increased by 10x at the second step (after one backward step). (#862 by janelu9, closed 1 week ago, 17 comments)
[QUESTION] Where does the attention_mask come from when the gpt_model is not the first or last pipeline stage? (#861 by janelu9, opened 2 weeks ago, 0 comments)
When training on H800 with FP8, performance is not significantly better than FP16, and is even worse (#860 by yangzhipeng1108, closed 5 days ago, 3 comments)
Project liliti stk 3.6.9 is finished (#859 by felipeliliti, opened 2 weeks ago, 0 comments)
[BUG] Mismatch Between Docstring and Behavior in core.tensor_parallel.random.model_parallel_cuda_manual_seed (#858 by cong-bai, opened 3 weeks ago, 0 comments)
[ENHANCEMENT] How can I specify the number of layers I want in each pipeline stage? (#857 by janelu9, closed 2 weeks ago, 0 comments)
[ENHANCEMENT] Is there any support for MoE models such as Qwen2MoeForCausalLM from the transformers library? (#856 by liangshaopeng, opened 3 weeks ago, 0 comments)
[BUG] Megatron Core example not working (#855 by schheda1, opened 3 weeks ago, 3 comments)
[QUESTION] Problems performing inference (#854 by srivassid, closed 3 weeks ago, 1 comment)
[ENHANCEMENT] Update black version (#853 by hwdef, opened 3 weeks ago, 1 comment)
[BUG] preprocess_data.py does not finalize all keys (#852 by zainsarwar865, closed 3 weeks ago, 0 comments)
[QUESTION] Question about resume with distributed optimizer (#851 by WailordHe, opened 3 weeks ago, 1 comment)
[QUESTION] Why is TE not used for an MoE layer? (#850 by Btlmd, closed 2 weeks ago, 2 comments)
[QUESTION] Does Megatron-LM support P100? (#849 by gaokaiz2, opened 3 weeks ago, 1 comment)
[BUG] Wrong softmax scaling in the local transformer implementation (#848 by Superkeyv, opened 3 weeks ago, 0 comments)
Fix the bug where the optimizer doesn't actually call multi_tensor_applier under float16 (#847 by Gstdioh, opened 3 weeks ago, 0 comments)
Fix the bug where the optimizer doesn't actually use multi_tensor_applier under float16 (#846 by Gstdioh, closed 3 weeks ago, 0 comments)
[QUESTION] How to configure the llama3 model (#845 by ltm920716, closed 3 weeks ago, 2 comments)
[BUG] Wrong embedding gradients with distributed optimizer and shared embedding (#844 by li-plus, closed 4 weeks ago, 3 comments)
Facilitated source in fractal 2030 (#843 by felipeliliti, opened 4 weeks ago, 0 comments)
[BUG] (#842 by felipeliliti, opened 4 weeks ago, 0 comments)
Configuring datasets using train-data-path, valid-data-path, and test-data-path results in training errors (#841 by Eisenhower, opened 1 month ago, 0 comments)