epfLLM / Megatron-LLM
Distributed trainer for LLMs
500 stars · 72 forks
Issues
#105 · Introduce Sailor Models · longxudou · closed 1 month ago · 1 comment
#104 · Gemma Support · pedrohenriqueamartins · closed 2 months ago · 0 comments
#103 · llama2-7B AssertionError: padded_vocab_size value from checkpoint (32000) is not equal to the input argument value (32256) #81 · yushengsu-thu · opened 3 months ago · 2 comments
#102 · Error in document (https://epfllm.github.io/Megatron-LLM/guide/instruction_tuning.html#data-preprocessing) · yushengsu-thu · opened 3 months ago · 0 comments
#101 · Does it support sequence parallel? · NamrataRShivagunde · closed 3 months ago · 1 comment
#100 · Any plans to rebase the codebase to the most recent Megatron-LM for MoE? · xingyaoww · opened 4 months ago · 0 comments
#99 · Correctness when enabling FlashAttention + Sequence Parallel at the same time? · xingyaoww · closed 4 months ago · 2 comments
#98 · Multi-node · wodeqiansuihan · closed 3 months ago · 1 comment
#97 · Update conversion script to support codellama-70b · panx27 · opened 5 months ago · 0 comments
#96 · Support Qwen? · Vincent131499 · opened 5 months ago · 1 comment
#95 · How to load from a saved intermediate checkpoint? · jjzha · closed 5 months ago · 3 comments
#94 · preprocess.py error while working on custom data · toqeer618 · opened 5 months ago · 0 comments
#93 · Replace 1F1B with ZB-H1 · QPHutu · opened 5 months ago · 4 comments
#92 · LLaMA2-70B Inference Optimization · RaymondHQR · closed 5 months ago · 1 comment
#91 · LLaMA and Mistral 7B pretraining support · StephennFernandes · closed 6 months ago · 2 comments
#90 · Added Mistral docs · AleHD · closed 7 months ago · 0 comments
#89 · One question about the permute function code in permute_qkv.py · drxmy · opened 7 months ago · 2 comments
#88 · Add Mistral Model · xingyaoww · closed 7 months ago · 0 comments
#87 · Eval-only and W&B resume · AleHD · closed 8 months ago · 0 comments
#86 · Fix missing position_ids argument when recompute_granularity == full · xingyaoww · opened 8 months ago · 0 comments
#85 · Typo fixes in docs/ · tmsagarofficial · closed 8 months ago · 0 comments
#84 · Support specifying load_iters for checkpoint · xingyaoww · closed 8 months ago · 2 comments
#83 · Use --no_new_tokens to stop adding built-in special tokens · xingyaoww · closed 7 months ago · 4 comments
#82 · Setting args.make_vocab_size_divisible_by fails · 13416157913 · closed 6 months ago · 1 comment
#81 · llama2-7B AssertionError: padded_vocab_size value from checkpoint (32000) is not equal to the input argument value (32256) · 13416157913 · closed 6 months ago · 1 comment
#80 · RuntimeError: seq_len <= 2048 INTERNAL ASSERT FAILED · 13416157913 · closed 8 months ago · 4 comments
#79 · Error finetuning llama2-7B with --seq_length 4096 · 13416157913 · closed 8 months ago · 1 comment
#78 · Error running llama2-7B finetuning · 13416157913 · closed 8 months ago · 1 comment
#77 · Error running llama2-7B finetuning · 13416157913 · closed 8 months ago · 2 comments
#76 · Support for Mistral · philschmid · closed 7 months ago · 7 comments
#75 · Add eval-only arguments and W&B resume options · eric11eca · closed 8 months ago · 4 comments
#74 · Update getting_started.md · AleHD · closed 9 months ago · 0 comments
#73 · RuntimeError: mat1 and mat2 shapes cannot be multiplied (29056x22016 and 11008x4096) · liuxm117 · closed 9 months ago · 2 comments
#72 · Add pointer to the shm-size Docker arg to the docs · kylematoba · closed 9 months ago · 0 comments
#71 · Support Falcon 180B · martinjaggi · opened 9 months ago · 0 comments
#70 · Getting started "shard" model not working · philschmid · closed 9 months ago · 9 comments
#69 · Saving a checkpoint takes a long time · mynewstart · closed 9 months ago · 2 comments
#68 · Add support to finetune with use_distributed_optimizer · dumpmemory · closed 8 months ago · 11 comments
#67 · [Megatron Base Version] Would you mind sharing the base version of Megatron? · dumpmemory · closed 9 months ago · 7 comments
#66 · Tokens per second metric · AleHD · closed 9 months ago · 0 comments
#65 · Feature Request: Can we directly use a Hugging Face dataset for training? · dumpmemory · closed 9 months ago · 4 comments
#64 · [SwiGLU] Question about SwiGLU · mynewstart · closed 9 months ago · 6 comments
#63 · Loading weights from HF conversion with different TP/PP settings · binwang777 · closed 9 months ago · 14 comments
#62 · Fixed linear time increase observed when micro_batch_size=1 · AleHD · closed 10 months ago · 2 comments
#61 · From custom HF source · AleHD · closed 10 months ago · 0 comments
#60 · Iteration time increases linearly when micro_batch_size=1 · LlinWing · closed 10 months ago · 1 comment
#59 · Update hf_to_megatron.py · AleHD · closed 10 months ago · 0 comments
#58 · Instruct loss scalar · AleHD · closed 10 months ago · 1 comment
#57 · Better documentation · AleHD · closed 10 months ago · 1 comment
#56 · Llama v1 import from HF support · AleHD · closed 10 months ago · 3 comments