bigcode-project/Megatron-LM
Ongoing research training transformer models at scale
Other
373 stars, 48 forks
issues
Saved model checkpoint in different precision ?
#88
Ankush2k
opened
6 months ago
1
IndexError: too many indices for tensor of dimension 2
#87
YFeather
opened
10 months ago
0
Without "<suffix>" token
#86
YFeather
closed
10 months ago
1
Updated Megatron version
#85
jlamypoirier
opened
10 months ago
0
Diff with nvidia main
#84
jlamypoirier
opened
10 months ago
0
HumanEval greedy (sequential) generation using a server
#83
loubnabnl
opened
11 months ago
0
Create pretrain_starcoder2_1b.slurm
#82
loubnabnl
opened
12 months ago
0
add file level FIM and sanity check
#81
loubnabnl
closed
12 months ago
0
Want explanation of the MQA related code
#80
hyunwoongko
closed
1 year ago
0
replace repeat_interleave with basic torch functions
#79
mayank31398
closed
1 year ago
0
LM Head FLOPs
#78
Muennighoff
opened
1 year ago
2
How to use my dataset to finetune
#77
zxyscz
opened
1 year ago
0
Fix data preprocessing
#76
mayank31398
closed
1 year ago
0
How to generate several result with num_beams
#75
Eggwardhan
opened
1 year ago
0
Fix train-iters typo & format script
#74
huybery
closed
1 year ago
0
convert reshape to view
#73
mayank31398
closed
1 year ago
0
Support flash attn 2
#72
jlamypoirier
closed
1 year ago
0
Fixed MQA outputs not matching with HF model with non-flash case
#71
mayank31398
closed
1 year ago
5
[WIP] Add training scripts
#70
loubnabnl
closed
1 year ago
0
OOM while merging starcoder model (after sft) from TP=4,PP=4 to TP=8,PP=1
#69
mintsugaEHEH
closed
1 year ago
1
re-merge from NVIDIA main
#68
RaymondLi0
opened
1 year ago
0
fix saving and loading of old checkpoints
#67
mayank31398
closed
1 year ago
2
fix missing world_size in args_to_keep
#66
mayank31398
closed
2 months ago
3
Skip unnecessary compilation
#65
jlamypoirier
closed
1 year ago
0
re-merge from NVIDIA main
#64
RaymondLi0
closed
1 year ago
0
how to convert huggingface?
#63
cdj0311
closed
1 year ago
2
Add Deepspeed integration [WIP]
#62
mayank31398
closed
2 months ago
0
RuntimeError: Error building extension 'scaled_upper_triang_masked_softmax_cuda'
#61
KOVVURISATYANARAYANAREDDY
opened
1 year ago
0
how to prepare the training data to train starcoder?
#60
wwngh1233
opened
1 year ago
2
ValueError: Invalid attention arguments: AttnType.self_attn, None
#59
chen-lee-li
opened
1 year ago
1
merge from Nvidia main
#58
RaymondLi0
closed
1 year ago
3
Incomplete humaneval evaluation code
#57
huybery
opened
1 year ago
0
Incomplete humaneval evaluation code
#56
huybery
closed
1 year ago
0
add ROCm devices support
#55
mayank31398
closed
1 year ago
1
Add tokens-per-second-gpu to the printed logs instead of just wandb
#54
loubnabnl
closed
1 year ago
0
assert Flash Attention doesn't get arbitrary mask
#53
mayank31398
closed
1 year ago
2
Finetune StarCoder Megatron
#52
lvwerra
closed
1 year ago
0
Fix mqa parallelization
#51
thomasw21
opened
1 year ago
0
Script to train starcoder
#49
edward-io
closed
1 year ago
2
Packed MTF mask
#48
Muennighoff
closed
1 year ago
0
Mtf
#47
Muennighoff
opened
1 year ago
0
Mqa+flash attn
#46
joaomonteirof
closed
1 year ago
0
Support interleaved pipeline schedules in checkpoint merging tools
#45
RaymondLi0
opened
1 year ago
0
fix distributed optimizer
#44
lvwerra
closed
1 year ago
0
add token/s/gpu to wandb
#43
lvwerra
closed
1 year ago
0
Create pretrain_bigbigcode.slurm
#42
lvwerra
closed
1 year ago
1
Add flash-attn
#41
RaymondLi0
closed
1 year ago
2
support mqa in checkpoint-merging tools
#40
RaymondLi0
closed
1 year ago
2
Kv grad allreduce v2
#39
jlamypoirier
closed
1 year ago
0
Improve loading of the data-paths
#38
RaymondLi0
opened
1 year ago
0