bigcode-project Megatron-LM issues

bigcode-project / Megatron-LM

Ongoing research training transformer models at scale

License: Other

374 stars, 49 forks

issues

Saved model checkpoint in different precision ?

#88 Ankush2k opened 6 months ago · 1 comment
IndexError: too many indices for tensor of dimension 2

#87 YFeather opened 10 months ago · 0 comments
Without "<suffix>" token

#86 YFeather closed 10 months ago · 1 comment
Updated Megatron version

#85 jlamypoirier opened 11 months ago · 0 comments
Diff with nvidia main

#84 jlamypoirier opened 11 months ago · 0 comments
HumanEval greedy (sequential) generation using a server

#83 loubnabnl opened 1 year ago · 0 comments
Create pretrain_starcoder2_1b.slurm

#82 loubnabnl opened 1 year ago · 0 comments
add file level FIM and sanity check

#81 loubnabnl closed 1 year ago · 0 comments
Want explanation of the MQA related code

#80 hyunwoongko closed 1 year ago · 0 comments
replace repeat_interleave with basic torch functions

#79 mayank31398 closed 1 year ago · 0 comments
LM Head FLOPs

#78 Muennighoff opened 1 year ago · 2 comments
How to use my dataset to finetune

#77 zxyscz opened 1 year ago · 0 comments
Fix data preprocessing

#76 mayank31398 closed 1 year ago · 0 comments
How to generate several result with num_beams

#75 Eggwardhan opened 1 year ago · 0 comments
Fix train-iters typo & format script

#74 huybery closed 1 year ago · 0 comments
convert reshape to view

#73 mayank31398 closed 1 year ago · 0 comments
Support flash attn 2

#72 jlamypoirier closed 1 year ago · 0 comments
Fixed MQA outputs not matching with HF model with non-flash case

#71 mayank31398 closed 1 year ago · 5 comments
[WIP] Add training scripts

#70 loubnabnl closed 1 year ago · 0 comments
OOM while merging starcoder model (after sft) from TP=4,PP=4 to TP=8,PP=1

#69 mintsugaEHEH closed 1 year ago · 1 comment
re-merge from NVIDIA main

#68 RaymondLi0 opened 1 year ago · 0 comments
fix saving and loading of old checkpoints

#67 mayank31398 closed 1 year ago · 2 comments
fix missing world_size in args_to_keep

#66 mayank31398 closed 3 months ago · 3 comments
Skip unnecessary compilation

#65 jlamypoirier closed 1 year ago · 0 comments
re-merge from NVIDIA main

#64 RaymondLi0 closed 1 year ago · 0 comments
how to convert huggingface?

#63 cdj0311 closed 1 year ago · 2 comments
Add Deepspeed integration [WIP]

#62 mayank31398 closed 3 months ago · 0 comments
RuntimeError: Error building extension 'scaled_upper_triang_masked_softmax_cuda'

#61 KOVVURISATYANARAYANAREDDY opened 1 year ago · 0 comments
how to prepare the training data to train starcoder?

#60 wwngh1233 opened 1 year ago · 2 comments
ValueError: Invalid attention arguments: AttnType.self_attn, None

#59 chen-lee-li opened 1 year ago · 1 comment
merge from Nvidia main

#58 RaymondLi0 closed 1 year ago · 3 comments
Incomplete humaneval evaluation code

#57 huybery opened 1 year ago · 0 comments
Incomplete humaneval evaluation code

#56 huybery closed 1 year ago · 0 comments
add ROCm devices support

#55 mayank31398 closed 1 year ago · 1 comment
Add tokens-per-second-gpu to the printed logs instead of just wandb

#54 loubnabnl closed 1 year ago · 0 comments
assert Flash Attention doesn't get arbitrary mask

#53 mayank31398 closed 1 year ago · 2 comments
Finetune StarCoder Megatron

#52 lvwerra closed 1 year ago · 0 comments
Fix mqa parallelization

#51 thomasw21 opened 1 year ago · 0 comments
Script to train starcoder

#49 edward-io closed 1 year ago · 2 comments
Packed MTF mask

#48 Muennighoff closed 1 year ago · 0 comments
Mtf

#47 Muennighoff opened 1 year ago · 0 comments
Mqa+flash attn

#46 joaomonteirof closed 1 year ago · 0 comments
Support interleaved pipeline schedules in checkpoint merging tools

#45 RaymondLi0 opened 1 year ago · 0 comments
fix distributed optimizer

#44 lvwerra closed 1 year ago · 0 comments
add token/s/gpu to wandb

#43 lvwerra closed 1 year ago · 0 comments
Create pretrain_bigbigcode.slurm

#42 lvwerra closed 1 year ago · 1 comment
Add flash-attn

#41 RaymondLi0 closed 1 year ago · 2 comments
support mqa in checkpoint-merging tools

#40 RaymondLi0 closed 1 year ago · 2 comments
Kv grad allreduce v2

#39 jlamypoirier closed 1 year ago · 0 comments
Improve loading of the data-paths

#38 RaymondLi0 opened 1 year ago · 0 comments