bigcode-project Megatron-LM issues

Ongoing research training transformer models at scale

Other

376 stars, 49 forks

issues

Reduce the tensor-parallel KV output grads

#37 jlamypoirier closed 1 year ago
1
We need to sync gradients with regards to input in MQA

#36 thomasw21 closed 1 year ago
1
Unstable training with MQA and TP>1

#35 RaymondLi0 closed 1 year ago
4
OOM on preprocessing dataset with large number of documents

#34 RaymondLi0 opened 1 year ago
0
Log tflops and other fixes

#33 RaymondLi0 closed 1 year ago
3
add multi-validation for gpt training

#32 RaymondLi0 closed 1 year ago
0
Log GPU throughput

#31 RaymondLi0 closed 1 year ago
0
Meta-information dropout

#30 RaymondLi0 closed 1 year ago
2
Multiple validation datasets

#29 RaymondLi0 closed 1 year ago
3
Lower throughput with UL2 training

#28 RaymondLi0 opened 1 year ago
0
Wandb init error

#27 RaymondLi0 opened 1 year ago
0
Remove hf transformers tools

#26 jlamypoirier closed 1 year ago
0
Create data composition

#25 RaymondLi0 opened 1 year ago
1
Create the Stack 1.2 dataset

#24 RaymondLi0 closed 1 year ago
1
WIP: UL2 merge

#23 RaymondLi0 opened 1 year ago
0
UL2

#22 RaymondLi0 closed 1 year ago
0
Literature review on scaling laws

#21 RaymondLi0 opened 1 year ago
3
Benchmarking Memory Consumption of Optimizers Adam vs. Adan

#20 SivilTaram opened 1 year ago
0
Support MQA in tools.checkpoint_loader/saver_megatron

#19 RaymondLi0 closed 1 year ago
1
From NVIDIA Megatron-LM for visibility

#18 RaymondLi0 opened 1 year ago
0
Conversion of Huggingface bigcode/santacoder to Nvidia Triton Inference server

#17 michaelfeil opened 1 year ago
0
Experiment plan

#16 RaymondLi0 closed 1 year ago
1
Timeout on creating the index mappings

#15 RaymondLi0 opened 1 year ago
0
Fine tuning and Pre training scripts

#14 shaileshj2803 opened 1 year ago
0
Structured logs and distributed timeout.

#13 jlamypoirier closed 1 year ago
0
Fixes for MQA

#12 jlamypoirier closed 1 year ago
1
Double-check the code for key/value gradient reduction in the case of MQA, when tensor-model-parallel > 1, and for distributed optim

#11 RaymondLi0 closed 1 year ago
4
Preprocess hf

#10 RaymondLi0 closed 1 year ago
0
WIP: Fim

#9 RaymondLi0 closed 2 years ago
0
Train Python model with FIM

#8 harm-devries closed 2 years ago
4
TF-Multi Node Training Layout

#4 harm-devries closed 1 year ago
0
TF-Tokenization

#3 harm-devries closed 1 year ago
0
TF-Model Architecture

#2 harm-devries closed 1 year ago
2
Main

#1 RaymondLi0 closed 2 years ago
0
Train an Encoder on the BigCode Dataset

#5 cakiki closed 1 month ago
0
List of things to be (potentially) ported from Megatron-DeepSpeed

#6 mayank31398 closed 1 year ago
3
Selection of LLMs architecture

#7 maoquan-ms opened 2 years ago
0