issues
bigcode-project/Megatron-LM
Ongoing research training transformer models at scale
Other
376 stars, 49 forks
#37 Reduce the tensor-parallel KV output grads (jlamypoirier, closed, 1 year ago, 1 comment)
#36 We need to sync gradients with regards to input in MQA (thomasw21, closed, 1 year ago, 1 comment)
#35 Unstable training with MQA and TP>1 (RaymondLi0, closed, 1 year ago, 4 comments)
#34 OOM on preprocessing dataset with large number of documents (RaymondLi0, opened, 1 year ago, 0 comments)
#33 Log tflops and other fixes (RaymondLi0, closed, 1 year ago, 3 comments)
#32 add multi-validation for gpt training (RaymondLi0, closed, 1 year ago, 0 comments)
#31 Log GPU throughput (RaymondLi0, closed, 1 year ago, 0 comments)
#30 Meta-information dropout (RaymondLi0, closed, 1 year ago, 2 comments)
#29 Multiple validation datasets (RaymondLi0, closed, 1 year ago, 3 comments)
#28 Lower throughput with UL2 training (RaymondLi0, opened, 1 year ago, 0 comments)
#27 Wandb init error (RaymondLi0, opened, 1 year ago, 0 comments)
#26 Remove hf transformers tools (jlamypoirier, closed, 1 year ago, 0 comments)
#25 Create data composition (RaymondLi0, opened, 1 year ago, 1 comment)
#24 Create the Stack 1.2 dataset (RaymondLi0, closed, 1 year ago, 1 comment)
#23 WIP: UL2 merge (RaymondLi0, opened, 1 year ago, 0 comments)
#22 UL2 (RaymondLi0, closed, 1 year ago, 0 comments)
#21 Literature review on scaling laws (RaymondLi0, opened, 1 year ago, 3 comments)
#20 Benchmarking Memory Consumption of Optimizers Adam v.s. Adan (SivilTaram, opened, 1 year ago, 0 comments)
#19 Support MQA in tools.checkpoint_loader/saver_megatron (RaymondLi0, closed, 1 year ago, 1 comment)
#18 From NVIDIA Megatron-LM for visibility (RaymondLi0, opened, 1 year ago, 0 comments)
#17 Conversion of Huggingface bigcode/santacoder to Nvidia Triton Inference server (michaelfeil, opened, 1 year ago, 0 comments)
#16 Experiment plan (RaymondLi0, closed, 1 year ago, 1 comment)
#15 Timeout on creating the index mappings (RaymondLi0, opened, 1 year ago, 0 comments)
#14 Fine tuning and Pre training scripts (shaileshj2803, opened, 1 year ago, 0 comments)
#13 Structured logs and distributed timeout. (jlamypoirier, closed, 1 year ago, 0 comments)
#12 Fixes for MQA (jlamypoirier, closed, 1 year ago, 1 comment)
#11 Double-check the code for key/value gradient reduction in the case of MQA, when tensor-model-parallel > 1, and for distributed optim (RaymondLi0, closed, 1 year ago, 4 comments)
#10 Preprocess hf (RaymondLi0, closed, 1 year ago, 0 comments)
#9 WIP: Fim (RaymondLi0, closed, 2 years ago, 0 comments)
#8 Train Python model with FIM (harm-devries, closed, 2 years ago, 4 comments)
#4 TF-Multi Node Training Layout (harm-devries, closed, 1 year ago, 0 comments)
#3 TF-Tokenization (harm-devries, closed, 1 year ago, 0 comments)
#2 TF-Model Architecture (harm-devries, closed, 1 year ago, 2 comments)
#1 Main (RaymondLi0, closed, 2 years ago, 0 comments)
#5 Train an Encoder on the BigCode Dataset (cakiki, closed, 1 month ago, 0 comments)
#6 List of things to be (potentially) ported from Megatron-DeepSpeed (mayank31398, closed, 1 year ago, 3 comments)
#7 Selection of LLMs architecture (maoquan-ms, opened, 2 years ago, 0 comments)